Policy text analysis of the list of environmental permit (LEP) based on natural language processing (NLP)

Abstract: The list of environmental permit (LEP) is the core instrument of the ecological environment zoning-based regulation (EZR) system. It pursues prevention at the source along four dimensions: spatial layout constraints, pollutant emission control, environmental risk prevention and control, and resource and energy utilization efficiency. LEPs are characterized by voluminous policy texts, diverse control measures, and complex wording, so identifying what they regulate, how, and how strictly is an important basis for implementing EZR policy. In this study, we used unsupervised natural language machine-learning techniques to mine policy vocabulary patterns in LEPs and to assign multidimensional quantitative labels to the policy text, and then applied deep learning language models to classify and evaluate the LEP control measures. Hebei Province has one of the most complete sets of industrial categories and some of the most complex resource and environmental problems in China, which makes its environmental access regulation typical and representative. Taking the industrial control measures of Hebei's LEPs as a case, we identified 10 categories of policy keyword features and 64 main policy keywords, with sentences containing these keywords reaching a coverage rate of 95% across the full lists. We constructed 24 control measure-industry classification labels, and applied and compared the BERT, RoBERTa, and ALBERT deep learning models for classifying the policy text. The highest precision, recall, and F1 score reached 0.95, 0.79, and 0.86, respectively, indicating that the trained models can identify LEP policy content well. The results show that Hebei's LEPs remain deficient in how clear, specific, and quantitative their control measures are, and that content on fine-grained industrial control, assessment indicators, and time limits needs to be supplemented and refined. The proposed method has good prospects for application. We recommend combining it with cutting-edge artificial intelligence methods to further improve automated processing efficiency, dynamic analysis, and the ability to provide fine-grained policy adjustment suggestions.
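The two headline quantities in the abstract, keyword sentence coverage and the F1 score, can be illustrated with a minimal sketch. The function names, keywords, and sentences below are hypothetical and are not taken from the Hebei lists or from the authors' code. The sketch also assumes that coverage means the share of sentences containing at least one keyword, and that the reported 0.95 and 0.79 are read as precision and recall.

```python
def sentence_coverage(sentences, keywords):
    """Share of sentences that contain at least one keyword (case-insensitive)."""
    if not sentences:
        return 0.0
    hits = sum(
        1 for s in sentences
        if any(k.lower() in s.lower() for k in keywords)
    )
    return hits / len(sentences)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical illustration data, not drawn from the actual policy texts.
keywords = ["prohibit", "restrict", "shall not exceed"]
sentences = [
    "New coal-fired boilers are prohibited in the core zone.",
    "Restrict the expansion of steel capacity.",
    "Emissions shall not exceed the provincial standard.",
    "The park is located in the north of the city.",
]

coverage = sentence_coverage(sentences, keywords)  # 3 of 4 sentences match
f1 = f1_score(0.95, 0.79)  # harmonic mean of the two reported values

print(f"coverage = {coverage:.2f}")
print(f"F1 = {f1:.2f}")
```

The harmonic mean of 0.95 and 0.79 is about 0.86, which matches the reported F1. The abstract gives the highest value of each metric, so the three figures need not come from the same model.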

     
