Volume 11 Issue 2
Mar.  2021
WANG Zhijun, CHANG Miao, ZHOU Li, GUO Peikun, GU Meifeng. Development of environmental management lexicon based on new word discovery and its empirical application[J]. Journal of Environmental Engineering Technology, 2021, 11(2): 385-392. doi: 10.12153/j.issn.1674-991X.20200127

Development of environmental management lexicon based on new word discovery and its empirical application

doi: 10.12153/j.issn.1674-991X.20200127
  • Corresponding author: CHANG Miao E-mail: changmiao@tsinghua.edu.cn
  • Received Date: 2020-05-22
  • Publish Date: 2021-03-20
  • With the rapid expansion of environmental policy in China, collating, summarizing, analyzing, and interpreting large numbers of policies and regulations by purely manual means has become increasingly difficult. Computer techniques such as text mining can therefore play an important role in supporting intelligent environmental policy management and analysis, including information extraction and text analysis. Accurate word segmentation (tokenization) underlies all text mining functions. To improve the segmentation of policy texts, environmental policies published on the official websites of China's ecological and environmental departments at all levels were collected as a corpus. A professional lexicon for environmental management was then developed using new word discovery algorithms, followed by manual supplementation and revision. Empirical results showed that adding the environmental lexicon raised the accuracy of environmental policy segmentation from 72.6% to 94.1% and reduced the misclassification rate of automatic policy classification based on a support vector machine by 22.7%. In addition, word frequency statistics and keyword extraction performed with the lexicon provided more comprehensive and more timely statistical information for environmental policy analysis.
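
The abstract does not say which new word discovery algorithm was used. One common family of methods scores each candidate character string by its internal cohesion (pointwise mutual information between its parts) and by its boundary freedom (entropy of the characters that appear immediately to its left and right). The following is a minimal sketch under that assumption, not the authors' implementation. The function name `discover_new_words`, the thresholds, and the toy corpus are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def discover_new_words(text, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Return {candidate: (count, cohesion, freedom)} for n-grams passing both filters.

    Illustrative assumption: cohesion is the minimum PMI over all two-way
    splits of the candidate, and freedom is the smaller of its left and
    right neighbor entropies.
    """
    n = len(text)
    counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each n-gram
    right = defaultdict(Counter)  # characters seen immediately after each n-gram

    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            gram = text[i:i + size]
            counts[gram] += 1
            if size > 1:
                if i > 0:
                    left[gram][text[i - 1]] += 1
                if i + size < n:
                    right[gram][text[i + size]] += 1

    def prob(gram):
        # Rough estimate: occurrences divided by corpus length in characters.
        return counts[gram] / n

    def entropy(neighbors):
        total = sum(neighbors.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in neighbors.values())

    results = {}
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        cohesion = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        freedom = min(entropy(left[gram]), entropy(right[gram]))
        if cohesion >= min_pmi and freedom >= min_entropy:
            results[gram] = (count, cohesion, freedom)
    return results


# Toy corpus (hypothetical, not taken from the paper's policy corpus).
sample = (
    "加强排污许可管理。落实排污许可制度，完善排污许可证核发。"
    "推进排污许可改革；规范排污许可执法"
)
found = discover_new_words(sample)
for word, (count, cohesion, freedom) in found.items():
    print(word, count, round(cohesion, 3), round(freedom, 3))
```

On this toy corpus, fragments such as 排污许 or 许可 are rejected because the same character always follows or precedes them, so their neighbor entropy is zero, while 排污许可 ("pollutant discharge permit") appears in varied contexts and passes both filters. In practice, candidates produced this way would still need the manual supplementation and revision that the abstract describes before being added to a segmenter's user dictionary.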

     
