Identification and recovery of abnormal data in environmental monitoring
Abstract: To obtain complete and reliable environmental monitoring data, a local outlier factor algorithm based on the GeoHash algorithm (GeoHash-LOF) was proposed. Compared with the traditional local outlier factor (LOF) algorithm, GeoHash-LOF introduces address partitioning and region encoding, which reduces the computational load. The identified abnormal data were then repaired with a grey prediction model improved by a genetic algorithm (GA-GM), which selects optimal background values and initial values for the grey model and thereby improves prediction accuracy. Using data provided by the European Nuclear Energy Agency (ENEA) as a case study, the proposed GeoHash-LOF and GA-GM algorithms were compared with other algorithms. The results show that the proposed algorithms identify abnormal data more efficiently and achieve a better fit when restoring missing data.
Key words:
- environmental monitoring /
- data recovery /
- GeoHash-LOF algorithm /
- GA-GM algorithm
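The abstract attributes the lower cost of GeoHash-LOF to partitioning the monitored area into GeoHash-encoded regions before running LOF. The following is a minimal sketch of that idea under stated assumptions: LOF is evaluated only among records that share a cell key (for example a GeoHash prefix computed beforehand), and cells with k or fewer records are left unscored. How the paper treats neighbours across cell borders is not stated here. With N records split into cells of sizes n_c, the pairwise-distance work drops from O(N²) to the sum of n_c². `lof_scores` follows the classic LOF definition (k-distance, reachability distance, local reachability density) but uses exactly k neighbours instead of all ties. The function names and the cell keys in the usage example are hypothetical.

```python
import math
from collections import defaultdict


def lof_scores(points, k):
    """Classic LOF scores for a small list of feature vectors (O(n^2) distances).

    Simplification: exactly k nearest neighbours are used; ties at the
    k-distance are broken by index instead of all being included.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse mean reachability distance to the k neighbours.
        reach = [max(k_dist[o], dist[i][o]) for o in neigh[i]]
        return 1.0 / max(sum(reach) / k, 1e-12)  # guard against exact duplicate points

    lrds = [lrd(i) for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density (>1 means outlier).
    return [sum(lrds[o] for o in neigh[i]) / (k * lrds[i]) for i in range(n)]


def geohash_lof(records, k=3):
    """records: list of (cell_key, feature_vector) pairs (hypothetical layout).

    Assumption: LOF is computed separately inside each cell; records in cells
    with k or fewer members get None because no k-neighbourhood exists there.
    """
    buckets = defaultdict(list)
    for idx, (key, _) in enumerate(records):
        buckets[key].append(idx)
    scores = [None] * len(records)
    for idxs in buckets.values():
        if len(idxs) <= k:
            continue
        cell_scores = lof_scores([records[i][1] for i in idxs], k)
        for i, s in zip(idxs, cell_scores):
            scores[i] = s
    return scores


# Usage with made-up cell keys and 2-D readings.
records = [
    ("u4pr", (0.0, 0.0)), ("u4pr", (0.0, 1.0)), ("u4pr", (1.0, 0.0)),
    ("u4pr", (1.0, 1.0)), ("u4pr", (10.0, 10.0)),
    ("ezs4", (5.0, 5.0)), ("ezs4", (5.0, 6.0)), ("ezs4", (6.0, 5.0)), ("ezs4", (6.0, 6.0)),
    ("zzzz", (3.0, 3.0)),
]
print(geohash_lof(records, k=2))
```

Only points within the same cell are compared, so the isolated reading (10, 10) stands out against its own cell, while the single record in cell `zzzz` is not scored.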
Table 1. Average data volume per cell for different coding lengths

| Coding length (characters) | Bits | Number of regions | Average data volume per region |
|---|---|---|---|
| 1 | 5 | $2^{5}$ | $N/2^{5}$ |
| 2 | 10 | $2^{10}$ | $N/2^{10}$ |
| 3 | 15 | $2^{15}$ | $N/2^{15}$ |
| 4 | 20 | $2^{20}$ | $N/2^{20}$ |
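Table 1 follows from the GeoHash base-32 alphabet: each character carries 5 bits, so a code of length k addresses $2^{5k}$ cells, and N records spread over them average $N/2^{5k}$ per cell. The sketch below shows the standard GeoHash encoding (longitude and latitude bits interleaved, starting with longitude, each 5-bit group mapped to the alphabet `0123456789bcdefghjkmnpqrstuvwxyz`) together with the Table 1 arithmetic. The helper names and the value N = 1,000,000 in the usage lines are illustrative assumptions; which coding length the paper finally uses is not stated here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat, lon, length):
    """Encode (lat, lon) as a GeoHash string of `length` characters (5 bits each)."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True  # bits alternate, starting with longitude
    while len(bits) < 5 * length:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)


def region_count(length):
    """Number of cells addressed by a GeoHash of `length` characters: 2**(5*length)."""
    return 2 ** (5 * length)


def avg_per_region(n_records, length):
    """Mean number of records per cell, N / 2**(5*length), as in Table 1."""
    return n_records / region_count(length)


for length in range(1, 5):
    print(length, 5 * length, region_count(length), avg_per_region(1_000_000, length))
print(geohash_encode(42.6, -5.6, 5))  # ezs42
```

Because every extra character splits a cell into 32 children, a longer code is always an extension of a shorter one for the same point, which is what makes GeoHash prefixes convenient bucket keys.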
Table 2. Raw data
| No. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Absolute humidity/(kg/m³) | 40.8 | 41.5 | 39.9 | 39.5 | 32.7 | 24.9 | 19.8 | 19.3 |
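The abstract states that GA-GM improves grey prediction by letting a genetic algorithm choose the background value and the initial value, but the encoding and GA settings are not given here. The sketch below is one plausible reading, applied to the Table 2 series: a standard GM(1,1) in which the background-value weight `lam` (0.5 in the classic model) and an offset `delta` on the initial condition are treated as genes, and a small real-coded GA (tournament selection, blend crossover, Gaussian mutation, elitism) minimises the fitting MAPE. The parameterisation, bounds, GA settings and the names `gm11`, `mape` and `ga_gm` are assumptions; the sketch is not expected to reproduce the Table 3 predictions.

```python
import math
import random


def gm11(x0, lam=0.5, delta=0.0):
    """Fitted values of a GM(1,1) model (assumed parameterisation).

    lam: background-value weight, z(k) = lam*x1(k) + (1-lam)*x1(k-1); classic GM(1,1) uses 0.5.
    delta: offset added to the initial condition x1(1) of the time-response function.
    Returns None when the model is degenerate.
    """
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]  # accumulated (1-AGO) series
    z = [lam * x1[k] + (1 - lam) * x1[k - 1] for k in range(1, n)]
    y, m = x0[1:], n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(zi * zi for zi in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = m * szz - sz * sz
    if det == 0:
        return None
    # Least-squares solution of x0(k) + a*z(k) = b for k = 2..n
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    if abs(a) < 1e-12:
        return None
    c = x0[0] + delta
    x1_hat = [(c - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    # Inverse accumulation; the first point is kept as observed.
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]


def mape(x0, fitted):
    """Mean absolute percentage error over points 2..n."""
    if fitted is None:
        return math.inf
    return 100 * sum(abs(f - x) / abs(x) for x, f in zip(x0[1:], fitted[1:])) / (len(x0) - 1)


def ga_gm(x0, pop_size=30, generations=60, seed=0):
    """Real-coded GA over (lam, delta); bounds are illustrative assumptions."""
    rng = random.Random(seed)
    lo, hi = (0.0, -0.5 * x0[0]), (1.0, 0.5 * x0[0])

    def clip(ind):
        return tuple(min(max(g, l), h) for g, l, h in zip(ind, lo, hi))

    def cost(ind):
        return mape(x0, gm11(x0, *ind))

    # Seed with the classic setting so the GA can only match or improve on it.
    pop = [(0.5, 0.0)] + [(rng.uniform(lo[0], hi[0]), rng.uniform(lo[1], hi[1]))
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        new_pop = sorted(pop, key=cost)[:2]  # elitism: keep the two best unchanged
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(pop, 3), key=cost)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=cost)
            w = rng.random()  # blend crossover
            child = tuple(w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2))
            # Gaussian mutation scaled to each gene's range
            child = tuple(g + rng.gauss(0, 0.05 * (h - l)) for g, l, h in zip(child, lo, hi))
            new_pop.append(clip(child))
        pop = new_pop
    best = min(pop, key=cost)
    return best, cost(best)


data = [40.8, 41.5, 39.9, 39.5, 32.7, 24.9, 19.8, 19.3]  # Table 2
base = mape(data, gm11(data))
(lam, delta), err = ga_gm(data)
print(f"classic GM(1,1): MAPE {base:.2f}%")
print(f"GA-tuned: lam={lam:.3f}, delta={delta:.3f}, MAPE {err:.2f}%")
```

Seeding the population with the classic parameters and keeping elites means the tuned MAPE never exceeds that of the classic GM(1,1) on the same series.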
Table 3. Comparison results of different algorithms
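| Actual data | GA-GM: predicted | GA-GM: error/% | GGWO-GM [15]: predicted | GGWO-GM [15]: error/% | Improved GM [16]: predicted | Improved GM [16]: error/% |
|---|---|---|---|---|---|---|
| 40.8 | 40.80 | | 40.80 | | 40.80 | |
| 41.5 | 47.26 | 13.88 | 43.81 | 5.58 | 42.32 | 1.99 |
| 39.9 | 38.53 | −3.43 | 38.33 | −3.93 | 37.30 | −6.51 |
| 39.5 | 35.95 | −8.98 | 33.52 | −15.12 | 32.87 | −16.77 |
| 32.7 | 32.63 | −0.30 | 29.33 | −10.31 | 28.97 | −11.39 |
| 24.9 | 27.06 | 8.67 | 25.65 | 3.04 | 25.53 | 2.57 |
| 19.8 | 20.82 | 5.15 | 22.44 | 13.35 | 22.50 | 13.68 |
| 19.3 | 18.28 | −5.28 | 19.63 | 1.72 | 19.83 | 2.79 |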
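Table 3 does not report an aggregate score. Assuming, consistently with the signs in the table, that the error column is the signed relative error (predicted − actual)/actual × 100%, the sketch below recomputes the errors from the two-decimal predictions and summarises each method by the mean absolute percentage error (MAPE) of the tabulated errors for points 2 to 8. This gives about 6.53% for GA-GM, 7.58% for GGWO-GM and 7.96% for the improved GM, and the recomputed errors stay within 0.1 percentage point of the tabulated ones. The variable and function names are illustrative.

```python
# Values transcribed from Table 3 (points 2-8; point 1 is the shared initial value).
ACTUAL = [41.5, 39.9, 39.5, 32.7, 24.9, 19.8, 19.3]
PREDICTED = {
    "GA-GM": [47.26, 38.53, 35.95, 32.63, 27.06, 20.82, 18.28],
    "GGWO-GM": [43.81, 38.33, 33.52, 29.33, 25.65, 22.44, 19.63],
    "Improved GM": [42.32, 37.30, 32.87, 28.97, 25.53, 22.50, 19.83],
}
TABULATED_ERROR = {
    "GA-GM": [13.88, -3.43, -8.98, -0.30, 8.67, 5.15, -5.28],
    "GGWO-GM": [5.58, -3.93, -15.12, -10.31, 3.04, 13.35, 1.72],
    "Improved GM": [1.99, -6.51, -16.77, -11.39, 2.57, 13.68, 2.79],
}


def relative_error(actual, predicted):
    """Signed relative error in percent: (predicted - actual) / actual * 100."""
    return (predicted - actual) / actual * 100


def mape(errors):
    """Mean absolute percentage error of a list of signed percentage errors."""
    return sum(abs(e) for e in errors) / len(errors)


for name, preds in PREDICTED.items():
    recomputed = [relative_error(a, p) for a, p in zip(ACTUAL, preds)]
    worst = max(abs(r - t) for r, t in zip(recomputed, TABULATED_ERROR[name]))
    print(f"{name}: MAPE {mape(TABULATED_ERROR[name]):.2f}%, "
          f"max |recomputed - tabulated| = {worst:.3f}")
```

The small residual differences are expected because the tabulated predictions are rounded to two decimals.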
[1] LI X R, ZHOU M, MI Y D, et al. Application of smart environmental protection system in environmental management[J]. Journal of Environmental Engineering Technology, 2021, 11(5): 992-1003.
[2] CHE Y H, WEI Z D. Application of computer in environmental monitoring[J]. Environmental Engineering, 2022, 40(4): 273-274.
[3] WANG H, ZHANG N, DU E S, et al. An adaptive identification method of abnormal data in wind and solar power stations[J]. Renewable Energy, 2023, 208: 76-93. doi: 10.1016/j.renene.2023.03.081
[4] BIRANT D, KUT A. Spatio-temporal outlier detection in large databases[J]. Journal of Computing and Information Technology, 2006, 14(4): 291. doi: 10.2498/cit.2006.04.04
[5] YANG F Z, ZHU Y Y, SHI B L. IncLOF: an incremental algorithm for mining local outliers in dynamic environment[J]. Journal of Computer Research and Development, 2004, 41(3): 477-484.
[6] LU S W, WU X L, ZHENG J, et al. Data-cleaning method based on dynamic fusion LOF for municipal wastewater treatment process[J]. Control and Decision, 2022, 37(5): 1231-1240.
[7] JIN A, CHENG C Q, SONG S H, et al. Region query of area data based on Geohash[J]. Geography and Geo-Information Science, 2013, 29(5): 31-35.
[8] TU G Q, YANG Y H, LIU S B. Vulnerability analysis of geohash code against k-nearest neighbor attack[J]. Netinfo Security, 2021, 21(2): 10-15.
[9] CHEN Z, YU B F, HU W Y, et al. Grey assessment and prediction of the urban heat island effect in city[J]. Journal of Xi'an Jiaotong University, 2004, 38(9): 985-988. doi: 10.3321/j.issn:0253-987X.2004.09.025
[10] XU N, DANG Y G, CUI J. Comprehensive optimized GM(1, 1) model and application for short term forecasting of Chinese energy consumption and production[J]. Journal of Systems Engineering and Electronics, 2015, 26: 794-801.
[11] WANG Y H, LU J. Improvement and application of GM(1, 1) model based on multivariable dynamic optimization[J]. Journal of Systems Engineering and Electronics, 2020, 31(3): 593-601. doi: 10.23919/JSEE.2020.000024
[12] CAO A H, CHEN K, LI Y J, et al. Forecasting of gas emission based on the improved grey model[J]. Coal Science & Technology Magazine, 2011(2): 4-7.
[13] JIN W B, QIN D P, SUN C, et al. Study on the growth law of wax deposition thickness on pipe wall based on improved GM(1, 1) model[J]. Journal of Safety and Environment, 2021, 21(6): 2563-2570.
[14] WU Y Q, LI M K, TANG Z N, et al. Projection of residential annual water consumption in Hengshui City based on dynamic gray model groups[J]. Journal of Environmental Engineering Technology, 2022, 12(1): 267-274.
[15] ZHANG Y Z, ZHU J W, LIU J T, et al. Application of improved gray wolf algorithm to optimize gray forecasting model in CNC machine tools[J]. Manufacturing Technology & Machine Tool, 2022(3): 127-131.
[16] ZHANG D H, JIANG S F, SHI K Q. Theoretical defect of grey prediction formula and its improvement[J]. Systems Engineering-Theory & Practice, 2002, 22(8): 140-142.
[17] YU F, WANG K J, ZHANG W L, et al. Prediction of coagulant dosage for in situ turbidity control in water ecological restoration based on BP neural network optimized by genetic algorithm[J]. Environmental Engineering, 2023, 41(4): 154-163.
[18] LI X, ZHANG Z, ZHOU Y L, et al. Scheduling optimization on hybrid flow workshop for aerospace ring forgings based on genetic algorithm[J]. Forging & Stamping Technology, 2023, 48(11): 196-203.