Identification and recovery of abnormal data in environmental monitoring

Abstract: To obtain complete and reliable environmental monitoring data, a local outlier factor algorithm based on the GeoHash algorithm (GeoHash-LOF) is proposed. Compared with the traditional local outlier factor (LOF) algorithm, GeoHash-LOF introduces address partitioning and region encoding, which reduces the computational cost of the algorithm. The abnormal data identified in this way are then repaired with a grey prediction algorithm improved by a genetic algorithm (GA-GM): the background value and the initial value of the grey prediction model are selected by optimization, which improves the accuracy of the predicted values. Using data provided by the European Nuclear Energy Agency (ENEA) as an example, the proposed GeoHash-LOF and GA-GM algorithms are compared with other algorithms. The results show that the proposed algorithms identify abnormal data more efficiently and achieve a better fit when repairing missing data.
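The repair step described in the abstract can be illustrated with a minimal sketch of a GM(1,1) grey model whose background-value weight and initial value are tunable. This is not the authors' implementation: the abstract does not give the GA-GM formulation, so the parameter names `alpha` (background weight) and `delta` (offset added to the initial value), the MAPE fitness, and the helper names are all assumptions, and a coarse grid search stands in for the genetic algorithm. The series is assumed to be strictly positive.

```python
import math

def gm11_fit(x0, alpha=0.5, delta=0.0):
    """Fit GM(1,1) to a positive series x0; return (a, b, c).

    alpha weights the background value z[k] = alpha*x1[k] + (1-alpha)*x1[k-1]
    (0.5 is the classic choice); c = x0[0] + delta is the initial value.
    Both knobs are illustrative assumptions, not the paper's exact scheme.
    """
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 points")
    x1, total = [], 0.0
    for v in x0:                      # accumulated generating sequence
        total += v
        x1.append(total)
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, n)]
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    # Least squares for x0[k] = -a*z[k] + b
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a, b = -slope, (sy - slope * sz) / m
    if abs(a) < 1e-12:
        raise ValueError("development coefficient is ~0 (near-constant series)")
    return a, b, x0[0] + delta

def gm11_predict(params, k):
    """Restored (non-accumulated) model value at index k >= 0."""
    a, b, c = params
    def x1_hat(i):
        return (c - b / a) * math.exp(-a * i) + b / a
    return x1_hat(0) if k == 0 else x1_hat(k) - x1_hat(k - 1)

def mape(x0, params):
    """Mean absolute percentage error of the fit over the training series."""
    return sum(abs(gm11_predict(params, k) - v) / abs(v)
               for k, v in enumerate(x0)) / len(x0)

def search_params(x0):
    """Grid search over (alpha, delta); a simple stand-in for the GA."""
    best, best_err = None, float("inf")
    for i in range(21):                       # alpha = 0.00, 0.05, ..., 1.00
        for j in range(-5, 6):                # delta = -5% .. +5% of x0[0]
            try:
                params = gm11_fit(x0, i / 20, j * 0.01 * x0[0])
            except (ValueError, ZeroDivisionError):
                continue
            err = mape(x0, params)
            if err < best_err:
                best, best_err = params, err
    return best

# Hypothetical readings; suppose the value at index 6 was flagged as abnormal.
history = [2.10, 2.31, 2.52, 2.80, 3.05, 3.38]
params = search_params(history)
repaired = gm11_predict(params, len(history))  # replacement for index 6
```

Because the grid includes the classic setting (`alpha = 0.5`, `delta = 0`), the selected parameters never fit the training series worse than plain GM(1,1) under this error measure. A genetic algorithm would explore the same two-parameter space with a population-based search rather than a fixed grid.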

     
