Hybridization of Air Quality Forecasting Models Using Machine Learning and Clustering: An Original Approach to Detect Pollutant Peaks

Creative Commons License

Tamas W., Notton G., Paoli C., Nivet M., Voyant C.

AEROSOL AND AIR QUALITY RESEARCH, vol.16, no.2, pp.405-416, 2016 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 16 Issue: 2
  • Publication Date: 2016
  • DOI Number: 10.4209/aaqr.2015.03.0193
  • Page Numbers: pp.405-416


This paper presents an original approach combining Artificial Neural Networks (ANNs) and clustering in order to detect pollutant peaks. We developed air quality forecasting models using machine learning methods applied to hourly concentrations of ozone (O3), nitrogen dioxide (NO2) and particulate matter (PM10), forecast 24 hours ahead. A MultiLayer Perceptron (MLP) was used alone, then hybridized successively with hierarchical clustering and with a combination of a self-organizing map and k-means clustering. The clustering methods were used to subdivide the dataset, and an MLP was then trained on each subset. Two urban sites on the island of Corsica, in the western Mediterranean Sea, were investigated. The models showed good overall precision (Index of Agreement reaching 0.87 for O3, 0.80 for NO2 and 0.74 for PM10). Because it is particularly important that forecasting models used on an operational basis correctly predict pollution peaks, a sensitivity analysis was performed using Receiver Operating Characteristic (ROC) curves. This made it possible to evaluate the behaviour and robustness of the models in high-concentration situations. The results show that, for PM10 and O3, hybrid models combining clustering and MLP outperform the classical MLP most of the time for high-concentration prediction. An operational tool has been built with the models presented in this paper and is used for air quality forecasting in Corsica.
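
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the commonly used Willmott definition of the Index of Agreement, and it assumes that ROC points are obtained by sweeping an alarm threshold on the forecast against a fixed peak threshold on the observations. The function names, thresholds and sample values below are hypothetical and are not taken from the paper.

```python
import numpy as np


def index_of_agreement(obs, pred):
    """Index of Agreement, assuming Willmott's definition:
    d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    obs_mean = obs.mean()
    num = np.sum((pred - obs) ** 2)
    den = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    # Undefined when observations and forecasts are all equal to the mean.
    return float(1.0 - num / den) if den > 0 else float("nan")


def roc_points(obs, pred, peak_threshold, alarm_thresholds):
    """Return (false alarm rate, hit rate) pairs, one per alarm threshold.

    An observed peak is obs >= peak_threshold (hypothetical choice);
    an alarm is raised when pred >= the alarm threshold being swept.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    is_peak = obs >= peak_threshold
    n_peak = max(int(is_peak.sum()), 1)         # avoid division by zero
    n_non_peak = max(int((~is_peak).sum()), 1)  # avoid division by zero
    points = []
    for t in alarm_thresholds:
        alarm = pred >= t
        hit_rate = float(np.sum(alarm & is_peak)) / n_peak
        false_alarm_rate = float(np.sum(alarm & ~is_peak)) / n_non_peak
        points.append((false_alarm_rate, hit_rate))
    return points


# Made-up hourly concentrations, for illustration only.
observed = [10.0, 20.0, 80.0, 90.0]
forecast = [15.0, 60.0, 70.0, 95.0]
ioa = index_of_agreement(observed, forecast)
curve = roc_points(observed, forecast, peak_threshold=75.0,
                   alarm_thresholds=[50.0, 75.0])
```

Lowering the alarm threshold raises the hit rate on observed peaks at the cost of more false alarms, which is the trade-off a ROC curve makes visible when comparing a single MLP with the hybrid clustering and MLP models.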