A Virtual Sensing Concept for Nitrogen and Phosphorus Monitoring Using Machine Learning Techniques

Thulane Paepae, Pitshou N. Bokoro, Kyandoghere Kyamakya

Research output: Contribution to journal, Article, peer-review

6 Citations (Scopus)


Harmful cyanobacterial blooms (HCBs) are problematic for drinking water treatment, and some cyanobacterial strains produce toxins that significantly affect human health. To better control eutrophication and HCBs, catchment managers need to continuously track nitrogen (N) and phosphorus (P) in water bodies. However, high-frequency monitoring of these water quality indicators is not economical. In such cases, machine learning techniques may serve as viable alternatives, since they can learn directly from the available surrogate data. In the present work, virtual sensors based on random forest, extremely randomized trees (ET), extreme gradient boosting, k-nearest neighbors, light gradient boosting machine, and bagging regressors were used to predict N and P in two catchments with contrasting land uses. The effects of data scaling and missing value imputation were also assessed, and Shapley additive explanations were used to rank feature importance. A specification book, sensitivity analysis, and best practices for developing virtual sensors are discussed. Results show that ET, the MinMax scaler, and a multivariate imputer were the best predictive model, scaler, and imputer, respectively. The highest predictive performance, reported in terms of R², was 97% in the rural catchment and 82% in the urban catchment.
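The virtual sensing idea summarized above (scale the surrogate measurements, fit a regressor, score it with R²) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' pipeline: the surrogate names (`conductivity`, `turbidity`), the synthetic data, the train/test split, and all function names are assumptions made for illustration. k-nearest neighbors is used because it is one of the model families listed in the abstract and is short to write by hand; the best-performing ET model and the missing value imputation step are not reproduced here.

```python
import math

def min_max_fit(rows):
    """Learn per-column (min, max) bounds from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def min_max_apply(rows, bounds):
    """Scale each column with the training bounds; constant columns map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def make_synthetic(n):
    """Hypothetical surrogate readings and a made-up nutrient target."""
    X, y = [], []
    for i in range(n):
        conductivity = 200.0 + 3.0 * i             # assumed surrogate, arbitrary units
        turbidity = 5.0 + 4.0 * math.sin(i / 5.0)  # assumed surrogate, arbitrary units
        X.append([conductivity, turbidity])
        y.append(0.01 * conductivity + 0.2 * turbidity)  # invented relationship
    return X, y

X, y = make_synthetic(80)
# Even-indexed samples train the sensor, odd-indexed samples test it.
train_X, train_y = X[0::2], y[0::2]
test_X, test_y = X[1::2], y[1::2]

bounds = min_max_fit(train_X)                # fit the scaler on training data only
train_scaled = min_max_apply(train_X, bounds)
test_scaled = min_max_apply(test_X, bounds)

predictions = [knn_predict(train_scaled, train_y, q, k=3) for q in test_scaled]
r2 = r_squared(test_y, predictions)
print(f"R^2 on held-out synthetic data: {r2:.3f}")
```

Fitting the scaler on the training rows only, then applying the same bounds to the test rows, keeps held-out data from leaking into preprocessing. The score on this synthetic data says nothing about the catchments studied in the article.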

Original language: English
Article number: 7338
Issue number: 19
Publication status: Published - Oct 2022


Keywords

  • accuracy benchmark
  • baseline model
  • data scaling
  • machine learning
  • missing values handling
  • soft-sensor
  • specification book
  • surrogate parameters
  • water quality monitoring

ASJC Scopus subject areas

  • Analytical Chemistry
  • Information Systems
  • Biochemistry
  • Atomic and Molecular Physics, and Optics
  • Instrumentation
  • Electrical and Electronic Engineering

