Comparisons in drinking water systems using K-means and A-Priori to find pathogenic bacteria genera

Tevin Moodley, Dustin van der Haar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)


As water resources have become limited, there has been an increase in cases of illness related to waterborne pathogens. With this in mind, alternative water sources such as groundwater, as well as common sources such as surface water, need to be studied to ensure that the water provided to consumers is safe to consume. This research paper compares bacterial genera in ground and surface source waters for drinking water systems, based on 16S rRNA profiling, using machine learning methods such as K-means and Apriori. The 16S rRNA gene can be used to identify and differentiate between bacterial genera. It is important not only to identify the specific bacterial genera found in water sources but also to examine their relative abundance, in order to determine whether groundwater is a more viable drinking water source than surface water. Based on recent incidents of waterborne illness reported across South Africa, five key bacterial indicators of water quality and safety, found in both groundwater and surface water, can be identified. Data captured from the collected samples is used to determine the abundance of each bacterium in each water sample more efficiently and effectively. The five indicators outlined for this project are E. coli (Escherichia), Legionella, Haemophilus, Bdellovibrio and Streptococcus. The dataset used contained bacteria from both ground and surface waters; using dimensionality reduction techniques, many parameters can be reduced for more efficient processing. The algorithms used include K-means, to cluster the data for interpretation; the Apriori algorithm, to find frequent itemsets from which association rules can be derived, allowing patterns to be identified; and a support vector machine (SVM), to predict the error of new data arriving in a stream.
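The two algorithms named in the title can be sketched in plain Python to show how relative-abundance profiles feed into them. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abundance values are made up, K-means is a basic Lloyd's implementation with deterministic farthest-first seeding (the paper does not say how centroids were initialized), and a genus counts as "present" in a sample, and therefore as an item in an Apriori transaction, when its relative abundance reaches a hypothetical threshold of 1%. The names `PRESENCE_THRESHOLD`, `kmeans`, `apriori` and `association_rules` are illustrative only; PCA and the SVM step are left out.

```python
import math
from itertools import combinations

# Hypothetical relative abundances (fraction of 16S reads) of the five indicator
# genera per sample; illustrative values only, not data from the study.
GENERA = ("Escherichia", "Legionella", "Haemophilus", "Bdellovibrio", "Streptococcus")
SAMPLES = [
    (0.040, 0.012, 0.000, 0.020, 0.005),
    (0.050, 0.015, 0.001, 0.025, 0.004),
    (0.045, 0.012, 0.000, 0.030, 0.006),
    (0.005, 0.001, 0.002, 0.003, 0.020),
    (0.004, 0.000, 0.001, 0.002, 0.025),
    (0.006, 0.002, 0.003, 0.004, 0.022),
]
PRESENCE_THRESHOLD = 0.01  # assumed cut-off for treating a genus as "present"


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, size = {}, 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: drop candidates with any infrequent (size - 1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


transactions = [
    {g for g, ra in zip(GENERA, sample) if ra >= PRESENCE_THRESHOLD} for sample in SAMPLES
]
labels, centroids = kmeans(SAMPLES, k=2)
frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.9)
print(labels)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={confidence:.2f}")
```

On this toy data the first three samples fall into one cluster and the last three into the other, and Apriori finds that Escherichia, Legionella and Bdellovibrio always co-occur. In practice a library implementation (for example scikit-learn's `KMeans`) would normally replace the hand-written loop.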
Using the results produced by the algorithms, it was found that the mean relative abundance of pathogenic organisms in groundwater was higher than in surface water. The results indicate that automated, scalable assessment of water viability is feasible using the proposed methods, which makes this an attractive avenue of research as the Internet of Things (IoT) develops in this domain.
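The headline comparison reduces to converting read counts into relative abundances (reads assigned to a genus divided by all reads in that sample) and averaging per source type. The sketch below assumes hypothetical 16S read counts and sums the five indicator genera within each sample before averaging; the counts, the "Other" bucket and the helper names are illustrative and do not come from the paper, so the groundwater-versus-surface ordering here is built into the toy data rather than evidence for the reported finding.

```python
from statistics import mean

INDICATORS = {"Escherichia", "Legionella", "Haemophilus", "Bdellovibrio", "Streptococcus"}

# Hypothetical 16S read counts per genus; "Other" stands in for all remaining genera.
SAMPLES = [
    ("groundwater", {"Escherichia": 120, "Legionella": 40, "Bdellovibrio": 60, "Other": 780}),
    ("groundwater", {"Escherichia": 90, "Streptococcus": 30, "Other": 880}),
    ("surface", {"Escherichia": 20, "Legionella": 10, "Other": 970}),
    ("surface", {"Streptococcus": 25, "Haemophilus": 5, "Other": 1970}),
]


def relative_abundance(counts):
    """Fraction of a sample's total reads assigned to each genus."""
    total = sum(counts.values())
    return {genus: n / total for genus, n in counts.items()}


def indicator_abundance(counts):
    """Combined relative abundance of the indicator genera in one sample."""
    ra = relative_abundance(counts)
    return sum(value for genus, value in ra.items() if genus in INDICATORS)


def mean_by_source(samples):
    """Mean indicator relative abundance for each source type."""
    by_source = {}
    for source, counts in samples:
        by_source.setdefault(source, []).append(indicator_abundance(counts))
    return {source: mean(values) for source, values in by_source.items()}


means = mean_by_source(SAMPLES)
print(means)
```

Normalizing per sample before averaging matters because sequencing depth differs between samples (the second surface sample has twice as many reads), so raw counts would not be comparable.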

Original language: English
Title of host publication: Information Science and Applications 2018 - ICISA 2018
Editors: Kuinam J. Kim, Nakhoon Baek
Publisher: Springer Verlag
Number of pages: 9
ISBN (Print): 9789811310553
Publication status: Published - 2019
Event: International Conference on Information Science and Applications, ICISA 2018 - Kowloon, Hong Kong
Duration: 25 Jun 2018 to 27 Jun 2018

Publication series

Name: Lecture Notes in Electrical Engineering
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119


Conference: International Conference on Information Science and Applications, ICISA 2018
Country/Territory: Hong Kong


Keywords

  • A priori
  • Bacteria
  • Hadoop
  • K-means
  • PCA
  • SVM
  • Water assessment

ASJC Scopus subject areas

  • Industrial and Manufacturing Engineering


