TY - GEN
T1 - Comparisons in drinking water systems using K-means and A-Priori to find pathogenic bacteria genera
AU - Moodley, Tevin
AU - van der Haar, Dustin
N1 - Publisher Copyright:
© 2019, Springer Nature Singapore Pte Ltd.
PY - 2019
Y1 - 2019
AB - As water resources have become limited, cases of illness related to waterborne pathogens have increased. With this in mind, alternative water sources such as groundwater, and common water sources such as surface water, need to be studied to ensure that the water provided to consumers is safe to drink. This research paper compares bacterial genera in groundwater and surface source waters for drinking water systems, based on 16S rRNA profiling, using machine learning methods such as K-means and Apriori. 16S rRNA profiling can be used to identify and differentiate between bacterial genera. Identifying the specific bacterial genera found in water sources is not enough: their relative abundance must also be examined to determine whether groundwater is a more viable drinking water source than surface water. Based on recent incidences of waterborne illness reported across South Africa, five key bacterial indicators of water quality and safety, all of which can occur in both groundwater and surface water, can be identified: E. coli (Escherichia), Legionella, Haemophilus, Bdellovibrio and Streptococcus. Data captured from the collected samples is used to determine the abundance of each bacterium in each water sample more efficiently and effectively. The dataset used contained bacteria from both groundwater and surface water, and dimensionality reduction techniques allow many parameters to be removed for more efficient processing. The algorithms used include K-means, to cluster the data for interpretation; the Apriori algorithm, to find frequent itemsets from which association rules, and hence patterns, can be derived; and an SVM (support vector machine), to predict the error of new data arriving in a stream.
Using the results produced by these algorithms, it was found that the mean relative abundance of pathogenic organisms in groundwater was higher than in surface water. The results indicate that automated, scalable water viability assessment is feasible using the proposed methods, which makes it an attractive avenue of research as the Internet of Things (IoT) develops in this domain.
KW - Apriori
KW - Bacteria
KW - Hadoop
KW - K-means
KW - PCA
KW - SVM
KW - Water assessment
UR - http://www.scopus.com/inward/record.url?scp=85051082351&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-1056-0_36
DO - 10.1007/978-981-13-1056-0_36
M3 - Conference contribution
AN - SCOPUS:85051082351
SN - 9789811310553
T3 - Lecture Notes in Electrical Engineering
SP - 351
EP - 359
BT - Information Science and Applications 2018 - ICISA 2018
A2 - Kim, Kuinam J.
A2 - Baek, Nakhoon
PB - Springer Verlag
T2 - International Conference on Information Science and Applications, ICISA 2018
Y2 - 25 June 2018 through 27 June 2018
ER -