K-means-Lite++: The combined advantage of sampling and seeding

Peter O. Olukanmi, Fulufhelo Nelwamondo, Tshilidzi Marwala

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Citations (Scopus)

Abstract

The k-means-lite algorithm was recently introduced to address the poor scalability of the popular k-means algorithm. This paper modifies k-means-lite by applying D2 sampling, the state-of-the-art seeding technique employed by k-means++, to each sample in the former. K-means-lite estimates its final centroids by applying standard k-means to the combination of all centroids obtained from initial application of k-means to a few (say, five) samples. Although k-means-lite achieves drastic efficiency gain, the five-sample implementation trades off some of the standard algorithm's accuracy. One way to improve its accuracy is to increase the number of samples. But it may be difficult to determine a priori what performance level is attainable, or the number of samples required to attain such performance. These would vary per problem. In this paper, we show experimentally that without increasing the sample size or number of samples in k-means-lite, our modified algorithm, named k-means-lite++, addresses this problem and is even more accurate than the standard k-means. For the cases tested, k-means-lite++ matches the accuracy of k-means++, which is widely considered the state-of-the-art k-means algorithm. The extra cost of sample seeding is compensated for by faster convergence. So, although drastically more accurate, k-means-lite++ has efficiency that is comparable to k-means-lite's. Thus, the proposed algorithm may also be viewed as a solution to scalability issues associated with k-means++'s sequential sampling.

Original languageEnglish
Title of host publication2019 6th International Conference on Soft Computing and Machine Intelligence, ISCMI 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages223-227
Number of pages5
ISBN (Electronic)9781728145778
DOIs
Publication statusPublished - Nov 2019
Event6th International Conference on Soft Computing and Machine Intelligence, ISCMI 2019 - Johannesburg, South Africa
Duration: 19 Nov 201920 Nov 2019

Publication series

Name2019 6th International Conference on Soft Computing and Machine Intelligence, ISCMI 2019

Conference

Conference6th International Conference on Soft Computing and Machine Intelligence, ISCMI 2019
Country/TerritorySouth Africa
CityJohannesburg
Period19/11/1920/11/19

Keywords

  • Accurate
  • Clustering
  • Efficient
  • Initialization
  • K-means
  • K-means++
  • K-means-lite
  • Seeding

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Computational Mathematics
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'K-means-Lite++: The combined advantage of sampling and seeding'. Together they form a unique fingerprint.

Cite this