k-Means-MIND: comparing seeds without repeated k-means runs

Peter Olukanmi, Fulufhelo Nelwamondo, Tshilidzi Marwala

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)

Abstract

A key drawback of the popular k-means clustering algorithm is its susceptibility to local minima. This problem is often addressed by performing repeated runs of the algorithm, and choosing the best run afterward. The approach is effective but computationally expensive: it multiplies the running time proportional to the number of repeats. We observe, in this paper, that repeated k-means runs is equivalent to comparing different candidate initializations using the k-means objective function obtained after running the algorithm fully, as the fitness function. This observation implies that if it were possible to compare the initializations ab initio without depending on the full algorithm for judgment, then there will be no need for repeats. Unfortunately, this phenomenon has not been studied in the literature, to our knowledge. We pursue the development of such a technique for comparing initializations directly. We choose as the "best", the initialization that possesses the largest minimum inter-center distance (MIND). This proposed technique also serves as a general technique for optimizing k-means seeding algorithms. We demonstrate the concept with MIND-optimized versions of two popular algorithms: k-means and k-means++. Experiments on real-world and benchmark synthetic datasets as well as mathematical analysis establish drastic efficiency gains when compared to repeated k-means. Furthermore, our approach significantly improves the accuracy of the standard versions of the algorithms, and it is easy to implement.

Original languageEnglish
JournalNeural Computing and Applications
DOIs
Publication statusAccepted/In press - 2022

Keywords

  • Efficient
  • Initialization
  • Repeat
  • k-Means clustering

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'k-Means-MIND: comparing seeds without repeated k-means runs'. Together they form a unique fingerprint.

Cite this