Avoiding Local Minima in Overlap-Based Clustering by Random Swap

Main Article Content

Fei Wang
Le Li
Pasi Fränti

Abstract

Context: Cluster overlap has been recently introduced as an alternative objective function to the traditional sum-of-squared error (SSE) function used in k-means. It estimates the overlap by counting how many points are shared between the neighbor clusters. The overlap k-means was shown to provide more stable centroid locations than SSE, but at the cost of losing the dynamics of the k-means algorithm. Objective: In this paper, we address this drawback by adding a random swap step to the algorithm and introducing a new Overlap random swap. Method: It consists of three steps: (1) selective replacement of the centroids, (2) standard cluster assignment step, (3) weighted centroid calculation giving less weight to the overlapping points. Results: Experimental results confirm that ORS achieves competitive clustering accuracy, stable centroid locations, and improved boundary separation. The clustering accuracy (ACC) is about 95% and the centroid index (CI) is about 0.1. The algorithm converges within 3000 swap iterations. Conclusions: These results indicate that ORS helps avoid local minima and maintains the robustness of overlap-based clustering.

Article Details

Section

Research Articles

How to Cite

[1]
F. . Wang, L. . Li, and P. . Fränti, “Avoiding Local Minima in Overlap-Based Clustering by Random Swap”, Systems and Computing, vol. 2, no. 1, Jan. 2026, doi: 10.64409/sycom.v2.i1.22.

References

[1] P. Fränti, C. Cariou, Q. Zhao, “Cluster overlap as objective function”, CMC-Computers, Materials & Continua, 66534, 1-28, 2025. https://doi.org/10.32604/cmc.2025.066534

[2] P. Fränti , J. Kivijärvi, “Randomized local search algorithm for the clustering problem”, Pattern Analysis and Applications, 3 (4), 358-369, 2000. https://doi.org/10.1007/s100440070007

[3] P. Fränti, “Efficiency of random swap clustering”, Journal of Big Data, 5:13, 1-29, 2018. https://doi.org/10.1186/s40537-018-0122-y

[4] P. Fränti and S. Sieranoja, “K-means properties on six clustering benchmark datasets”, Applied Intelligence, 48 (12), 4743-4759, 2018. https://doi.org/10.1007/s10489-018-1238-7

[5] E. Forgy, “Cluster analysis of multivariate data: efficiency vs. interpretability of classification”, Biometrics, 21(3), 768–780, 1965.

[6] J. MacQueen, “Some methods for classification and analysis of multivariate observations”, Berkeley symposium on Mathematical Statistics and Probability. 281-297, 1967.

[7] S.P. Lloyd, “Least squares quantization in PCM”, IEEE Trans. Information Theory, 28 (2), 129–137, 1982. https://doi.org/10.1109/TIT.1982.1056489

[8] P. Fränti, J. Kivijärvi, T. Kaukoranta , O. Nevalainen, “Genetic algorithms for large scale clustering problems”, The Computer Journal, 40 (9), 547-554, 1997. https://doi.org/10.1093/comjnl/40.9.547

[9] M.I. Malinen, R. Mariescu-Istodor, P. Fränti, “K-means*: clustering by gradual data transformation”, Pattern Recognition, 47 (10), 3376-3386, 2014. http://dx.doi.org/10.1016/j.patcog.2014.03.034

[10] B. Fritzke, “Breathing k-means”, arXiv preprint arXiv:2006.15666, 2020.

[11] J. Lee, D. Perkins, “A simulated annealing algorithm with a dual perturbation method for clustering”, Pattern Recognition, 112, 107713, 2021. https://doi.org/10.1016/j.patcog.2020.107713

[12] C. Baldassi, “Recombinator K-means: an evolutionary algorithm that exploits k-means++ for recombination”. IEEE Trans. on Evolutionary Computation, 26 (5), 991-1003, 2022. https://doi.org/10.1109/TEVC.2022.3144134

[13] P. Fränti, J. Kivijärvi, O. Nevalainen: “Tabu search algorithm for codebook generation in vector quantization”, Pattern Recognition, 31 (8), 1139-1148, 1998. https://doi.org/10.1016/S0031-3203(97)00127-1

[14] S. Sieranoja, P. Fränti, “Fast and general density peaks clustering”, Pattern Recognition Letters, 128, 551-558, 2019. https://doi.org/10.1016/j.patrec.2019.10.019

[15] L. Fu, E. Medico, “FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data”, BMC bioinformatics, 8(1), 3, 2007. https://doi.org/10.1186/1471-2105-8-3

[16] A. Gionis, H. Mannila, P. Tsaparas, “Clustering aggregation”, ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 4, 2007. https://doi.org/10.1145/1217299.1217303

[17] I. Kärkkäinen, P. Fränti, “Dynamic local search algorithm for the clustering problem”, Research Reports, A-2002-6, University of Joensuu, 2002.

[18] M. Rezaei, P. Fränti, “Set-matching measures for external cluster validity”, IEEE Trans. on Knowledge and Data Engineering, 28 (8), 2173-2186, 2016. https://doi.org/10.1109/TKDE.2016.2551240

[19] T. Zhang, R. Ramakrishnan, M. Livny, “BIRCH: A new data clustering algorithm and its applications", Data Mining and Knowledge Discovery, 1 (2), 141-182, 1997. https://doi.org/10.1023/A:1009783824328

[20] P. Fränti, “Mopsi location-based service”, Applied Computing and Intelligence, 4 (2), 209-233, 2024. https://doi.org/10.3934/aci.2024013

[21] D. Arthur, S. Vassilvitskii, “K-means++: the advantages of careful seeding”, ACM-SIAM Symp. on Discrete Algorithms (SODA’07), January 2007. https://10.5555/1283383.1283494

[22] A. Rodriguez, A. Laio, “Clustering by fast search and find of density peaks”, Science, 344(6191), 1492-1496, 2014. https://doi.org/10.1126/science.124207

[23] E. Martin, H-P. Kriegel, J. Sander, X. Xu. “A density-based algorithm for discovering clusters in large spatial databases with noise.” kdd. 96 (34), 1996. https://doi.org/10.5555/3001460.3001507

[24] P. Fränti, S. Sieranoja, “Clustering accuracy”, Applied Computing and Intelligence, 4 (1), 24-44, 2024. https://doi.org/10.3934/aci.2024003

[25] P. Fränti, M. Rezaei, Q. Zhao, “Centroid index: cluster level similarity measure”, Pattern Recognition, 47 (9), 3034-3045, 2014. http://dx.doi.org/10.1016/j.patcog.2014.03.017