Avoiding Local Minima in Overlap-Based Clustering by Random Swap
Main Article Content
Abstract
Context: Cluster overlap has been recently introduced as an alternative objective function to the traditional sum-of-squared error (SSE) function used in k-means. It estimates the overlap by counting how many points are shared between the neighbor clusters. The overlap k-means was shown to provide more stable centroid locations than SSE, but at the cost of losing the dynamics of the k-means algorithm. Objective: In this paper, we address this drawback by adding a random swap step to the algorithm and introducing a new Overlap random swap. Method: It consists of three steps: (1) selective replacement of the centroids, (2) standard cluster assignment step, (3) weighted centroid calculation giving less weight to the overlapping points. Results: Experimental results confirm that ORS achieves competitive clustering accuracy, stable centroid locations, and improved boundary separation. The clustering accuracy (ACC) is about 95% and the centroid index (CI) is about 0.1. The algorithm converges within 3000 swap iterations. Conclusions: These results indicate that ORS helps avoid local minima and maintains the robustness of overlap-based clustering.
Article Details
Issue
Section

This work is licensed under a Creative Commons Attribution 4.0 International License.
Articles published in SyCom are open access and distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
How to Cite
References
[1] P. Fränti, C. Cariou, Q. Zhao, “Cluster overlap as objective function”, CMC-Computers, Materials & Continua, 66534, 1-28, 2025. https://doi.org/10.32604/cmc.2025.066534
[2] P. Fränti , J. Kivijärvi, “Randomized local search algorithm for the clustering problem”, Pattern Analysis and Applications, 3 (4), 358-369, 2000. https://doi.org/10.1007/s100440070007
[3] P. Fränti, “Efficiency of random swap clustering”, Journal of Big Data, 5:13, 1-29, 2018. https://doi.org/10.1186/s40537-018-0122-y
[4] P. Fränti and S. Sieranoja, “K-means properties on six clustering benchmark datasets”, Applied Intelligence, 48 (12), 4743-4759, 2018. https://doi.org/10.1007/s10489-018-1238-7
[5] E. Forgy, “Cluster analysis of multivariate data: efficiency vs. interpretability of classification”, Biometrics, 21(3), 768–780, 1965.
[6] J. MacQueen, “Some methods for classification and analysis of multivariate observations”, Berkeley symposium on Mathematical Statistics and Probability. 281-297, 1967.
[7] S.P. Lloyd, “Least squares quantization in PCM”, IEEE Trans. Information Theory, 28 (2), 129–137, 1982. https://doi.org/10.1109/TIT.1982.1056489
[8] P. Fränti, J. Kivijärvi, T. Kaukoranta , O. Nevalainen, “Genetic algorithms for large scale clustering problems”, The Computer Journal, 40 (9), 547-554, 1997. https://doi.org/10.1093/comjnl/40.9.547
[9] M.I. Malinen, R. Mariescu-Istodor, P. Fränti, “K-means*: clustering by gradual data transformation”, Pattern Recognition, 47 (10), 3376-3386, 2014. http://dx.doi.org/10.1016/j.patcog.2014.03.034
[10] B. Fritzke, “Breathing k-means”, arXiv preprint arXiv:2006.15666, 2020.
[11] J. Lee, D. Perkins, “A simulated annealing algorithm with a dual perturbation method for clustering”, Pattern Recognition, 112, 107713, 2021. https://doi.org/10.1016/j.patcog.2020.107713
[12] C. Baldassi, “Recombinator K-means: an evolutionary algorithm that exploits k-means++ for recombination”. IEEE Trans. on Evolutionary Computation, 26 (5), 991-1003, 2022. https://doi.org/10.1109/TEVC.2022.3144134
[13] P. Fränti, J. Kivijärvi, O. Nevalainen: “Tabu search algorithm for codebook generation in vector quantization”, Pattern Recognition, 31 (8), 1139-1148, 1998. https://doi.org/10.1016/S0031-3203(97)00127-1
[14] S. Sieranoja, P. Fränti, “Fast and general density peaks clustering”, Pattern Recognition Letters, 128, 551-558, 2019. https://doi.org/10.1016/j.patrec.2019.10.019
[15] L. Fu, E. Medico, “FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data”, BMC bioinformatics, 8(1), 3, 2007. https://doi.org/10.1186/1471-2105-8-3
[16] A. Gionis, H. Mannila, P. Tsaparas, “Clustering aggregation”, ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1), 4, 2007. https://doi.org/10.1145/1217299.1217303
[17] I. Kärkkäinen, P. Fränti, “Dynamic local search algorithm for the clustering problem”, Research Reports, A-2002-6, University of Joensuu, 2002.
[18] M. Rezaei, P. Fränti, “Set-matching measures for external cluster validity”, IEEE Trans. on Knowledge and Data Engineering, 28 (8), 2173-2186, 2016. https://doi.org/10.1109/TKDE.2016.2551240
[19] T. Zhang, R. Ramakrishnan, M. Livny, “BIRCH: A new data clustering algorithm and its applications", Data Mining and Knowledge Discovery, 1 (2), 141-182, 1997. https://doi.org/10.1023/A:1009783824328
[20] P. Fränti, “Mopsi location-based service”, Applied Computing and Intelligence, 4 (2), 209-233, 2024. https://doi.org/10.3934/aci.2024013
[21] D. Arthur, S. Vassilvitskii, “K-means++: the advantages of careful seeding”, ACM-SIAM Symp. on Discrete Algorithms (SODA’07), January 2007. https://10.5555/1283383.1283494
[22] A. Rodriguez, A. Laio, “Clustering by fast search and find of density peaks”, Science, 344(6191), 1492-1496, 2014. https://doi.org/10.1126/science.124207
[23] E. Martin, H-P. Kriegel, J. Sander, X. Xu. “A density-based algorithm for discovering clusters in large spatial databases with noise.” kdd. 96 (34), 1996. https://doi.org/10.5555/3001460.3001507
[24] P. Fränti, S. Sieranoja, “Clustering accuracy”, Applied Computing and Intelligence, 4 (1), 24-44, 2024. https://doi.org/10.3934/aci.2024003
[25] P. Fränti, M. Rezaei, Q. Zhao, “Centroid index: cluster level similarity measure”, Pattern Recognition, 47 (9), 3034-3045, 2014. http://dx.doi.org/10.1016/j.patcog.2014.03.017