Ir al contenido

Documat


A bisection method for solving distance-based clustering problems globally

  • Peter Kirst [2] ; Tomáš Bajbar [1] ; Mario Merkel [1]
    1. [1] Karlsruhe Institute of Technology

      Karlsruhe Institute of Technology

      Stadtkreis Karlsruhe, Alemania

    2. [2] Wageningen University and Research, Wageningen, The Netherlands
  • Localización: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 33, Nº. 3, 2025, págs. 437-469
  • Idioma: inglés
  • DOI: 10.1007/s11750-024-00684-w
  • Enlaces
  • Resumen
    • In this article, we consider distance-based clustering problems. In contrast to many approaches, we use the maximum norm instead of the more commonly used Euclidean norm to measure distances. This problem is nonsmooth and non-convex and, thus, difficult to solve to global optimality using standard approaches, which is common in cluster analysis. Therefore, we reformulate this continuous problem in light of graph-theoretical instances which enables us to construct a bisection algorithm converging to the globally minimal value of the original clustering problem by establishing valid upper and lower bounding procedures. Our numerical results indicate that our method performs well on data sets exhibiting clear cluster-pattern structure even for bigger data instances while still guaranteeing the global optimality of the computed solution. We compare our approach with the classical k-means algorithm and also discuss the limits and challenges of the proposed procedure.

  • Referencias bibliográficas
    • Akkoyunlu EA (1973) The enumeration of maximal cliques of large graphs. SIAM J Comput 2(1):1–6
    • Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-ofsquares clustering. Math Programm 131(1):131–220
    • Babaki B, Guns T, Nijssen S (2014) Constrained clustering using column generation. In: International Conference on Integration of Constraint...
    • Bagirov AM (2008) Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recogn 41(10):3192–3199
    • Bagirov AM, Rubinov AM (2024) Modified versions of the cutting angle method. In: Hadjisavvas and Pardalos PM (eds) Advances in convex analysis...
    • Bagirov AM, Rubinov AM, Soukhoroukova NV, Yearwood J (2003) Unsupervised and supervised data classification via nonsmooth and global optimization....
    • Bagirov AM, Ugon J, Webb D (2011) Fast modified global k-means algorithm for incremental cluster construction. Pattern Recogn 44(4):866–876
    • Balas E, Padberg MW (1972) On the set-covering problem. Oper Res 20(6):1152–1161
    • Ben-Dor A, Shamir R, Yakhini Z (1999) Clustering gene expression patterns. J Comput Biol 6(3–4):281–297
    • Bondy JA, Murty U, Silva R (1976) Graph theory with applications. Macmillan, London
    • Brigham RC, Dutton RD (1983) On clique covers and independence numbers of graphs. Disc Math 44(2):139–144
    • Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577
    • Brusco MJ (2003) An enhanced branch-and-bound algorithm for a partitioning problem. Br J Math Stat Psychol 56(1):83–92
    • Brusco MJ (2006) A repetitive branch-and-bound procedure for minimum within sums of squares partitioning. Psychometrika 71(2):347–363
    • Brusco MJ, Cradit D (2004) Graph coloring, minimum-diameter partitioning, and the analysis of confusion matrices. J Math Psychol 48(5):301–309
    • Diestel R (2000) Graph theory. Springer, New York
    • Dorndorf U, Pesch E (1994) Fast clustering algorithms. ORSA J Comput 6(2):141–153
    • Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In:...
    • Fränti P, Virmajoki O, Kaukoranta T (2002) Branch-and-bound technique for solving optimal clustering. Int Conf Pattern Recogn 2:232–235
    • Gary Augustson J, Minker J (1970) An analysis of some graph theoretical cluster techniques. J ACM 17:571–588
    • Gramm J, Guo J, Hüffner F, Niedermeier R (2005) Graph-modeled data clustering: fixed-parameter algorithms for clique generation. Theory Comput...
    • Gramm J, Guo J, Hüffner F, Niedermeier R (2009) Data reduction and exact algorithms for clique cover. J Exp Algorithmics 13(2)
    • Gurobi Optimization (2024) LLC. In: Gurobi optimizer reference manual. https://www.gurobi.com
    • Hand DJ (1981) Branch-and-bound in statistical data analysis. The Statistician 30(1):1–13
    • Hanif D (1999) Sherali and Warren P. Springer, Adams. A reformulation-linearization technique for solving discrete and continuous nonconvex...
    • Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73(362):397–403
    • Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Programm 79(1):191– 215
    • Hartigan JA (1975) Clustering algorithms. Wiley, New York
    • Hartigan JA, Wong MA (1979) Algorithm AS 136: a K-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
    • Hartuv E, Shamir R (2000) A clustering algorithm based on graph connectivity. Inform Process Lett 76(4– 6):175–181
    • Huang GB, Ramesh M, Berg T, Learned-Miller E (2000) Labeled Faces in the wild: a database for studying face recognition in unconstrained environments....
    • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323
    • Jensen RE (1969) A dynamic programming algorithm for cluster analysis. Oper Res 17(6):1034–1057
    • Johnston HC (1976) Cliques of a graph: variations on the Bron–Kerbosch algorithm. Int J Comput Inform Sci 5:209–238
    • Johnson EL, Mehrotra A, Nemhauser GL (1993) Min-cut clustering. Math Programm 62(1):133–151
    • Jun W, Chu-Min Li L, Jiang JZ, Yin M (2020) Local search for diversified Top-k clique search problem. Comput Oper Res 116:104867
    • Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW, Bohlinger JD (eds) Complexity of computer computations....
    • Klein G, Aronson JE (1991) Optimal clustering: a model and method. Naval Res Log 38(3):447–461
    • Kogan J, Nicholas C, Teboulle M (eds) (2006) Grouping multidimensional data: recent advances in clustering. Springer, Berlin
    • Koontz WLG, Narendra PM, Fukunaga K (1975) A branch and bound clustering algorithm. IEEE Trans Comput 100(9):908–915
    • Larrañaga P, Lozano JA, Peña JM (1999) An empirical comparison of four initialization methods for the k-means algorithm. Pattern Recogn Lett...
    • Liberti L, Manca B (2022) Side-constrained minimum sum-of-squares clustering: mathematical programming and random projections. J Global Optim...
    • Likas A, Vlassis N, Verbeek JJ (2003) The global k-means clustering algorithm. Pattern Recogn 36(2):451– 461
    • Loyd S (1982) Least squares quantization in PCM. IEEE Trans Inform Theory 28(2):129–137
    • MacQueen JB (1976) Some methods for classification and analysis of multivariate observations. Proc Fifth Berkel Symp Math Stat Probab 5(1):281–297
    • Merle OD, Hansen P, Jaumard B, Mladenovi´c N (1999) An interior point algorithm for minimum sum-ofsquares clustering. SIAM J Sci Comput 21(4):1485–1505
    • Michael J (2005) Brusco and stephanie stahl. In: Branch-and-Bound Applications in Combinatorial Data Analysis. Springer, New York
    • Pardalos PM, Rodgers GP (1992) A branch and bound algorithm for the maximum clique problem. Comput Oper Res 19(5):363–375
    • Peng J, Xia Y (2005) A cutting plane algorithm for the minimum sum-of-squared error clustering. In: Proceedings of the 2005 SIAM International...
    • Piccialli V, Sudoso AM (2023) Global optimization for cardinality-constrained minimum sum-of-squares clustering via semidefinite programming....
    • Piccialli V, Russo AR, Sudoso AM (2022a) An exact algorithm for semi-supervised minimum sum-ofsquares clustering. Comput Oper Res 147(C)
    • Piccialli V, Sudoso AM, Wiegele A (2022) SOS-SDP: an exact solver for minimum sum-of-squares clustering. INFORMS J Comput 34(4):2144–2162 Rao...
    • Sherali HD, Desai J (2005) A global optimization RLT-based approach for solving the hard clustering problem. J Global Optim 32(2):281–306
    • Steinley D (2003) Local optima in K-means clustering: what you don’t know may hurt you. Psychol Methods 8(3):294–304
    • van Os BJ, Meulman JJ (2004) Improving dynamic programming strategies for partitioning. J Classif 21(2):207–230
    • Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
    • West Douglas B (2001) Introduction to graph theory. Prentice Hall, Upper Saddle River
    • Zhong C, Miao D, Wang R (2010) A graph-theoretical clustering method based on two rounds of minimum spanning trees. Pattern Recogn 43(3):752–766

Fundación Dialnet

Mi Documat

Opciones de artículo

Opciones de compartir

Opciones de entorno