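Analysis of unsupervised quality measures for clusters obtained by K-means and Particle Swarm Optimization [original title: Análisis de medidas no-supervisadas de calidad en clusters obtenidos por K-means y Particle Swarm Optimization]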

Andrea Villagra; Ana Guzman; Daniel Raul Pandolfi; Guillermo N. Leguizamón

Published in: Ciencia y tecnología, ISSN 1850-0870, ISSN-e 2344-9217, No. 9, 2009
Language: Spanish
DOI: 10.18682/cyt.v1i1.782
Abstract
Data clustering helps to discern structure and reduce the complexity of massive quantities of data. It is a common technique used in many fields, including machine learning, data mining, pattern recognition, image analysis, and bioinformatics, in which the distribution of the data can be of any size and shape. Efficient clustering algorithms are essential when working with very large databases and high-dimensional data. This paper evaluates, from several perspectives, a set of relevant unsupervised quality measures, such as quantization error and intra- and inter-cluster distances, for the clusters obtained by the well-known K-means algorithm, by a population-based metaheuristic called Particle Swarm Optimization (PSO), and by a hybrid algorithm, PSO+K-means, that combines characteristics of both. The results show that, in general, PSO+K-means scores better on each measure, producing clusters that are more compact and better separated than those obtained by either PSO or K-means alone.
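The abstract names its quality measures without defining them. As a rough illustration only, the sketch below computes quantization error, intra-cluster distance, and inter-cluster distance under definitions commonly used in the PSO clustering literature; these definitions, the function names, and the toy data are assumptions for illustration and are not taken from the paper.

```python
import math

def assign(points, centroids):
    """Group points by their nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters

def quantization_error(clusters, centroids):
    """Average, over non-empty clusters, of the mean point-to-centroid distance."""
    per_cluster = [
        sum(math.dist(p, c) for p in members) / len(members)
        for members, c in zip(clusters, centroids)
        if members
    ]
    return sum(per_cluster) / len(per_cluster)

def intra_cluster_distance(clusters, centroids):
    """Mean distance from every point to its own centroid (lower is more compact)."""
    total = sum(math.dist(p, c)
                for members, c in zip(clusters, centroids)
                for p in members)
    count = sum(len(members) for members in clusters)
    return total / count

def inter_cluster_distance(centroids):
    """Smallest distance between any two centroids (higher is better separated)."""
    return min(math.dist(a, b)
               for i, a in enumerate(centroids)
               for b in centroids[i + 1:])

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 2), (10, 0), (10, 4), (10, 2)]
centroids = [(0, 1), (10, 2)]
clusters = assign(points, centroids)

print(quantization_error(clusters, centroids))
print(intra_cluster_distance(clusters, centroids))
print(inter_cluster_distance(centroids))
```

Under these assumed definitions, a clustering is better when the first two values are smaller and the third is larger, which matches the abstract's description of clusters that are compact and well separated. The centroids could come from any of the three algorithms; the measures only need the final centroids and the point assignments.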