Abstract of Research on incremental clustering algorithm for big data

Xiaojing Yang

As data sets grow ever larger, clustering, a key step in data mining, has important practical significance. Current clustering algorithms are time-consuming and produce high clustering errors when they deal with massive, dynamic big data. To address these problems, an incremental clustering algorithm is proposed with big data as the research object. By exploring the attribute characteristics of big data, four characteristics are summarised: scale, diversity, high speed and value. For large-scale data streams that have multiple attributes and arrive one point at a time, the method for setting the category centre points of the K-means clustering algorithm is optimised, the K-means clustering algorithm is combined with the Kalman filter algorithm, and the distance between pairs of data points is measured in place of the Mahalanobis distance. In this way, an incremental clustering algorithm suitable for big data is constructed. Five data sets are selected for example analysis, and the results verify the algorithm: the proposed algorithm has clear advantages in the incremental clustering of big data. It also delivers efficient and stable computing performance, meeting the expected design requirements and goals.
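The abstract does not give the update equations, so the following is only a minimal Python sketch of how K-means-style centre assignment might be combined with a Kalman-filter centre update for points that arrive one at a time. Everything in it is an assumption for illustration, not the author's method: the hypothetical class names `KalmanCentroid` and `IncrementalKMeans`, the distance `threshold` used to open a new cluster (a stand-in for the paper's optimised centre-setting method), the random-walk state model with process noise `q` and measurement noise `r`, and the use of plain Euclidean distance.

```python
import math
from dataclasses import dataclass


@dataclass
class KalmanCentroid:
    """Hypothetical cluster centre tracked by one 1-D Kalman filter per dimension.

    Assumption: a random-walk state model (the centre may drift slowly),
    since the abstract does not specify the model used in the paper.
    """
    mean: list[float]   # current centre estimate
    var: list[float]    # estimate variance P for each dimension
    count: int = 1      # number of points absorbed so far

    def update(self, x: list[float], q: float, r: float) -> None:
        for i, xi in enumerate(x):
            p = self.var[i] + q                      # predict: add process noise q
            k = p / (p + r)                          # Kalman gain for measurement noise r
            self.mean[i] += k * (xi - self.mean[i])  # correct the centre toward x
            self.var[i] = (1.0 - k) * p              # reduce the estimate's uncertainty
        self.count += 1


def euclidean(a: list[float], b: list[float]) -> float:
    # Assumption: Euclidean distance; the paper's exact distance measure is not reproduced.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class IncrementalKMeans:
    """Hypothetical incremental clusterer: one point at a time, no stored history."""

    def __init__(self, threshold: float, q: float = 1e-3, r: float = 1.0,
                 init_var: float = 1.0) -> None:
        self.threshold = threshold  # hypothetical: max distance before a new cluster opens
        self.q = q
        self.r = r
        self.init_var = init_var
        self.centroids: list[KalmanCentroid] = []

    def partial_fit(self, x: list[float]) -> int:
        """Assign one arriving point, update its centre, and return the cluster index."""
        if self.centroids:
            dists = [euclidean(x, c.mean) for c in self.centroids]
            best = min(range(len(dists)), key=lambda i: dists[i])
            if dists[best] <= self.threshold:
                self.centroids[best].update(x, self.q, self.r)
                return best
        # No centre is close enough (or none exists yet): the point seeds a new cluster.
        self.centroids.append(
            KalmanCentroid(mean=list(x), var=[self.init_var] * len(x)))
        return len(self.centroids) - 1


stream = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.1, -0.1]]
model = IncrementalKMeans(threshold=2.0)
labels = [model.partial_fit(p) for p in stream]
print(labels)  # [0, 0, 1, 1, 0]
```

With a small process noise `q`, each centre behaves much like a running mean that can still drift toward recent points, which is one plausible reason to pair a Kalman filter with K-means on a data stream.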
