A framework for dissimilarity-based partitioning clustering of categorical time series

Authors: Manuel García-Magariños, José Vilar
Source: Data Mining and Knowledge Discovery, ISSN 1384-5810, Vol. 29, No. 2, 2015, pp. 466-502
Language: English
Abstract
- A new framework for clustering categorical time series is proposed. In our approach, a dissimilarity-based partitioning method is considered. We suggest measuring the dissimilarity between two categorical time series by assessing both the closeness of their raw categorical values and the proximity between their dynamic behaviours. For the latter, a particular index that computes the temporal correlation of categorical-valued sequences is introduced. The dissimilarity measure is then used to perform clustering with a modified version of the $k$-modes algorithm, specifically designed to provide a better characterization of the clusters. Furthermore, the problem of determining the number of clusters in this framework is analyzed by comparing a range of procedures, including a prediction-based resampling method suitably adjusted to work with our dissimilarity. Several graphical devices to interpret and visualize the temporal pattern of each cluster are also provided. The performance of this clustering methodology is studied in different simulated scenarios, and its effectiveness is demonstrated by comparison with alternative approaches. Its use on real data is illustrated by analyzing the navigation patterns of users visiting a specific news web site.
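The abstract does not give the formulas for the dissimilarity or for the modified $k$-modes algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. The series are assumed to be equal-length sequences of category labels. The "raw values" term is taken to be the proportion of mismatched time points. The "dynamic behaviour" term is a hypothetical stand-in (half the L1 distance between lag-1 transition frequency tables), not the temporal correlation index introduced in the paper. The two terms are combined with a hypothetical weight `weight`, and clustering uses a plain $k$-modes-style loop with per-time-point modes as prototypes and a deterministic farthest-first start, not the authors' modified algorithm. All function names are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence

Series = Sequence[Hashable]


def mismatch_rate(x: Series, y: Series) -> float:
    """Closeness of raw values: share of time points where x and y differ."""
    assert len(x) == len(y), "sketch assumes equal-length series"
    return sum(a != b for a, b in zip(x, y)) / len(x)


def transition_freqs(x: Series, categories: Sequence[Hashable]) -> dict:
    """Relative frequency of each lag-1 transition (a -> b) in x."""
    counts = Counter(zip(x[:-1], x[1:]))
    n = len(x) - 1
    return {(a, b): counts[(a, b)] / n for a in categories for b in categories}


def dynamic_distance(x: Series, y: Series, categories) -> float:
    """Hypothetical stand-in for the dynamic term: half the L1 distance
    between the lag-1 transition tables, which lies in [0, 1]."""
    px, py = transition_freqs(x, categories), transition_freqs(y, categories)
    return 0.5 * sum(abs(px[key] - py[key]) for key in px)


def dissimilarity(x: Series, y: Series, categories, weight: float = 0.5) -> float:
    """Convex combination of raw-value and dynamic terms (weight is hypothetical)."""
    return weight * mismatch_rate(x, y) + (1 - weight) * dynamic_distance(x, y, categories)


def positional_mode(members):
    """Prototype: most frequent category at each time point (first seen wins ties)."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*members)]


def k_modes(series, k, categories, weight=0.5, max_iter=20):
    """Plain k-modes-style iteration with a deterministic farthest-first start."""

    def d(s, p):
        return dissimilarity(s, p, categories, weight)

    prototypes = [list(series[0])]
    while len(prototypes) < k:
        # Next prototype: the series farthest from its closest current prototype.
        far = max(series, key=lambda s: min(d(s, p) for p in prototypes))
        prototypes.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each series goes to its nearest prototype.
        new_labels = [min(range(k), key=lambda j: d(s, prototypes[j])) for s in series]
        if new_labels == labels:
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Update step: recompute each prototype as the positional mode of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:  # keep the old prototype if a cluster becomes empty
                prototypes[j] = positional_mode(members)
    return labels, prototypes
```

For instance, `k_modes(["aaaaaaaaab", "aaaaaaaaaa", "bbbbbbbbba", "bbbbbbbbbb"], 2, ["a", "b"])` places the first two series in one cluster and the last two in the other. The positional mode minimizes only the mismatch term, so with a non-zero dynamic weight the prototype update in this sketch is a heuristic rather than an exact minimizer of the combined dissimilarity.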