Comparison of Clustering Algorithms for Knowledge Discovery in Social Media Publications: A Case Study of Mental Health Analysis

Manuel Couto; Javier Parapar; David Enrique Losada Carril

Authors: Manuel Couto, Javier Parapar, David Enrique Losada Carril
Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 73, 2024, pp. 69-81
Language: English
Parallel titles:
- Comparación de algoritmos de agrupamiento para el descubrimiento de conocimiento en publicaciones de redes sociales: un caso de estudio en salud mental
Abstract
- Spanish
  En la era de las redes sociales, el contenido generado por los usuarios es fundamental para detectar los primeros signos de trastornos mentales. En este estudio utilizamos el agrupamiento de publicaciones por tópicos para analizar el contenido de la plataforma Reddit. Nuestro objetivo primordial es utilizar técnicas de agrupamiento para descubrir temas centrales, con un enfoque en la identificación de temas comunes entre los grupos de usuarios que sufren enfermedades mentales como la depresión, la anorexia, la adicción a los juegos de azar y las autolesiones. Nuestros hallazgos muestran que ciertos clusters son más cohesivos, por ejemplo mostrando una mayor proporción de textos de personas con depresión. Además, hemos descubierto subreddits que están fuertemente vinculados a textos escritos por usuarios deprimidos. Estos hallazgos arrojan luz sobre cómo las interacciones en línea y los temas que se tratan en los subreddits reflejan aspectos de salud mental, abriendo el camino para futuras investigaciones e intervenciones dirigidas a la prevención de trastornos.
- English
  In the age of social media, user-generated content is critical for detecting early signs of mental disorders. In this study, we use thematic clustering to analyze the content of the social media platform Reddit. Our primary goal is to use clustering techniques for comprehensive topic discovery, with a focus on identifying common themes among user groups suffering from mental illnesses such as depression, anorexia, gambling addiction, and self-harm. Our findings show that certain clusters are more cohesive, e.g., with a higher proportion of texts indicating depression. Furthermore, we discovered subreddits that are strongly linked to texts from the depressed user group. These findings shed light on how online interactions and subreddit themes may impact users’ mental health, paving the way for future research and more targeted interventions in the field of online mental health.