Topic modeling to identify patterns in Covid-19 scientific research

  • Carolina Luque [1] ; Juan Rubriche [1] ; Jhon Galvis [1] ; Juan Sosa [1]
    1. [1] Universidad Santo Tomás, Colombia
  • Published in: Comunicaciones en Estadística, ISSN 2027-3355, e-ISSN 2339-3076, Vol. 14, No. 2, 2021, pp. 48-66
  • Language: Spanish
  • DOI: 10.15332/23393076.7705
  • Original title:
    • Modelamiento de tópicos para identificar patrones en la investigación científica del Covid-19
  • Abstract

      We present a topic model based on latent Dirichlet allocation (LDA) to examine patterns in Covid-19 scientific research, using publications indexed in the specialized database PubMed. The corpus consists of 4928 scientific abstracts published during the first half of 2020. An LDA model is fitted with two topics. The first topic corresponds to risk factors, severity, and mortality due to viral infection, and the second to the impact of respiratory infections on public health. This classification gives a global overview of the two research trends present at the time of the analysis. Additionally, the results suggest that systematic application of the proposed methodology provides a way to direct bibliographic review in academic contexts and make it more efficient.
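
As a rough illustration of what fitting a two-topic LDA model involves, the sketch below implements a small collapsed Gibbs sampler in plain Python. It is not the authors' code: the abstract does not state which estimation procedure or software was used, and the function name `fit_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus `toy_docs` are all assumptions made for illustration. The toy documents stand in for preprocessed PubMed abstracts.

```python
import random


def fit_lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA on tokenized documents with a collapsed Gibbs sampler.

    Returns (theta, top_words): per-document topic proportions and the
    five highest-count words of each topic. Illustrative sketch only.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables updated throughout sampling.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token
                # (up to a constant), given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed estimates of document-topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    top_words = [
        [vocab[i] for i in sorted(range(n_words), key=lambda i: -topic_word[k][i])[:5]]
        for k in range(n_topics)
    ]
    return theta, top_words


# Hypothetical, already-tokenized stand-ins for PubMed abstracts.
toy_docs = [
    ["risk", "mortality", "severity", "patients", "infection"],
    ["mortality", "risk", "age", "patients", "severity"],
    ["public", "health", "respiratory", "impact", "policy"],
    ["health", "public", "respiratory", "infection", "impact"],
]

theta, top_words = fit_lda_gibbs(toy_docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {', '.join(words)}")
```

With a real corpus, each document would first be tokenized, stripped of stop words, and possibly stemmed; the number of topics (two in the study) is an input to the model rather than something the sampler discovers.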

