Algoritmos de clasificación de documentos científicos: pasado y presente

Jesús María Álvarez Llorente; Vicente Pablo Guerrero Bote; Félix de Moya-Anegón

Álvarez-Llorente, Jesús M. [1]; Guerrero-Bote, Vicente P. [1]; De-Moya-Anegón, Félix [2]
[1] Universidad de Extremadura, Badajoz, Spain
[2] SCImago Research Group
Published in: Infonomy, e-ISSN 2990-2290, Vol. 3, No. 4, 2025 (Issue devoted to: Algorithms)
Language: Spanish
DOI: 10.3145/infonomy.25.026
Parallel titles:
- Algorithms for Scientific Documents: Past and Present
Abstract
  This study offers a comprehensive overview of document-level classification algorithms in scientific research, proposed as an alternative to the journal-based categorizations employed by major bibliographic databases such as Web of Science and Scopus. These journal-driven schemes often introduce significant inaccuracies in both information retrieval and research evaluation, as they fail to categorize articles in accordance with their actual content.
  
  First, we provide a historical review of the main approaches developed since the emergence of scientific databases, highlighting their contributions as well as their limitations. Automatic clustering techniques and community detection algorithms have represented important advances in the organization of scientific knowledge, yet they cannot serve as a practical substitute for journal-based classifications. Other approaches, such as those relying on neural networks or text mining, face scalability issues that prevent their application at the global level of science.
  
  The most recent and promising strategies are built upon simple algorithms that, starting from existing journal categorizations, reclassify articles into the same thematic hierarchies used by bibliographic databases, relying primarily on the analysis of straightforward citation and reference patterns.
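
  As a rough illustration of the idea in the last paragraph, the Python sketch below reclassifies each paper into the categories of the journals it cites, falling back to the paper's own journal categories when it has no usable references. It is not the authors' algorithm: the function name `reclassify`, the dictionary-based data layout, and the equal-split fractional voting rule are all assumptions made for this example.

```python
from collections import Counter


def reclassify(paper_refs, paper_journal, journal_cats):
    """Hypothetical reference-based reclassification (illustrative only).

    paper_refs:    paper id -> list of cited paper ids
    paper_journal: paper id -> journal id
    journal_cats:  journal id -> list of categories in the database scheme
    Returns paper id -> {category: fractional weight summing to 1}.
    """
    result = {}
    for paper, refs in paper_refs.items():
        votes = Counter()
        # Each reference casts one vote, split evenly among the
        # categories of the journal that published it (assumed rule).
        for ref in refs:
            cats = journal_cats.get(paper_journal.get(ref), [])
            for cat in cats:
                votes[cat] += 1 / len(cats)
        # No usable references: keep the journal-level categorization.
        if not votes:
            cats = journal_cats.get(paper_journal.get(paper), [])
            for cat in cats:
                votes[cat] += 1 / len(cats)
        total = sum(votes.values())
        result[paper] = {c: w / total for c, w in votes.items()} if total else {}
    return result


# Toy data (invented for the example).
journal_cats = {
    "J1": ["Physics"],
    "J2": ["Chemistry", "Physics"],
    "J3": ["Multidisciplinary"],
}
paper_journal = {"p1": "J3", "p2": "J3", "r1": "J1", "r2": "J2", "r3": "J1"}
paper_refs = {"p1": ["r1", "r2", "r3"], "p2": []}

weights = reclassify(paper_refs, paper_journal, journal_cats)
print(weights)
```

  In this toy case, paper `p1` appears in a multidisciplinary journal but cites mostly physics journals, so most of its weight moves to Physics, while `p2`, which has no references, keeps its journal category. The output stays within the database's existing category scheme, which is the property the abstract highlights.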