Creation and use of an ontology related to genes, syndromes, symptoms and diseases for the classification of biomedical texts

María de la Concepción Pérez de Celis; Gerardo Sierra; Fátima Ronquillo; Emilio Salceda

Concepción Pérez de Celis [1]; Gerardo Sierra [2]; Fátima Ronquillo [1]; Emilio Salceda [1]
[1] Benemérita Universidad Autónoma de Puebla, Mexico
[2] Universidad Nacional Autónoma de México, Mexico
Source: Revista signos: estudios de lingüística, ISSN-e 0718-0934, ISSN 0035-0451, No. 84, 2014, pp. 91-112
Language: Spanish
DOI: 10.4067/s0718-09342014000100005
Parallel title:
- Integrating ontologies and supervised methods in the multi-classification of biomedical documents
Abstract
- Spanish (translated)
  This research aims to analyze and classify biomedical articles in the field of neuroscience; in particular, it considers scientific articles related to hearing loss. The text categorization process generally consists of two stages: the first delimits the classes into which the topic of interest is divided, and the second categorizes the texts of interest. In most applications, categorization is solved with a model built on widely dispersed classes, which lets existing categorization algorithms perform well because there is a wide margin of separation between the classes. The problem arises when the classes being evaluated are separated by a narrow margin. This work presents an approach that departs from the traditional one by integrating two categorization algorithms: character n-grams are used to categorize partially distant classes, and the categorization of documents is then refined using the terms of a domain ontology. The results obtained with this method have been promising.
- English
  This study aims to analyze and categorize biomedical articles from the field of neuroscience; specifically, it considers scientific articles related to hearing loss. The text categorization process usually consists of two stages: the first defines the classes into which the subject of study is divided, and the second categorizes the texts that make up the corpus. In most applications, categorization is solved by basing the models on widely dispersed classes; this allows existing categorization algorithms to obtain good results because the classes are separated by wide margins. Problems arise, however, when these margins are narrow. This paper presents a different approach that integrates two categorization algorithms: character n-grams are used to categorize partially distant classes, and the categorization of documents is then refined using the terms of a domain ontology related to genes, diseases and syndromes. Promising results were obtained with this method.
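The two-stage scheme described in the abstract (character n-grams first, ontology terms to refine decisions between close classes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a simple nearest-profile classifier (cosine similarity between character trigram counts) for the first stage and, for the second, a plain count of ontology-term matches that is consulted only when several classes score within a hypothetical `margin` of the best one. The functions `char_ngrams`, `cosine`, `train_profiles` and `classify`, the class names and the `ONTOLOGY` terms are all illustrative assumptions.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams in the lowercased text."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_profiles(docs_by_class, n=3):
    """Stage 1 model (assumed): one summed n-gram profile per class."""
    profiles = {}
    for label, docs in docs_by_class.items():
        profile = Counter()
        for doc in docs:
            profile.update(char_ngrams(doc, n))
        profiles[label] = profile
    return profiles


def classify(text, profiles, ontology, margin=0.05, n=3):
    """Stage 1 ranks classes by n-gram similarity; stage 2 uses ontology
    terms only to decide among classes within `margin` of the best score."""
    grams = char_ngrams(text, n)
    # Stage 1: rank classes by cosine similarity to their n-gram profile.
    ranked = sorted(((cosine(grams, p), label) for label, p in profiles.items()),
                    reverse=True)
    best_score = ranked[0][0]
    close = [label for score, label in ranked if best_score - score <= margin]
    if len(close) == 1:
        return close[0]
    # Stage 2: count ontology terms (plain substring match) for each close class.
    lowered = text.lower()
    hits = {label: sum(term.lower() in lowered for term in ontology.get(label, ()))
            for label in close}
    # max() returns the first maximal item, so ties keep the stage-1 ranking.
    return max(close, key=lambda label: hits[label])


# Hypothetical class names and ontology terms, for illustration only.
ONTOLOGY = {
    "syndromic": {"usher syndrome", "waardenburg syndrome"},
    "nonsyndromic": {"gjb2", "connexin 26"},
}
```

With `margin=1.0` every class counts as close (cosine scores here lie between 0 and 1), so a call such as `classify("hearing loss linked to gjb2", profiles, ONTOLOGY, margin=1.0)` is decided entirely by the ontology stage and returns `"nonsyndromic"`. Substring matching is a deliberate simplification; a fuller system would likely match normalized ontology labels and their synonyms.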