Documat


Feature Selection for Microarray Gene Expression Data Using Simulated Annealing Guided by the Multivariate Joint Entropy

  • Félix Fernando González-Navarro [1]; Lluís A. Belanche-Muñoz [2]
    1. [1] Universidad Autónoma de Baja California, Mexico
    2. [2] Universitat Politècnica de Catalunya, Barcelona, Spain

  • Published in: Computación y Sistemas (CyS), ISSN 1405-5546, e-ISSN 2007-9737, Vol. 18, No. 2, 2014, pp. 275-293
  • Language: English
  • DOI: 10.13053/CyS-18-2-2014-032
  • Parallel titles:
    • Selección de características para datos de expresión de los genes mediante microarreglos usando recocido simulado guiado por la entropía conjunta multivariada
  • Abstract
    • Spanish (translated)

      Microarray classification poses many challenges for data analysis, since a gene expression data set may contain dozens of observations with thousands or even tens of thousands of genes. In this context, feature subset selection techniques can be very useful for reducing the representation space to one that classification techniques can handle. This work uses the discretized multivariate joint entropy as the basis for the fast evaluation of gene relevance in the context of microarray gene expression. The proposed algorithm develops a simulated annealing technique designed specifically for feature subset selection, driven by the joint entropy. The entropy is computed incrementally, reusing earlier values to compute the relevance of feature subsets. This combination proves to be a powerful tool when applied to maximizing the relevance of a gene subset. Our method offers solutions that are highly interpretable and more accurate than those proposed by competing methods. The proposed algorithm is fast, effective, and has no critical parameters. Experimental results on several public-domain microarray data sets reveal high classification performance and small subsets, formed mostly by biologically meaningful genes. The technique is general and could be used in other similar scenarios.

    • English

      Microarray classification poses many challenges for data analysis, because a gene expression data set may consist of only dozens of observations described by thousands or even tens of thousands of genes. In this context, feature subset selection techniques can be very useful for reducing the representation space to one that classification techniques can manage. In this work we use the discretized multivariate joint entropy as the basis for a fast evaluation of gene relevance in a microarray gene expression context. The proposed algorithm combines a simulated annealing schedule specially designed for feature subset selection with an incrementally computed joint entropy, which reuses previous values to compute the relevance of the current feature subset. This combination turns out to be a powerful tool when applied to the maximization of gene subset relevance. Our method delivers highly interpretable solutions that are more accurate than those of competing methods. The algorithm is fast, effective, and has no critical parameters. Experimental results on several public-domain microarray data sets show notably high classification performance with small subsets formed mostly by biologically meaningful genes. The technique is general and could be used in other similar scenarios.
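
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the relevance score (mutual information between the selected genes and the class, written in terms of joint entropies), and the plain geometric cooling schedule are all assumptions made for illustration. The abstract only states that relevance is based on the discretized multivariate joint entropy, that this entropy is computed incrementally, and that the annealing schedule is specially designed. Here the incremental step is `extend_keys`, which refines the joint pattern already built for a subset with one additional feature instead of rebuilding it.

```python
import math
import random
from collections import Counter


def entropy(keys):
    """Shannon entropy (in bits) of the empirical distribution of hashable keys."""
    n = len(keys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(keys).values())


def extend_keys(keys, column):
    """Incremental step: refine each sample's joint pattern with one more variable."""
    return [k + (v,) for k, v in zip(keys, column)]


def joint_keys(X, subset):
    """Joint discrete pattern of every sample over the features in `subset`."""
    keys = [()] * len(X)
    for j in sorted(subset):
        keys = extend_keys(keys, [row[j] for row in X])
    return keys


def relevance(X, y, subset):
    """Assumed score: I(X_S; Y) = H(X_S) + H(Y) - H(X_S, Y)."""
    keys = joint_keys(X, subset)
    return entropy(keys) + entropy(y) - entropy(extend_keys(keys, y))


def anneal(X, y, t0=1.0, cooling=0.9, steps_per_temp=20, t_min=1e-3, seed=0):
    """Generic simulated annealing over feature subsets (hypothetical schedule)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current = {rng.randrange(n_features)}
    score = relevance(X, y, current)
    best, best_score = set(current), score
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            # Neighbor: add or remove one randomly chosen feature.
            candidate = current ^ {rng.randrange(n_features)}
            if not candidate:
                continue  # never evaluate the empty subset
            cand_score = relevance(X, y, candidate)
            delta = cand_score - score
            # Metropolis rule: always accept improvements, sometimes accept losses.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, score = candidate, cand_score
                if score > best_score:
                    best, best_score = set(current), score
        t *= cooling  # geometric cooling
    return best, best_score
```

Calling `anneal(X, y)` on a list of discretized expression rows `X` and class labels `y` returns the best subset found and its score. This sketch applies no penalty on subset size, so on very small samples the score can also be maximized by subsets that merely memorize the observations; a practical version would need some control for that.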

The article metadata were obtained from SciELO México.
