Abstract
Identification and recognition of functionally important DNA sequence fragments, such as regulatory sequences, are considered among the most important problems in bioinformatics. One type of such fragment is the promoter, a short regulatory DNA sequence located upstream of a gene. Detection of regulatory DNA sequences is important for successful gene prediction and gene expression studies. In this paper, a Support Vector Machine (SVM) is used to classify DNA sequences and recognize regulatory sequences. To achieve optimal classification, various SVM learning and kernel parameters (hyperparameters) and methods for their optimization are analyzed. In a case study, the SVM hyperparameters for linear, polynomial and power series kernels are optimized using a modification of the Nelder–Mead (downhill simplex) algorithm. The method improves the precision with which regulatory DNA sequences are identified. Results of promoter recognition on Drosophila sequence datasets are presented.
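To make the hyperparameter search concrete, the following is a minimal sketch of the textbook Nelder–Mead (downhill simplex) method in plain Python. It is not the modified algorithm used in the paper, whose details are not given in the abstract. The objective `toy_cv_error` is a hypothetical stand-in for a cross-validated SVM error measured over two log-scaled hyperparameters; in practice it would be replaced by a function that trains and validates an SVM for the given parameter values. All names and default coefficients here are assumptions made for illustration.

```python
def nelder_mead(f, x0, step=1.0, ftol=1e-8, xtol=1e-6, max_iter=500,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimize f from x0 with the standard Nelder-Mead simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]

        # Stop when the simplex is small in both value spread and size.
        spread = values[-1] - values[0]
        size = max(abs(p[j] - simplex[0][j])
                   for p in simplex[1:] for j in range(n))
        if spread < ftol and size < xtol:
            break

        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        # Reflect the worst vertex through the centroid.
        xr = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
        fr = f(xr)
        if values[0] <= fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        elif fr < values[0]:
            # Reflection gave a new best point: try expanding further.
            xe = [c + gamma * (r - c) for c, r in zip(centroid, xr)]
            fe = f(xe)
            if fe < fr:
                simplex[-1], values[-1] = xe, fe
            else:
                simplex[-1], values[-1] = xr, fr
        else:
            # Contract the worst vertex toward the centroid.
            xc = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:
                # Shrink every vertex toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + sigma * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


def toy_cv_error(theta):
    """Hypothetical smooth surrogate for cross-validated SVM error.

    theta[0] and theta[1] play the role of two log-scaled hyperparameters
    (for example, the penalty parameter and a kernel coefficient).
    """
    return 0.1 + (theta[0] - 1.0) ** 2 + 0.5 * (theta[1] + 2.0) ** 2


best_theta, best_error = nelder_mead(toy_cv_error, [0.0, 0.0])
print(best_theta, best_error)
```

Searching in log-scaled coordinates is a common design choice for SVM parameters because useful values can span several orders of magnitude. A real cross-validation objective is noisy and not smooth, so restarts from several initial simplices may be needed.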
Cite this article
Damaševičius, R. Optimization of SVM parameters for recognition of regulatory DNA sequences. TOP 18, 339–353 (2010). https://doi.org/10.1007/s11750-010-0152-x
DOI: https://doi.org/10.1007/s11750-010-0152-x