Verification of the four Spanish official languages on TV show recordings

A. Varona; Mikel Peñagaricano Badiola; Luis Javier Rodríguez Fuentes; M. Díez; Germán Bordel García

Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 45, 2010, pp. 95-104
Language: English
Abstract
  This paper presents language recognition results for the four official languages of Spain: Spanish, Catalan, Basque and Galician. Results were obtained in closed and open tests (the latter including segments in French, Portuguese, German or English) on a subset of 30-second segments. A detailed per-target-language study is also included. Experiments were carried out on the KALAKA database, recorded specifically for the Albayzin 2008 Language Recognition Evaluation. The main verification system results from the fusion of an acoustic subsystem and 6 phonotactic subsystems. To model each target language, the acoustic subsystem uses the spectral characteristics of the audio signal, whereas the phonotactic subsystems use phone sequences produced by several acoustic-phonetic decoders. The best fused system attained an EER of 3.58% and Cllr = 0.30 in closed tests, a 24.5% relative improvement over the best result obtained in the Albayzin 2008 LRE.
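The two figures of merit quoted above can be illustrated with a minimal sketch. It assumes each trial yields one score, that target trials (segment in the claimed language) and non-target trials are already separated, and, for Cllr, that scores are calibrated natural-log likelihood ratios. The function names are hypothetical, and the EER here comes from a simple threshold sweep rather than the interpolation used by any particular evaluation toolkit.

```python
import math
from typing import Sequence


def _log2_1p_exp(x: float) -> float:
    """Numerically stable log2(1 + e^x), avoiding overflow for large |x|."""
    return (max(x, 0.0) + math.log1p(math.exp(-abs(x)))) / math.log(2.0)


def equal_error_rate(target: Sequence[float], nontarget: Sequence[float]) -> float:
    """Approximate EER by sweeping a threshold over every observed score.

    A trial is accepted when its score is >= threshold. The result is the
    mean of the miss and false-alarm rates at the threshold where the two
    rates are closest to each other.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(target) | set(nontarget)):
        p_miss = sum(s < t for s in target) / len(target)      # rejected targets
        p_fa = sum(s >= t for s in nontarget) / len(nontarget)  # accepted non-targets
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, best_eer = gap, (p_miss + p_fa) / 2.0
    return best_eer


def cllr(target_llr: Sequence[float], nontarget_llr: Sequence[float]) -> float:
    """Log-likelihood-ratio cost in bits, for natural-log LLR scores.

    Target trials are penalised by log2(1 + e^(-s)) and non-target trials
    by log2(1 + e^(s)); the two averages are weighted equally.
    """
    c_tar = sum(_log2_1p_exp(-s) for s in target_llr) / len(target_llr)
    c_non = sum(_log2_1p_exp(s) for s in nontarget_llr) / len(nontarget_llr)
    return 0.5 * (c_tar + c_non)


if __name__ == "__main__":
    tar = [2.0, 1.5, 0.3, -0.2]
    non = [-1.0, -2.5, 0.1, -0.4]
    print(equal_error_rate(tar, non))  # 0.25
    print(cllr([0.0], [0.0]))          # uninformative scores: about 1 bit
```

Scores that carry no information (every LLR equal to 0) give Cllr = 1 bit, which is the reference point for reading a value such as 0.30: lower is better, and unlike EER, Cllr also penalises poorly calibrated scores.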