Adaptación de ASR al habla de personas con síndrome de Down

  • Authors: David Fernández García, Valentín Cardeñoso Payo, César González Ferreras, David Escudero Mancebo
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 73, 2024, pp. 209-220
  • Language: Spanish
  • Parallel titles:
    • ASR model adaptation to the speech of people with Down syndrome
  • Abstract

      The speech of people with intellectual disabilities (ID) poses major challenges for automatic speech recognition (ASR) systems, which makes it harder for a particularly vulnerable population to access information services. This work studies the difficulties ASR systems have in recognizing the speech of people with ID and shows how this limitation can be mitigated with model fine-tuning strategies. The performance of Whisper-based ASR (v2 and v3) is measured on a reference corpus of typical speech and ID speech, confirming that the differences are substantial and significant. With fine-tuning, performance for ID speakers improves by at least 30 percentage points. The results show that including the voices of people with ID in training corpora is essential to improving the effectiveness of ASR systems.

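
The abstract reports a gain of "at least 30 percentage points" without naming the metric. As a minimal sketch, assuming the metric is word error rate (WER), which is a common choice for ASR but is not confirmed by this record, the following Python shows how such an absolute difference could be computed. The function names and the toy sentences are hypothetical and are not taken from the paper or its corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def percentage_point_change(before: float, after: float) -> float:
    """Absolute drop between two rates, expressed in percentage points."""
    return (before - after) * 100.0


# Toy transcripts, invented for illustration only (assumption, not paper data).
reference = "quiero ir al parque hoy"
baseline_output = "quiero al parque"           # hypothetical output before fine-tuning
adapted_output = "quiero ir al parque hoy"     # hypothetical output after fine-tuning

wer_before = word_error_rate(reference, baseline_output)
wer_after = word_error_rate(reference, adapted_output)
gain = percentage_point_change(wer_before, wer_after)
```

A percentage-point change is an absolute difference between two rates, so it should not be read as a relative reduction: under this assumption, a WER that falls from 0.70 to 0.40 improves by 30 percentage points, which is a relative reduction of roughly 43%.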
