Semi-Supervised Learning in the Field of Conversational Agents and Motivational Interviewing

  • Authors: Gergana Rosenova, Marcos Fernández Pichel, Selina Meyer, David Enrique Losada Carril
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 73, 2024, pp. 55-67
  • Language: English
  • Parallel titles:
    • Aprendizaje Semisupervisado en el Ámbito de los Agentes Conversacionales y la Entrevista Motivacional
  • Abstract
    • Spanish

      La explotación de los conceptos de la Entrevista Motivacional para el análisis de texto contribuye a obtener valiosas lecciones sobre las actitudes y perspectivas de los individuos hacia el cambio de comportamiento. La escasez de datos de usuario etiquetados plantea un desafío continuo e impide avances técnicos en la investigación bajo escenarios de idiomas no ingleses. Para abordar las limitaciones del etiquetado manual de datos, proponemos un método de aprendizaje semisupervisado como medio para aumentar un corpus de entrenamiento existente. Nuestro enfoque aprovecha datos generados por usuarios, obtenidos de comunidades en redes sociales y traducidos automáticamente, y emplea técnicas de autoentrenamiento para la asignación de etiquetas. Con este fin, consideramos varias fuentes y llevamos a cabo una evaluación de múltiples clasificadores entrenados en varios conjuntos de datos aumentados. Los resultados indican que este enfoque de etiquetado débil no produce mejoras en las capacidades de clasificación generales de los modelos. Sin embargo, se observaron mejoras notables para las clases minoritarias. Concluimos que varios factores, incluida la calidad de la traducción automática, pueden potencialmente sesgar los modelos de pseudoetiquetado, y que la naturaleza desequilibrada de los datos y el impacto de un umbral de prefiltrado estricto deben tenerse en cuenta como factores inhibidores del rendimiento.

    • English

      The exploitation of Motivational Interviewing concepts for text analysis contributes to gaining valuable insights into individuals’ perspectives and attitudes towards behaviour change. The scarcity of labelled user data poses a persistent challenge and impedes technical advances in research under non-English language scenarios. To address the limitations of manual data labelling, we propose a semi-supervised learning method as a means to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. To that end, we consider several source contexts and evaluate multiple classifiers trained on different augmented datasets. The results indicate that this weak labelling approach does not yield improvements in the overall classification capabilities of the models. However, notable enhancements were observed for the minority classes. We conclude that several factors, including the quality of machine translation, can potentially bias the pseudo-labelling models, and that the imbalanced nature of the data and the impact of a strict pre-filtering threshold need to be taken into account as inhibiting factors.
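
      A minimal sketch of the self-training (pseudo-labelling) idea summarised above, assuming a generic scikit-learn classifier over TF-IDF features. The function name, threshold value and feature choice are illustrative assumptions, not the authors' implementation, which works with transformer-based models and machine-translated social media data.

      # Minimal self-training sketch: a model fit on the labelled corpus assigns
      # pseudo-labels to unlabelled (e.g. machine-translated) texts, keeps only
      # predictions above a confidence threshold, and is retrained on the
      # augmented set. Illustrative only; not the authors' code.
      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression

      def self_train(labelled_texts, labels, unlabelled_texts, threshold=0.9):
          labels = np.asarray(labels)

          # Fit an initial classifier on the manually labelled corpus.
          vec = TfidfVectorizer()
          clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(labelled_texts), labels)

          # Pseudo-label the unlabelled pool and apply a strict pre-filtering threshold.
          proba = clf.predict_proba(vec.transform(unlabelled_texts))
          keep = proba.max(axis=1) >= threshold
          pseudo = clf.classes_[proba.argmax(axis=1)]

          # Augment the training set with the confidently pseudo-labelled examples.
          aug_texts = list(labelled_texts) + [t for t, k in zip(unlabelled_texts, keep) if k]
          aug_labels = np.concatenate([labels, pseudo[keep]])

          # Retrain on the augmented corpus and return the final model.
          return LogisticRegression(max_iter=1000).fit(vec.transform(aug_texts), aug_labels)

      The threshold plays the role of the strict pre-filtering step discussed above: raising it yields cleaner pseudo-labels but discards most candidate examples, which the abstract identifies as one of the inhibiting factors alongside class imbalance and machine translation quality.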
