Methods Towards Improving Safeness in Responses of a Spanish Suicide Information Chatbot

Pablo Ascorbe Fernández; María Soledad Campos Burgui; César Domínguez; Jónathan Heras Vicente; Magdalena Pérez Trenado

Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 75, 2025 (issue: Procesamiento del Lenguaje Natural, Revista nº 75, September 2025), pp. 83-94
Language: English
Parallel titles:
- Métodos para Mejorar la Seguridad de las Respuestas en un Chatbot que Proporciona Información sobre Suicidio en Castellano
Abstract
Chatbots hold great potential for providing valuable information in sensitive fields such as mental health. However, ensuring the reliability and safety of these systems is essential and represents a crucial step before their deployment. In this paper, we report our work aimed at enhancing the safety of a Spanish suicide information chatbot based on Retrieval-Augmented Generation (RAG). Through a multi-stage validation process, we identified and classified unsafe answers of the chatbot by applying red-teaming classification models and manual validation by experts. This process allowed us to uncover several sources of unsafe responses and to implement targeted mitigation strategies. As a result, fewer than 1‰ of user-generated questions and fewer than 5‰ of red-teaming questions were classified by experts as unsafe. Our proposed actions focused on improving the chatbot's key components (the document database, prompt engineering, and the underlying large language model) and can be extrapolated to enhance the safety of similar RAG-based chatbots.
Warning: This paper contains content that may be upsetting.