Review of spoken dialogue systems

  • Ramón López-Cózar [1]; Zoraida Callejas [1]; David Griol [2]; José F. Quesada [3]
    1. [1] Universidad de Granada, Granada, Spain
    2. [2] Universidad Carlos III de Madrid, Madrid, Spain
    3. [3] Universidad de Sevilla, Sevilla, Spain

  • Published in: Loquens: revista española de ciencias del habla, e-ISSN 2386-2637, No. 1, 2, 2014
  • Language: English
  • DOI: 10.3989/loquens.2014.012
  • Parallel titles:
    • Sistemas de diálogo: una revisión
  • Abstract
    • Spanish (English translation)

      Dialogue systems are computer programs developed to interact with users through speech in order to provide them with automated services.

      The interaction takes place through the turns of a type of dialogue that, in many studies in the literature, researchers try to make as similar as possible to real dialogue between people in terms of naturalness, intelligence and affective content.

      In this article we describe the fundamentals of this technology, including the basic technologies used to implement this kind of system. We also present the evolution of the technology and discuss some current applications. In addition, we describe interaction paradigms, including scripting languages and the development of conversational interfaces for mobile applications.

      A key aspect of this technology is correct user modelling. For this reason, we discuss various affective, personality and contextual models. Finally, we discuss some current lines of research related to verbal communication, multimodal interaction and dialogue management.

    • English

      Spoken dialogue systems are computer programs developed to interact with users through speech in order to provide them with specific automated services. The interaction is carried out by means of dialogue turns, which researchers in many published studies aim to make as similar as possible to those between humans in terms of naturalness, intelligence and affective content.

      In this paper we describe the fundamentals of these systems, including the main technologies employed for their development. We also trace the evolution of this technology and discuss some current applications. Moreover, we discuss development paradigms, including scripting languages and the development of conversational interfaces for mobile apps.

      The correct modelling of the user is a key aspect of this technology. This is why we also describe affective, personality and contextual models. Finally, we address some current research trends in verbal communication, multimodal interaction and dialogue management.

