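Generative artificial intelligence-assisted assessment in Software Engineering assignments: A proof of concept (Evaluación asistida por inteligencia artificial generativa en prácticas de Ingeniería de Software: una prueba de concepto)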

Francisco José García-Peñalvo [1]; Marc Alier-Forment [2]; Andrea Vázquez-Ingelmo [1]; Alicia García-Holgado [1]; María José Casañ-Guerrero [2]; Juanan Pereira [3]
1. [1] Universidad de Salamanca, Salamanca, Spain
2. [2] Universitat Politècnica de Catalunya, Barcelona, Spain
3. [3] Universidad del País Vasco/Euskal Herriko Unibertsitatea, Leioa, Spain
Source: RIED: revista iberoamericana de educación a distancia, ISSN 1138-2783, Vol. 29, No. 2, 2026 (Issue dedicated to: La universidad ante la IA generativa: Veracidad, docencia y evaluación)
Language: Spanish
Parallel titles:
- Generative artificial intelligence-assisted assessment in Software Engineering assignments: A proof of concept
Abstract
  The emergence of generative artificial intelligence (GenAI) is transforming assessment in higher education and poses specific challenges in technical subjects with complex projects. This work presents a proof-of-concept GenAI-based assessment assistant applied to the project milestone of the Software Engineering I course (Bachelor’s Degree in Computer Engineering, University of Salamanca). The system is deployed on local infrastructure and combines a multimodal pipeline that processes PDF reports (including text and use case diagrams) with a flow of prompts aligned with the course rubric. From these documents, the assistant extracts objectives, requirements, and use cases; analyses their coherence (traceability and completeness); applies the rubric to assign a mark to each criterion; and synthesizes a final quantitative and qualitative report for each group. The study compares the marks proposed by the AI with those awarded by the teaching staff for 14 submissions from the 2023-2024 academic year. The results show a systematic tendency for the AI to mark approximately 1 point below the human average, with high convergence on some criteria (e.g., well-defined non-functional requirements) and divergence on others (e.g., objectives, use cases, traceability matrices), where the AI applies a logic closer to professional standards. Rather than indicating a failure, these differences complement the teaching staff’s judgment and allow for richer, more transparent, formative assessment. The ethical and pedagogical implications of the approach are discussed, and future work is proposed on refining the prompts and on prospectively evaluating the system with new cohorts.
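The comparison between AI-proposed and teacher-awarded marks described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `mean_signed_difference`, the 0-10 marking scale, and the sample marks are all assumptions introduced here for illustration, and the numbers are invented placeholders rather than the study's data.

```python
from statistics import mean


def mean_signed_difference(ai_marks, human_marks):
    """Return the average of (AI mark - human mark) over paired submissions.

    A negative result means the AI tends to mark below the human graders.
    """
    if len(ai_marks) != len(human_marks):
        raise ValueError("mark lists must be paired one-to-one")
    if not ai_marks:
        raise ValueError("at least one submission is required")
    return mean(a - h for a, h in zip(ai_marks, human_marks))


# Placeholder marks on an assumed 0-10 scale, invented for illustration only;
# they are NOT the marks reported in the study.
human = [7.0, 8.0, 6.0, 9.0]
ai = [6.0, 7.0, 5.0, 8.0]

bias = mean_signed_difference(ai, human)
print(f"Mean AI - human difference: {bias:+.2f}")
```

A signed (rather than absolute) difference is used so that a systematic tendency in one direction, such as the AI marking lower than the teaching staff, remains visible instead of being averaged into an unsigned error.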