
Documat


Improving the classification of cybersecurity attack procedures using retrieval augmented generation

  • Authors: Sonia Bilbao Arechavala, Aitziber Atutxa Salazar, Javier del Ser Lorente
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 75, 2025 (issue: Procesamiento del Lenguaje Natural, Revista nº 75, September 2025), pp. 199-210
  • Language: English
  • Parallel titles:
    • Clasificación de procedimientos de ataques de ciberseguridad mediante generación aumentada por recuperación
  • Abstract
    • English

      Understanding the tactics (why), techniques (how) and procedures (methods) behind a cybersecurity attack is paramount to developing defenses against it or mitigating its effects. However, this task requires a high level of technical expertise, is time-consuming, and is error-prone. In this work we verify that open-source Llama 3.1 LLMs (Large Language Models) cannot automatically identify which of the 625 MITRE techniques is used within a cybersecurity attack procedure. We evaluate two RAG (Retrieval Augmented Generation) approaches to enhance classification accuracy. Our experiments show the importance of the embedding model in information retrieval. Moreover, our analysis shows that selecting appropriate examples helps the language model reduce ambiguity. Specifically, a dynamic few-shot learning strategy performs best for larger models, whereas a multiple-choice strategy is more appropriate for smaller models. In contrast, corrective RAG techniques fail to provide significant enhancements, highlighting current methodological limitations and the inherent complexity of this task.


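The dynamic few-shot strategy described in the abstract — retrieving the labelled attack procedures most similar to the query and inserting them as in-context examples — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words embedder stands in for the dense embedding models the paper evaluates (e.g. BGE-M3), and all procedure texts, function names, and the choice of cosine similarity are illustrative assumptions.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a real system would use a dense
    # embedding model here (the paper studies the impact of this choice).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_few_shot_prompt(procedure, labelled_examples, k=2):
    """Retrieve the k labelled procedures most similar to the query
    and format them as in-context examples for the LLM classifier."""
    q = embed(procedure)
    ranked = sorted(labelled_examples,
                    key=lambda ex: cosine(q, embed(ex["procedure"])),
                    reverse=True)
    shots = "\n".join(
        f"Procedure: {ex['procedure']}\nTechnique: {ex['technique']}"
        for ex in ranked[:k])
    return f"{shots}\nProcedure: {procedure}\nTechnique:"

# Hypothetical labelled corpus of (procedure, MITRE technique) pairs.
examples = [
    {"procedure": "adversary sends spearphishing email with malicious attachment",
     "technique": "T1566.001 Spearphishing Attachment"},
    {"procedure": "malware establishes persistence via registry run keys",
     "technique": "T1547.001 Registry Run Keys"},
    {"procedure": "attacker dumps credentials from LSASS memory",
     "technique": "T1003.001 LSASS Memory"},
]

prompt = build_few_shot_prompt(
    "phishing email delivered a weaponized document attachment", examples, k=1)
print(prompt.splitlines()[1])  # → Technique: T1566.001 Spearphishing Attachment
```

The retrieved examples anchor the model's answer to nearby labelled cases, which is how, per the abstract, example selection reduces ambiguity for larger models; the multiple-choice variant would instead list the candidate techniques and ask the model to pick one.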
  • References
    • Al-Sada, B., A. Sadighian, and G. Oligeri. 2024. MITRE ATT&CK: State of the art and way forward. ACM Computing Surveys, 57(1).
    • Alves, P. M. M. R., G. P. R. Filho, and V. P. Gonçalves. 2022. Leveraging BERT’s power to classify TTP from unstructured text. In 2022 Workshop...
    • Barnett, S., S. Kurniawan, S. Thudumu, Z. Brannelly, and M. Abdelrazek. 2024. Seven failure points when engineering a retrieval augmented...
    • Beltagy, I., K. Lo, and A. Cohan. 2019. SciBERT: A pretrained language model for scientific text. In Conference on Empirical Methods in Natural...
    • Chen, J., S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. 2024. BGE M3-embedding: Multi-lingual, multi-functionality, multigranularity text...
    • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding....
    • Dong, Q. et al. 2024. A survey on in-context learning. In 2024 Conference on Empirical Methods in Natural Language Processing, pages 1107–1128.
    • Fayyazi, R., R. Taghdimi, and S. J. Yang. 2024. Advancing TTP analysis: harnessing the power of large language models with retrieval augmented...
    • Fayyazi, R. and S. J. Yang. 2023. On the uses of large language models to interpret ambiguous cyberattack descriptions. arXiv preprint arXiv:2306.14062.
    • Formal, T., C. Lassance, B. Piwowarski, and S. Clinchant. 2021. SPLADE v2: Sparse lexical and expansion model for information retrieval. arXiv...
    • Gao, Y. et al. 2024. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997.
    • Grattafiori, A. et al. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.
    • Guo, D. et al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.
    • Lassance, C., H. Déjean, T. Formal, and S. Clinchant. 2024. SPLADE-v3: New baselines for SPLADE. arXiv preprint arXiv:2403.06789.
    • Li, L., C. Huang, and J. Chen. 2024. Automated discovery and mapping ATT&CK tactics and techniques for unstructured cyber threat intelligence....
    • Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81.
    • Papineni, K., S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In 40th Annual Meeting...
    • Reimers, N. and I. Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-Networks. In 2019 Conference on Empirical Methods...
    • Salemi, A. and H. Zamani. 2024. Evaluating retrieval quality in retrieval-augmented generation. In 47th International ACM SIGIR Conference...
    • Santhanam, K., O. Khattab, J. Saad-Falcon, C. Potts, and M. Zaharia. 2022. ColBERTv2: Effective and efficient retrieval via lightweight late...
    • Sauerwein, C. and A. Pfohl. 2022. Towards automated classification of attackers’ TTPs by combining NLP with ML techniques. arXiv preprint...
    • Tihanyi, N., M. A. Ferrag, R. Jain, T. Bisztray, and M. Debbah. 2024. CyberMetric: a benchmark dataset based on retrieval-augmented generation...
    • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017. Attention is all you need. Advances...
    • Wan, S. et al. 2024. CYBERSECEVAL 3: Advancing the evaluation of cybersecurity risks and capabilities in large language models. arXiv preprint...
    • Yan, S.-Q., J.-C. Gu, Y. Zhu, and Z.-H. Ling. 2024. Corrective retrieval augmented generation. arXiv preprint arXiv:2401.15884.
    • Zhang, T., V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi. 2020. BERTScore: Evaluating text generation with BERT. In International Conference...
