Mejoras en extracción de URLs en smishing mediante text spotting

Pablo Blanco Medina; Rubel Biswas; Víctor González Castro; Rocío Aláiz Rodríguez; Eduardo Fidalgo Fernández; Enrique Alegre Gutiérrez

Blanco Medina, Pablo [1]; Biswas, Rubel [1]; González Castro, Víctor [1]; Alaiz Rodríguez, Rocío [1]; Fidalgo, Eduardo [1]; Alegre, Enrique [1]
[1] Universidad de León, León, Spain
Published in: Jornadas de Automática, ISSN-e 3045-4093, No. 45, 2024
Language: Spanish
DOI: 10.17979/ja-cea.2024.45.10954
Parallel titles:
- Enhancing Smishing URL Extraction with Text Spotting
Abstract
- Spanish (translated)
  Computer Emergency Response Teams (CERTs) commonly receive screenshots of Smishing messages, which impersonate different types of organizations in order to steal users' personal information or misappropriate funds from their accounts through malicious links. CERTs seek automated solutions that can recover URLs from screenshots. Methods based on optical character recognition (OCR) can be used to extract text, but their performance is low due to problems such as poor image quality or text split across multiple phrases. We propose a pipeline for Smishing URL extraction based on Text Spotting techniques, complemented by a custom URL reconstruction that uses features highlighted in the image. We applied the proposed methodology to a custom set of 244 screenshots and 262 URLs, increasing recognition precision from 3.05% to 22.90%, after which the extracted text can be processed further in Smishing analysis.
- English
  Computer Emergency Response Teams (CERTs) often receive screenshots of short text messages with suspicious content. Smishing attempts to mimic reputable organizations, urging individuals to act promptly by clicking on a link, with the aim of hijacking personal information or illicitly debiting funds from their accounts. CERTs may find value in automated solutions that can retrieve URLs from screenshots for subsequent validation. Approaches based on Optical Character Recognition (OCR) could be used to extract text. However, their performance is low due to issues such as poor image quality or text split across multiple phrases. In this work, we propose a pipeline for Smishing URL extraction based on Text Spotting, complemented by a custom URL reconstruction that uses highlighted features. We applied the proposed pipeline to a custom set of 117 screenshots containing 121 URLs, resulting in a precision increase on the URL recovery task from 3.05% to 22.90%. This allows the original URL to be restored for subsequent processing in the analysis of Smishing messages.
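The abstract does not detail how the URL reconstruction works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a text spotter has already produced word-level fragments in reading order, and that an upstream step has flagged which fragments appear highlighted (for example, in link color). The function name `reconstruct_urls`, the `(text, highlighted)` input format, and the `URL_PATTERN` heuristic are all hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical heuristic: a reconstructed string counts as a URL if it starts
# with a scheme or "www." and contains at least one dot after that prefix.
URL_PATTERN = re.compile(r"^(?:https?://|www\.)[^\s]+\.[^\s]+$", re.IGNORECASE)


def reconstruct_urls(fragments: Iterable[Tuple[str, bool]]) -> List[str]:
    """Join consecutive highlighted fragments and keep those that look like URLs.

    `fragments` is assumed to be a sequence of (text, highlighted) pairs in
    reading order; the `highlighted` flag is an assumed upstream output.
    """
    urls: List[str] = []
    current: List[str] = []
    # A trailing non-highlighted sentinel flushes a run that ends the input.
    for text, highlighted in list(fragments) + [("", False)]:
        if highlighted:
            current.append(text.strip())
            continue
        if current:
            candidate = "".join(current)
            if URL_PATTERN.match(candidate):
                urls.append(candidate)
            current = []
    return urls


# Illustrative input: a URL that the spotter split across three fragments.
demo = [
    ("Your parcel is on hold:", False),
    ("https://parcel-", True),
    ("tracking.example/", True),
    ("pay?id=123", True),
    ("Thank you", False),
]
```

Calling `reconstruct_urls(demo)` joins the three highlighted fragments into a single URL. A highlighted run that does not resemble a URL (for instance, a highlighted promotional word) is discarded by the pattern check, which is a deliberate simplification.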