Deobfuscating Leetspeak With Deep Learning to Improve Spam Filtering

Iñaki Vélez de Mendizabal; Xabier Vidriales; Vitor Manuel Basto Fernandes; Enaitz Ezpeleta Gallastegui; José R. Méndez; Urko Zurutuza Ortega

Iñaki Vélez de Mendizabal [2]; Xabier Vidriales [3]; Vitor Basto-Fernandes [1]; Enaitz Ezpeleta [2]; José R. Méndez [1]; Urko Zurutuza [2]
1. [1] Universidade de Vigo, Vigo, Spain
2. [2] Mondragon Unibertsitatea
3. [3] University Institute of Lisbon
Published in: IJIMAI, e-ISSN 1989-1660, Vol. 8, No. 4, 2023, pp. 46-55
Language: English
DOI: 10.9781/ijimai.2023.07.003

Abstract
The evolution of anti-spam filters has forced spammers to work harder to bypass them in order to distribute content over networks. Distributing content encoded in images and using Leetspeak are clear, concrete examples of techniques currently used for this purpose. Despite the importance of these problems, few studies address them, and the reported performance is very limited. This study reviews the existing (still very rudimentary) work on Leetspeak deobfuscation and proposes a new decoding technique based on neural networks. In addition, we distribute an image database created specifically for training Leetspeak decoding models. We have also created and made available four corpora for analysing the performance of Leetspeak decoding schemes. Using these corpora, we experimentally evaluated our neural network approach to decoding Leetspeak. The results show the usefulness of the proposed model for deobfuscating Leetspeak character sequences.