Violence Detection in Audio: Evaluating the Effectiveness of Deep Learning Models and Data Augmentation

Dalila Durães; Bruno Veloso; Paulo Novais

Affiliation (all authors): Universidade do Minho, Braga (São José de São Lázaro), Portugal
Published in: IJIMAI, e-ISSN 1989-1660, Vol. 8, No. 3, 2023, pp. 72-84
Language: English
DOI: 10.9781/ijimai.2023.08.007
Abstract
- Violence is intertwined with human nature and affects the lives of many people. It takes many forms in society, with physical violence being the most prevalent in daily life. The study of human actions has gained significant attention in recent years, with audio (captured by microphones) and video (captured by cameras) being the primary means of recording instances of violence. Video demands substantial processing capacity and hardware-software performance; audio is a viable alternative that offers several advantages beyond these technical considerations. It is therefore crucial to represent audio data in a form conducive to accurate classification. Datasets dedicated to violence inside a car are not readily available, so we created a custom dataset tailored to this scenario, with the aim of assessing whether it could improve the detection of violence in car-related situations. Because the dataset is imbalanced, data augmentation techniques were applied. The literature shows that Deep Learning (DL) algorithms can classify audio effectively, and a commonly used approach converts the audio into a mel spectrogram image. On this dataset, the EfficientNetB1 neural network achieved the highest accuracy (95.06%) in detecting violence in audio, closely followed by EfficientNetB0 (94.19%). MobileNetV2, in contrast, was less capable of classifying instances of violence.
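The conversion of audio into a mel spectrogram image, and the idea of augmenting the minority class, can be illustrated with a minimal NumPy-only sketch. This is not the authors' pipeline: the abstract does not state the frame size, hop length, number of mel bands, or which augmentation techniques were used, so `n_fft`, `hop`, `n_mels`, the noise level, and the time-shift range below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, centre, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=64):
    # Assumed parameters; returns an array of shape (n_mels, n_frames) in dB.
    if y.size < n_fft:
        y = np.pad(y, (0, n_fft - y.size))
    n_frames = 1 + (y.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T

def augment(y, rng, noise_std=0.005, max_shift=1600):
    # Illustrative augmentation only: circular time shift plus Gaussian noise.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, shift) + rng.normal(0.0, noise_std, size=y.shape)

# Usage on a synthetic 1 s, 1 kHz tone sampled at 16 kHz (stand-in for a clip).
sr = 16000
t = np.arange(sr) / sr
clip = 0.5 * np.sin(2.0 * np.pi * 1000.0 * t)
rng = np.random.default_rng(0)
spec = log_mel_spectrogram(clip, sr)
spec_aug = log_mel_spectrogram(augment(clip, rng), sr)
```

The resulting 2-D array can be rescaled and saved as an image before being passed to an image classifier such as the EfficientNet or MobileNet variants named in the abstract. In practice an audio library would normally replace the hand-written filterbank.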