Human-Centric Video Summarization via Identity-Aware Tracking

Milad Mirjalili; Enrique Alegre Gutiérrez; Eduardo Fidalgo Fernández; Víctor González Castro; Waqar Tanveer

Affiliation (all authors): Universidad de León, León, Spain
Published in: Jornadas de Automática, ISSN-e 3045-4093, No. 46, 2025
Language: English
DOI: 10.17979/ja-cea.2025.46.12249
Abstract
In this paper, we present an approach to video summarization that focuses on the presence and identity of people across video frames. The proposed framework combines pose landmarks, rich facial embeddings, and visual appearance features of the body to build a robust representation of each detected person. These features are clustered offline so that each individual can be tracked consistently throughout the video. Because the method requires no labeled data, it can process large-scale video collections without manual annotation. By selecting representative frames in which the key individuals appear most frequently, the system generates concise, identity-aware summaries that reflect how human presence evolves over time. In experiments on diverse video sequences, the method achieved an average F1 score of 99.4% for consistent identity tracking. This person-centric strategy offers a scalable and generalizable solution for summarizing videos in domains where understanding human activity is essential.
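The pipeline described in the abstract (fuse per-person features, cluster them offline into identities, then pick frames where key identities appear) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature extractors, the clustering algorithm, or the exact frame-selection rule. Here each detection's pose, face, and body vectors are assumed to be given as plain float lists; they are L2-normalized, weighted, and concatenated, then grouped with a small DBSCAN implementation, a density-based method swapped in for illustration. The function names `fuse`, `dbscan`, and `select_summary_frames`, the weights, `eps`, `min_pts`, the selection rule, and the toy feature values are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(pose, face, body, weights=(1.0, 1.0, 1.0)):
    """Assumed fusion: concatenate the L2-normalized modality vectors, each scaled by a weight."""
    fused = []
    for w, part in zip(weights, (pose, face, body)):
        fused.extend(w * x for x in l2_normalize(part))
    return fused


def dbscan(points, eps, min_pts):
    """Minimal O(n^2) DBSCAN with Euclidean distance; label -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors)
    return labels


def select_summary_frames(frame_ids, labels, num_key_ids=2, num_frames=3):
    """Hypothetical rule: key identities are the clusters seen in the most
    distinct frames; summary frames are those containing the most key
    identities, ties broken by the earlier frame."""
    frames_by_id = defaultdict(set)
    for frame, label in zip(frame_ids, labels):
        if label != -1:  # skip detections left as noise
            frames_by_id[label].add(frame)
    key_ids = sorted(frames_by_id, key=lambda c: (-len(frames_by_id[c]), c))[:num_key_ids]
    score = Counter(f for c in key_ids for f in frames_by_id[c])
    ranked = sorted(score, key=lambda f: (-score[f], f))
    return sorted(ranked[:num_frames]), key_ids


# Toy per-detection features: (pose, face, body) for two people plus one outlier.
A = ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
A2 = ([1.0, 0.05], [1.0, 0.0], [0.98, 0.0])  # same person, slightly different view
B = ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
ODD = ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0])  # inconsistent cues across modalities

# (frame index, features) for every person detected in every frame.
detections = [(0, A), (1, A2), (1, B), (2, A), (2, B), (3, B), (4, A2), (4, ODD)]
frame_ids = [frame for frame, _ in detections]
points = [fuse(*feats) for _, feats in detections]

labels = dbscan(points, eps=0.5, min_pts=2)
summary_frames, key_ids = select_summary_frames(frame_ids, labels)
print(labels)          # [0, 0, 1, 0, 1, 1, 0, -1]
print(summary_frames)  # [0, 1, 2]
```

In this toy run, the two people become clusters 0 and 1 regardless of which frame they appear in, and the inconsistent detection is left as noise (-1) rather than forced into an identity, which is one reason a density-based method is a plausible fit for label-free offline grouping. On real embeddings, the modality weights and `eps` would presumably need tuning.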