Analysis and extraction of features in video streaming refer to the process of identifying and extracting specific characteristics or patterns from a video stream. The extracted features can serve many purposes depending on the needs of the application, such as object detection, recognition, and tracking, as well as video compression, indexing, and retrieval.
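As a simple illustration of frame-level feature extraction (a minimal sketch with assumed tooling, not the method of this thesis), the following Python snippet decodes frames with OpenCV and embeds each sampled frame with a pretrained ResNet-18:

```python
# Minimal sketch: decode frames with OpenCV and embed each sampled frame
# with a pretrained ResNet-18 (assumed tooling, for illustration only).
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Identity()  # keep the 512-d embedding, drop the classifier
model.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256), T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def frame_features(video_path, every_n=10):
    """Yield one embedding per sampled frame of the video."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                yield model(preprocess(rgb).unsqueeze(0)).squeeze(0)
        idx += 1
    cap.release()
```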
End-to-end streaming latency is the delay between the moment a video frame is captured and the moment it is displayed on the user's device. Analysis and extraction of features can be used to measure end-to-end streaming latency by extracting characteristics or patterns from the video stream that identify each frame, the start and end points of the stream, and the capture time of each frame. In this work, we propose a simple but effective way to measure end-to-end streaming latency using object detection and image-to-text conversion, both tasks built on feature extraction from the underlying content.
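The idea can be illustrated with a hypothetical sketch (the sender-side overlay, the crop region, and the use of pytesseract are assumptions for illustration, not the exact pipeline of this work): the sender burns a wall-clock timestamp into each frame, and the receiver reads it back with OCR and subtracts it from its own clock.

```python
# Illustrative sketch of timestamp-based latency measurement (assumed
# tooling, not the authors' exact pipeline). Both clocks are assumed
# to be synchronized, e.g. via NTP.
import time
import cv2
import pytesseract

def stamp_frame(frame):
    """Sender side: burn the current time in milliseconds into the frame."""
    ts = str(int(time.time() * 1000))
    cv2.putText(frame, ts, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (255, 255, 255), 2)
    return frame

def measure_latency(received_frame):
    """Receiver side: OCR the burnt-in timestamp and compute the delay."""
    crop = received_frame[5:60, 0:400]  # assumed region holding the timestamp
    text = pytesseract.image_to_string(
        crop, config="--psm 7 -c tessedit_char_whitelist=0123456789")
    sent_ms = int("".join(ch for ch in text if ch.isdigit()))
    return int(time.time() * 1000) - sent_ms  # end-to-end latency in ms
```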
Shot boundary detection is the process of identifying the boundaries between consecutive shots in a video stream. It is an important task in video processing, with applications in video editing, indexing, retrieval, and summarization. Analysis and extraction of features support shot boundary detection by exposing characteristics or patterns in the stream that indicate changes in the visual and audio content; once these features are extracted, shot boundaries can be detected with techniques such as thresholding, clustering, and machine learning algorithms. In this work, we analyze state-of-the-art deep learning algorithms and datasets for shot boundary detection and propose several new models that improve the efficiency of the latest state-of-the-art models while maintaining or even improving the resulting metrics.
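As a point of reference, the classic thresholding baseline mentioned above can be sketched in a few lines (this illustrates the baseline only, not the deep models proposed in this work): a hard cut is declared whenever the colour-histogram distance between consecutive frames spikes.

```python
# Sketch of the classic histogram-thresholding baseline for shot
# boundary detection (illustration only, not the proposed models).
import cv2

def detect_cuts(video_path, threshold=0.5):
    """Return frame indices where a hard cut is detected."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: near 0 for similar frames, near 1 at cuts
            d = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if d > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```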
Temporal video segmentation into scenes is the process of dividing a video stream into coherent temporal segments by grouping shots that are visually and semantically related. In this work, we build on our improvements in shot boundary detection to propose a foundational model for segmenting a video into scenes, starting from a prior segmentation into shots. The model is based on visual similarity, and we also contribute a dedicated dataset for the task.
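A minimal sketch of similarity-based scene grouping (assumed and simplified, not the proposed model itself): given one embedding per shot, a new scene starts whenever the cosine similarity between consecutive shot embeddings drops below a threshold.

```python
# Simplified sketch of grouping shots into scenes by visual similarity
# (assumed scheme for illustration, not the model proposed in this work).
import numpy as np

def group_shots_into_scenes(shot_embeddings, sim_threshold=0.8):
    """shot_embeddings: (num_shots, dim) array.
    Returns a list of scenes, each a list of shot indices."""
    scenes, current = [], [0]
    for i in range(1, len(shot_embeddings)):
        a, b = shot_embeddings[i - 1], shot_embeddings[i]
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim >= sim_threshold:
            current.append(i)       # visually similar: same scene
        else:
            scenes.append(current)  # similarity drop: scene boundary
            current = [i]
    scenes.append(current)
    return scenes
```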