Abstract [eng] |
Combining instance segmentation with video inpainting can significantly enhance real-time football broadcasts by removing visual distractions, such as a person or object that accidentally enters the frame. Despite its relevance and importance to the media industry, this area remains challenging and relatively understudied; in particular, the segmentation and inpainting of cameramen in video is underexplored. To address these challenges, this paper proposes a framework designed to accurately detect and remove cameramen while seamlessly hallucinating the background in real-time football broadcasts. The approach aims to enhance broadcast quality by maintaining visual consistency and viewer engagement, helping to retain and attract users during the game. Inpainting first requires a cameraman instance segmentation method. This paper uses the You Only Look Once (YOLOv8) and End-to-End Flow-Guided Video Inpainting (E2FGVI) models for accurate real-time operator instance segmentation and video inpainting, respectively. The segmentation model produces masked frames, which are then passed to the cameraman inpainting stage. Moreover, this research presents an extensive "Cameramen Instances" dataset with more than 7500 samples, which serves as a solid foundation for future investigations in this area. Experimental results show that the YOLOv8 model outperforms the other baseline algorithms across different scenarios. A precision of 95.5%, a recall of 92.7%, an mAP50-95 of 79.6, and a frame rate of 87 FPS in low-volume environments demonstrate the solution's efficacy for real-time applications. Furthermore, the YOLOv8 system efficiently generates binary masks of cameramen, which are crucial for the subsequent inpainting task.
This project introduces the 'Cameramen-VOS' dataset, containing over 2500 videos, compiled specifically for evaluating video inpainting in the context of football broadcasts. To achieve seamless inpainting, this study presents a novel, lightweight architecture obtained by modifying the encoder and other components of the baseline E2FGVI system. The E2FGVI-lightweight model, trained on the football domain, achieves an average PSNR of 25.48 and an SSIM of 0.84 on the Cameramen-VOS, YouTube-VOS, and DAVIS datasets, with small reductions of 1.02% in PSNR and 0.96% in SSIM compared with the initial inpainting model. When integrated into the end-to-end cameramen inpainting system, the E2FGVI-lightweight model requires 224.94 GFLOPs and runs at 8.32 FPS. This corresponds to a 68.76% reduction in FLOPs and a 100.48% improvement in frame rate over the E2FGVI-initial model, demonstrating its suitability for real-time applications. |
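The relative figures quoted above (a 68.76% reduction in FLOPs and a 100.48% improvement in frame rate) can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `relative_change` and `implied_baseline` are hypothetical, and the E2FGVI-initial values it produces are back-calculated from the rounded percentages in the abstract, not figures reported by the source.

```python
def relative_change(new: float, old: float) -> float:
    """Signed relative change of `new` with respect to `old`, in percent."""
    if old == 0:
        raise ValueError("baseline value must be non-zero")
    return (new - old) / old * 100.0


def implied_baseline(new: float, change_percent: float) -> float:
    """Invert relative_change: the baseline that gives `new` after `change_percent`."""
    factor = 1.0 + change_percent / 100.0
    if factor == 0:
        raise ValueError("a -100% change has no recoverable baseline")
    return new / factor


# Figures stated in the abstract for the E2FGVI-lightweight model.
LIGHT_GFLOPS = 224.94
LIGHT_FPS = 8.32

# Derived (NOT reported) values for the E2FGVI-initial model, assuming the
# percentages are measured relative to the initial model.
initial_gflops = implied_baseline(LIGHT_GFLOPS, -68.76)  # reduction -> negative change
initial_fps = implied_baseline(LIGHT_FPS, 100.48)        # improvement -> positive change

print(f"implied initial cost: {initial_gflops:.2f} GFLOPs")
print(f"implied initial frame rate: {initial_fps:.2f} FPS")
```

Under these assumptions the implied E2FGVI-initial figures come out at roughly 720 GFLOPs and 4.15 FPS, which is consistent with the lightweight model roughly doubling the frame rate while using under a third of the computation.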