🚀 The feature
Following up on #1215, the current implementation of sws_scale is a bottleneck when decoding 4K videos. On the plot below, you can see that decoding is capped at ~20ms regardless of the number of threads and seeking mode of the decoder (CPU decoding).
After some investigation, it turns out the YUV -> RGB conversion is dominating the decoding time and is very consistent. torchcodec uses the legacy sws_scale() FFmpeg function for YUV→RGB color conversion. This function is always single-threaded. FFmpeg 7+ introduced sws_scale_frame() with a threads option on SwsContext that parallelizes the conversion across CPU cores. PyAV and the ffmpeg CLI already use this new API, achieving roughly 2x faster conversion on 4K content.
Motivation, pitch
No response
🚀 The feature
Following up on #1215, the current implementation of
sws_scaleis a bottleneck when decoding 4K videos. On the plot below, you can see that decoding is capped at~20msregardless of the number of threads and seeking mode of the decoder (CPU decoding).After some investigation, it turns out the YUV -> RGB conversion is dominating the decoding time and is very consistent.
torchcodecuses the legacysws_scale()FFmpeg function for YUV→RGB color conversion. This function is always single-threaded. FFmpeg 7+ introducedsws_scale_frame()with athreadsoption onSwsContextthat parallelizes the conversion across CPU cores. PyAV and the ffmpeg CLI already use this new API, achieving roughly 2x faster conversion on 4K content.Motivation, pitch
No response