The repo was originally developed to illustrate a talk given at the London PyTorch Meetup: *Optimising Video Pipelines for Neural Network Training with PyTorch* by Nikolay Falaleev on 21/11/2024. The talk's slides are available here.
It contains examples of different approaches to decoding video frames directly into tensors, which can be used for training deep learning models with PyTorch.
- An Nvidia GPU with the Video Encode and Decode feature (CUVID) and Nvidia driver version >= 535.
- GNU make - quite likely it is already installed on your system.
- Docker and NVIDIA Container Toolkit.
- Some video files for testing; put them in the `data/videos` directory.
The project is provided with a Docker environment that includes PyTorch, as well as FFmpeg and OpenCV, which are compiled with NVIDIA hardware acceleration support.
- Build the Docker image:

  ```bash
  make build
  ```

- Run the container:

  ```bash
  make run
  ```
The Docker container will have the project folder mounted to `/workdir`, including the contents of `data` and all the code.
All of the following commands are meant to be executed inside the running container.
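As a quick sanity check (optional; not part of the repo), you can verify from Python that the GPU and the CUDA-enabled builds are visible inside the container:

```python
import torch
import cv2

print(torch.cuda.is_available())            # Should print True
print(torch.version.cuda)                   # CUDA version PyTorch was built against
print("CUDA" in cv2.getBuildInformation())  # True if OpenCV was built with CUDA
```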
Several base video reader classes are provided in `src/video_io`; they all follow the same interface and inherit from `AbstractVideoReader` (a sketch of the shared interface follows the list below).
- `OpenCVVideoReader` - uses OpenCV's `cv2.VideoCapture` with the FFmpeg backend. It is the most straightforward way to read videos. Set `os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"` to enable hardware acceleration (see the snippet after this list). Adjust the `video_codec` value to match your video's codec, e.g. `h264_cuvid` for the H.264 codec and `hevc_cuvid` for HEVC; list all codecs available with Nvidia HW acceleration via `ffmpeg -decoders | grep -i nvidia`.
- `TorchvisionVideoReader` - uses PyTorch's `torchvision.io` module.
- `TorchcodecVideoReader` - uses the TorchCodec library. As TorchCodec is still at an early stage of development and is installed from nightly builds, it may stop working at some point or its API may change. This is likely to be the fastest video reader in the project.
- `VALIVideoReader` - uses the VALI library, a continuation of the VideoProcessingFramework project that Nvidia discontinued. Unlike PyNvVideoCodec, Nvidia's current replacement, VALI offers a more flexible solution that includes pixel format and color space conversion capabilities, as well as some low-level operations on surfaces. Although it has a steeper learning curve than PyNvVideoCodec, VALI allows for building more complex and optimized pipelines.
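The actual `AbstractVideoReader` lives in `src/video_io`; as a rough sketch based only on the constructor arguments and `read_frames` method documented below, the shared interface looks roughly like this (details of the real base class may differ):

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Union

import torch


class AbstractVideoReader(ABC):
    """Illustrative sketch; the real base class in src/video_io may differ."""

    def __init__(self, video_path: Union[str, Path], mode: str = "stream",
                 output_format: str = "THWC", device: str = "cuda:0") -> None:
        self.video_path = str(video_path)
        self.mode = mode
        self.output_format = output_format
        self.device = device

    @abstractmethod
    def read_frames(self, frames_to_read: List[int]) -> torch.Tensor:
        """Decode the requested frame indices into a tensor on self.device."""
```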
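For reference, enabling the CUVID decoder for the OpenCV reader looks like this (a minimal sketch; the environment variable must be set before the capture is opened, and the file path is illustrative):

```python
import os

# Must be set before the video is opened; swap h264_cuvid for the decoder
# matching your codec (e.g. hevc_cuvid for HEVC).
os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "video_codec;h264_cuvid"

import cv2

cap = cv2.VideoCapture("/workdir/data/videos/test.mp4", cv2.CAP_FFMPEG)
ok, frame = cap.read()
print(ok, frame.shape if ok else None)
cap.release()
```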
A simple benchmark script that compares the performance of the different readers is provided in `scripts/benchmark.py`. Adjust its parameters as required and run it inside the project container:

```bash
python scripts/benchmark.py
```
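`scripts/benchmark.py` is the authoritative implementation; purely as an illustration of the idea, a minimal timing loop over the readers could look like this (reader parameters are example values):

```python
import time

import torch

from src.video_io import OpenCVVideoReader, TorchvisionVideoReader

frames_to_read = list(range(0, 100, 5))

for reader_cls in (OpenCVVideoReader, TorchvisionVideoReader):
    reader = reader_cls("/workdir/data/videos/test.mp4", mode="stream",
                        output_format="TCHW", device="cuda:0")
    start = time.perf_counter()
    reader.read_frames(frames_to_read)
    torch.cuda.synchronize()  # Wait for GPU work before stopping the clock
    print(f"{reader_cls.__name__}: {time.perf_counter() - start:.3f} s")
```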
In addition, the project contains some other examples of video-related components:
- Kornia video augmentation transforms (see the sketch below).
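A minimal sketch of applying Kornia augmentations to a clip (not the repo's own transforms; `kornia.augmentation.VideoSequential` applies the same random parameters across the frames of each clip):

```python
import torch
import kornia.augmentation as K

aug = K.VideoSequential(
    K.ColorJitter(0.1, 0.1, 0.1, 0.1, p=1.0),
    K.RandomHorizontalFlip(p=0.5),
    data_format="BTCHW",  # Batched clips in (B, T, C, H, W) layout
    same_on_frame=True,   # Identical augmentation parameters on all frames
)

clip = torch.rand(1, 20, 3, 224, 224, device="cuda:0")  # e.g. a decoded clip
augmented = aug(clip)
print(augmented.shape)  # torch.Size([1, 20, 3, 224, 224])
```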
```python
from src.video_io import TorchvisionVideoReader

video_reader = TorchvisionVideoReader(
    "/workdir/data/videos/test.mp4", mode="stream", output_format="TCHW",
    device="cuda:0")

frames_to_read = list(range(0, 100, 5))  # Read every 5th frame of the first 100
tensor = video_reader.read_frames(frames_to_read)
print(tensor.shape, tensor.device)  # Should be (20, 3, H, W), cuda:0
```
All video reader classes use the same interface and return PyTorch tensors.
Arguments:

- `video_path` (str or Path): Path to the input video file.
- `mode` (`seek` or `stream`): Reading mode: `seek` seeks to and decodes each requested frame individually; `stream` decodes all frames in the range of requested indices and subsamples them. When using `mode="stream"`, one needs to ensure that all frames in the range `(min(frames_to_read), max(frames_to_read))` fit into VRAM. Defaults to `stream`.
- `output_format` (`THWC` or `TCHW`): Data format: channels-last or channels-first. Defaults to `THWC`.
- `device` (str, optional): Device to send the resulting tensor to. If possible, the same device will be used for hardware-accelerated decoding. Defaults to `cuda:0`.
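For instance, to grab a few widely spaced frames without decoding everything in between, `seek` mode avoids the VRAM constraint of `stream` mode (indices and file name below are illustrative):

```python
from src.video_io import TorchvisionVideoReader

# seek mode decodes only the requested frames, so sparse indices do not
# require the whole (min, max) range to fit into VRAM.
reader = TorchvisionVideoReader("/workdir/data/videos/test.mp4", mode="seek",
                                output_format="THWC", device="cuda:0")
frames = reader.read_frames([0, 500, 1000])
print(frames.shape)  # (3, H, W, C) with the default THWC layout
```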