Video FFmpeg Usage Guide (v2.0)


1. Overview

The PPU SDK is compatible with the Nvidia Video Codec SDK, so FFmpeg's cuvid, nvenc and libnpp plugins can be used directly for hardware acceleration, with no extra code changes required. The verified FFmpeg versions are:

FFmpeg 8.0

Pairs with nv-codec-headers 13.0.19.0

Requires Video Codec SDK 13.0 or later

Latest major release; adds the pad_cuda filter

FFmpeg 7.1.1

Pairs with nv-codec-headers 12.2.72.0

Requires Video Codec SDK 12.2 or later

Latest official FFmpeg release; includes a stable software VVC decoder

FFmpeg 7.0.1

Pairs with nv-codec-headers 12.2.72.0

Requires Video Codec SDK 12.2 or later

Official FFmpeg release; greatly improves transcode parallelization

FFmpeg 6.1.2

Depends on nv-codec-headers 12.1.14.0

Requires Video Codec SDK 12.1 or later

Required by frameworks such as PyAV; currently the most widely used version.

2. Build Guide

Clone the code from the SAIL source release repositories; no extra patches need to be applied.

FFmpeg: https://github.com/FFmpeg/FFmpeg

nv-codec-headers: https://github.com/FFmpeg/nv-codec-headers

Important

Mind the required pairing between the FFmpeg version and the nv-codec-headers version.

| FFmpeg branch | Based on upstream release tag | nv-codec-headers branch | Based on upstream release tag |
| --- | --- | --- | --- |
| v8.0_release | n8.0 | v13.0_release | n13.0.19.0 |
| v7.1_release | n7.1.1 | v12.2_release | n12.2.72.0 |
| v7.0_release | n7.0.1 | v12.2_release | n12.2.72.0 |
| v6.1_release | n6.1.2 | v12.1_release | n12.1.14.0 |
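
The verified pairings above can be kept in a small lookup table so a build script fails fast on an unverified combination. A minimal sketch (the `COMPAT` dict and `headers_for` helper are illustrative names, not part of any SDK):

```python
# Verified FFmpeg releases and the nv-codec-headers release each one pairs with,
# taken from the compatibility table above; "sdk" is the minimum Video Codec SDK.
COMPAT = {
    "8.0":   {"headers": "13.0.19.0", "sdk": "13.0"},
    "7.1.1": {"headers": "12.2.72.0", "sdk": "12.2"},
    "7.0.1": {"headers": "12.2.72.0", "sdk": "12.2"},
    "6.1.2": {"headers": "12.1.14.0", "sdk": "12.1"},
}

def headers_for(ffmpeg_version: str) -> str:
    """Return the matching nv-codec-headers release, or raise for unverified versions."""
    try:
        return COMPAT[ffmpeg_version]["headers"]
    except KeyError:
        raise ValueError(f"unverified FFmpeg version: {ffmpeg_version}") from None

print(headers_for("7.1.1"))  # 12.2.72.0
```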

Before building, make sure the SAIL SDK is installed and that envsetup.sh has been executed to set up the SDK environment.

2.1. Prepare the ffmpeg code

  1. FFmpeg 8.0:

    setup_ffmpeg.sh

    # internal customer address
    git clone -b v8.0_release git@gitlab.alibaba-inc.com:ppu_open_source/FFmpeg.git
    git clone -b v13.0_release git@gitlab.alibaba-inc.com:ppu_open_source/nv-codec-headers.git
    # external customer address
    git clone -b v8.0_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/FFmpeg.git
    git clone -b v13.0_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/nv-codec-headers.git
    # upstream FFmpeg community address
    git clone -b n8.0 https://github.com/FFmpeg/FFmpeg.git
    pushd FFmpeg
    git apply ffmpeg8.patch
    popd
    git clone -b n13.0.19.0 https://github.com/FFmpeg/nv-codec-headers.git
    pushd nv-codec-headers
    git apply nvcodec13_0.patch
    popd
  2. FFmpeg 7.1.1:

    setup_ffmpeg.sh

    # internal customer address
    git clone -b v7.1_release git@gitlab.alibaba-inc.com:ppu_open_source/FFmpeg.git
    git clone -b v12.2_release git@gitlab.alibaba-inc.com:ppu_open_source/nv-codec-headers.git
    # external customer address
    git clone -b v7.1_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/FFmpeg.git
    git clone -b v12.2_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/nv-codec-headers.git
    # upstream FFmpeg community address
    git clone -b n7.1.1 https://github.com/FFmpeg/FFmpeg.git
    pushd FFmpeg
    git apply ffmpeg7.patch
    popd
    git clone -b n12.2.72.0 https://github.com/FFmpeg/nv-codec-headers.git
    pushd nv-codec-headers
    git apply nvcodec12_2.patch
    popd
  3. FFmpeg 7.0.1:

    setup_ffmpeg.sh

    # internal customer address
    git clone -b v7.0_release git@gitlab.alibaba-inc.com:ppu_open_source/FFmpeg.git
    git clone -b v12.2_release git@gitlab.alibaba-inc.com:ppu_open_source/nv-codec-headers.git
    # external customer address
    git clone -b v7.0_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/FFmpeg.git
    git clone -b v12.2_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/nv-codec-headers.git
    # upstream FFmpeg community address
    git clone -b n7.0.1 https://github.com/FFmpeg/FFmpeg.git
    pushd FFmpeg
    git apply ffmpeg7.patch
    popd
    git clone -b n12.2.72.0 https://github.com/FFmpeg/nv-codec-headers.git
    pushd nv-codec-headers
    git apply nvcodec12_2.patch
    popd
  4. FFmpeg 6.1.2:

    setup_ffmpeg.sh

    # internal customer address
    git clone -b v6.1_release git@gitlab.alibaba-inc.com:ppu_open_source/FFmpeg.git
    git clone -b v12.1_release git@gitlab.alibaba-inc.com:ppu_open_source/nv-codec-headers.git
    # external customer address
    git clone -b v6.1_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/FFmpeg.git
    git clone -b v12.1_release git@codeup.aliyun.com:6853d0dd7f2c70c9af4bf48c/nv-codec-headers.git
    # upstream FFmpeg community address
    git clone -b n6.1.2 https://github.com/FFmpeg/FFmpeg.git
    pushd FFmpeg
    git apply ffmpeg6.patch
    popd
    git clone -b n12.1.14.0 https://github.com/FFmpeg/nv-codec-headers.git
    pushd nv-codec-headers
    git apply nvcodec12_1.patch
    popd

(optional) If you downloaded the source from the upstream FFmpeg community, apply the patch that matches your FFmpeg version.

Patch List:

|  | FFmpeg patch | nv-codec-headers patch |
| --- | --- | --- |
| FFmpeg 8.0 | ffmpeg8.patch | nvcodec13_0.patch |
| FFmpeg 7.1.1 | ffmpeg7.patch | nvcodec12_2.patch |
| FFmpeg 7.0.1 | ffmpeg7.patch | nvcodec12_2.patch |
| FFmpeg 6.1.2 | ffmpeg6.patch | nvcodec12_1.patch |

2.2. Build ffmpeg

build_ffmpeg.sh:

apt install nasm pkgconf
apt install libdav1d-dev # available from the apt repos on Ubuntu,
                         # or build from source, repo: https://code.videolan.org/videolan/dav1d.git
mkdir output
cd ffmpeg
export PKG_CONFIG_PATH=$(pwd)/../nv-codec-headers:$PKG_CONFIG_PATH
./configure --enable-shared \
    --enable-nonfree --enable-gpl \
    --enable-cuvid --enable-nvenc --enable-libnpp --enable-libdav1d --enable-cuda-nvcc \
    --extra-cflags="-I$(pwd)/../nv-codec-headers/include -I$CUDA_HOME/include" \
    --extra-ldflags="-L$CUDA_HOME/lib64" \
    --prefix=$(pwd)/../output
    
make -j16 && make install
Note

--prefix points to the install directory; if unset, FFmpeg installs to /usr/local by default.

After a successful build and install, the install directory contains bin, include, lib and share. The lib directory contains a pkgconfig directory; running pkg-config --modversion libavcodec prints the installed version number.

3. Commit History

To adapt to the PPU and provide some customized features, FFmpeg & nv-codec-headers carry a few commits on top of the upstream code, listed here in chronological order:

  1. libnpp filters support for CUDA 13.0+: the upstream libnpp filters do not support CUDA 13.0+; we added the necessary adaptation.

  2. Customized cuvid to support rgb24 output: using the hardware decoder's inline color conversion, ffmpeg cuvid can output frames in rgb24 format. Example command:

    ffmpeg -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -csc rgb24 -i test.mp4 -f rawvideo -
  3. FFmpeg app support for raw nvdec mode decoding: the user's nvdec request is translated into a cuvid decode request, so an existing ffmpeg command that only specifies -hwaccel cuda without naming a cuvid decoder still decodes correctly, with no changes to the calling command.

    ffmpeg -hwaccel cuda -i test.mp4 -frames 10 -y out-dec-h264.yuv
  4. Fixed a corruption issue that could occur when the scale_npp filter is used with its transpose feature.

  5. SAIL SDK build adaptation: because some CUDA runtime APIs are unsupported, this commit removes the unsupported CUDA runtime declarations from nv-codec-headers so the build succeeds.

4. Running Examples

cd ../output
export LD_LIBRARY_PATH=lib:$LD_LIBRARY_PATH

# ffmpeg decode test with cuvid
bin/ffmpeg -vcodec h264_cuvid -i 640x360_y8c8.h264 -frames 10 -y out-dec-h264.yuv
bin/ffmpeg -hwaccel cuda -i 640x360_y8c8.mp4 -frames 10 -y out-dec-h264.yuv  # only works with Patch 2 applied

# ffmpeg transcode test with cuvid and nvenc
bin/ffmpeg -hwaccel_output_format cuda -vcodec h264_cuvid -i 640x360_y8c8.h264 -vcodec hevc_nvenc -r 30.0 -b:v 15000000 -preset p4 output.hevc
# with Patch 2 applied, this can also be written as:
bin/ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i 640x360_y8c8.h264 -vcodec hevc_nvenc -r 30.0 -b:v 15000000 -preset p4 output.hevc

# ffmpeg transcode test with cuvid and nvenc, resize with npp support (example target resolution)
bin/ffmpeg -hwaccel_output_format cuda -vcodec h264_cuvid -i input.h264 -vf scale_npp=1280:720 -vcodec hevc_nvenc -frames 300 -preset p7 -acodec copy -y output.h265
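
When driving these pipelines from a script, the command lines above can be assembled as argument lists before being handed to subprocess.run. A minimal sketch (the helper names, the bin/ffmpeg path and the default parameters are illustrative):

```python
import shlex

def cuvid_decode_cmd(src, frames=10, out="out-dec-h264.yuv", ffmpeg="bin/ffmpeg"):
    # mirrors: ffmpeg -vcodec h264_cuvid -i <src> -frames 10 -y <out>
    return [ffmpeg, "-vcodec", "h264_cuvid", "-i", src,
            "-frames", str(frames), "-y", out]

def transcode_cmd(src, out="output.hevc", bitrate=15_000_000, fps=30.0,
                  preset="p4", ffmpeg="bin/ffmpeg"):
    # mirrors the cuvid -> nvenc transcode example above
    return [ffmpeg, "-hwaccel_output_format", "cuda",
            "-vcodec", "h264_cuvid", "-i", src,
            "-vcodec", "hevc_nvenc", "-r", str(fps),
            "-b:v", str(bitrate), "-preset", preset, out]

# print the decode command as a shell-quoted string
print(shlex.join(cuvid_decode_cmd("640x360_y8c8.h264")))
```

Each list can be passed directly to subprocess.run without shell quoting concerns.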

5. FAQ

Q: Which hardware acceleration capabilities does ffmpeg currently support on the PPU?

A: Hardware decoding with h264_cuvid, hevc_cuvid, av1_cuvid and vp9_cuvid (an additional patch package is available on request if you need avs2); hardware encoding with h264_nvenc, hevc_nvenc and av1_nvenc; and npp-accelerated transpose, color conversion and scale.

Q: What limitations does ffmpeg currently have?

A: Video decode does not support raw nvdec mode out of the box. For the FFmpeg app we provide a customized adaptation, so raw nvdec mode can be used directly; if you develop directly against the libavcodec API, refer to the enabling example below (sample.cpp):

sample.cpp

#include <iostream>
extern "C"
{
	#include <libavformat/avformat.h>
	#include <libavcodec/avcodec.h>
	#include <libswscale/swscale.h>
}

static enum AVPixelFormat get_hw_format(AVCodecContext *ctx, const enum AVPixelFormat *pix_fmts) {
    for (const enum AVPixelFormat *p = pix_fmts; *p != -1; p++) {
        if (*p == AV_PIX_FMT_CUDA) {
            std::cout << "receive cuda fmt" << std::endl;
            return *p;
        }
    }
    return AV_PIX_FMT_NONE;
}

static const char* get_cuvid_decoder_name(AVCodecID codec_id) {
    switch(codec_id) {
        case AV_CODEC_ID_H264:
            return "h264_cuvid";
        case AV_CODEC_ID_HEVC:
            return "hevc_cuvid";
        case AV_CODEC_ID_VP9:
            return "vp9_cuvid";
        case AV_CODEC_ID_AV1:
            return "av1_cuvid";
        default:
            return nullptr;  // NULL signals "no cuvid decoder for this codec"
    }
}

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cout << "Usage: " << argv[0] << " <input.hevc>" << std::endl;
        return -1;
    }


    AVFormatContext* fmt_ctx = nullptr;
    if (avformat_open_input(&fmt_ctx, argv[1], nullptr, nullptr) != 0) {
        std::cout << "Could not open input file" << std::endl;
        return -1;
    }

    if (avformat_find_stream_info(fmt_ctx, nullptr) < 0) {
        std::cout << "Could not find stream information" << std::endl;
        avformat_close_input(&fmt_ctx);
        return -1;
    }


    int video_stream_idx = -1;
    AVCodecParameters* codec_params = nullptr;
    for (unsigned int i = 0; i < fmt_ctx->nb_streams; i++) {
        if (fmt_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            video_stream_idx = i;
            codec_params = fmt_ctx->streams[i]->codecpar;
            break;
        }
    }

    if (video_stream_idx == -1) {
        std::cout << "Could not find video stream" << std::endl;
        avformat_close_input(&fmt_ctx);
        return -1;
    }

    AVHWDeviceType type = av_hwdevice_find_type_by_name("cuda");
    if (type == AV_HWDEVICE_TYPE_NONE) {
        std::cout << "Could not support cuda hw" << std::endl;
        avformat_close_input(&fmt_ctx);
        return -1;
    }

    AVBufferRef *hw_device_ctx = NULL;
    int err = av_hwdevice_ctx_create(&hw_device_ctx, type, NULL, NULL, 0);
    if (err < 0) {
        std::cout << "Could not create hw context" << std::endl;
        avformat_close_input(&fmt_ctx);
        return -1;
    }

    const char* cuvid_decoder = get_cuvid_decoder_name(codec_params->codec_id);
    const AVCodec* codec = NULL;
    if (!cuvid_decoder) {
        codec = avcodec_find_decoder(codec_params->codec_id);
    } else {
        codec = avcodec_find_decoder_by_name(cuvid_decoder);
    }
    if (!codec) {
        std::cout << "Unsupported codec!" << std::endl;
        avformat_close_input(&fmt_ctx);
        return -1;
    }


    AVCodecContext* codec_ctx = avcodec_alloc_context3(codec);
    if (avcodec_parameters_to_context(codec_ctx, codec_params) < 0) {
        std::cout << "Could not initialize codec context" << std::endl;
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&fmt_ctx);
        return -1;
    }

    // attach the CUDA hardware device context to the decoder
    codec_ctx->hw_device_ctx = av_buffer_ref(hw_device_ctx);
    if (!codec_ctx->hw_device_ctx) {
        std::cout << "Failed to set hardware device context." << std::endl;
        avcodec_free_context(&codec_ctx);
        av_buffer_unref(&hw_device_ctx);
        avformat_close_input(&fmt_ctx);
        return -1;
    }
    codec_ctx->get_format = get_hw_format;


    if (avcodec_open2(codec_ctx, codec, nullptr) < 0) {
        std::cout << "Could not open codec" << std::endl;
        avcodec_free_context(&codec_ctx);
        avformat_close_input(&fmt_ctx);
        av_buffer_unref(&hw_device_ctx);
        return -1;
    }


    AVFrame* frame = av_frame_alloc();
    AVPacket* pkt = av_packet_alloc();

    int frame_count = 0;
    while (av_read_frame(fmt_ctx, pkt) >= 0) {
        if (pkt->stream_index == video_stream_idx) {

            int ret = avcodec_send_packet(codec_ctx, pkt);
            if (ret < 0) {
                std::cout << "Error sending packet to decoder" << std::endl;
                break;
            }


            while (ret >= 0) {
                ret = avcodec_receive_frame(codec_ctx, frame);
                if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
                    break;
                } else if (ret < 0) {
                    std::cout << "Error during decoding" << std::endl;
                    break;
                }
                frame_count++;
                std::cout << "Decoded frame " << frame_count << std::endl;
            }
        }
        av_packet_unref(pkt);
    }

    av_packet_free(&pkt);
    av_frame_free(&frame);
    avcodec_free_context(&codec_ctx);
    av_buffer_unref(&hw_device_ctx);
    avformat_close_input(&fmt_ctx);

    return 0;
}

6. Appendix: the ffmpeg ecosystem

6.1. ffmpeg-python

Project page: https://github.com/kkroening/ffmpeg-python

ffmpeg-python essentially concatenates the arguments you pass in and invokes the ffmpeg binary, so your runtime environment must have ffmpeg installed, or the directory containing its bin/lib must be configured.

Reference example:

import ffmpeg
import sys

probe = ffmpeg.probe('input.mp4')
video_stream = next((stream for stream in probe['streams'] if stream['codec_type'] == 'video'), None)

if video_stream is None or video_stream['codec_name'] is None:
    sys.exit()

out, _ = (
    ffmpeg.input('input.mp4', hwaccel='cuda')
    .output('output.rgb', format='rawvideo', pix_fmt='rgb24')
    .run()
)

ffmpeg-python is in essence a wrapper over the ffmpeg command line. Besides files, it can also read and write data via 'pipe:' (i.e. stdin/stdout). The data lives on the CPU while encoding/decoding happens on the GPU, so data copies are unavoidable; for that reason this path is not recommended.
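
When consuming rawvideo output over 'pipe:', the caller must split stdout into fixed-size frames itself; for rgb24 each frame is width × height × 3 bytes. A stdlib-only sketch (the helper name and the 640×360 size are illustrative):

```python
def rgb24_frame_bytes(width: int, height: int) -> int:
    # rawvideo rgb24 packs 3 bytes (R, G, B) per pixel, with no row padding
    return width * height * 3

w, h = 640, 360
frame_size = rgb24_frame_bytes(w, h)
print(frame_size)  # 691200

# slicing a stand-in for captured stdout into individual frames
buf = bytes(frame_size * 2)  # pretend we captured two frames of piped output
frames = [buf[i:i + frame_size] for i in range(0, len(buf), frame_size)]
print(len(frames))  # 2
```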

6.2. PyAV

Torchvision supports three backends: pyav, video_reader and cuda; the default is pyav. Of these, video_reader does pure-CPU encoding/decoding and cuda uses the GPU decoder, while pyav calls into FFmpeg. PyAV therefore relies on FFmpeg's own hardware acceleration support, so you need to build and install an FFmpeg with hardware acceleration enabled.

Note

As of this writing, PyAV does not yet support FFmpeg 7.0, so build the FFmpeg 6.1 version described above.

PyAV project page: https://github.com/PyAV-Org/PyAV

Call path 1: pytorch -> torchvision -> PyAV -> ffmpeg:

Hardware decode support (requires modifying torchvision's code, torchvision/io/video_reader.py):

--- a/torchvision/io/video_reader.py
+++ b/torchvision/io/video_reader.py
@@ -280,6 +280,13 @@ class VideoReader:
         if self.backend == "pyav":
             stream_type = stream.split(":")[0]
             stream_id = 0 if len(stream.split(":")) == 1 else int(stream.split(":")[1])
+            video_stream = self.container.streams.video[stream_id]
+            # Setting up the codec with cuvid
+            if video_stream.codec.name in ('h264', 'hevc', 'av1', 'vp9'):
+                codec_name = f'{video_stream.codec.name}_cuvid'
+            else:
+                codec_name = video_stream.codec.name  # Fallback to software decoding
+            video_stream.codec_context = av.codec.CodecContext.create(codec_name, 'r')
             self.pyav_stream = {stream_type: stream_id}
             self._c = self.container.decode(**self.pyav_stream)

PyAV outputs data in TCHW format by default.

Hardware encode support (no torchvision changes needed, but the caller must name the nvenc encoder explicitly):

from torchvision.io import write_video
write_video(save_path, frames, fps=fps, video_codec="h264_nvenc")  # must be specified explicitly; h264_nvenc, av1_nvenc and hevc_nvenc are currently supported

Call path 2: pytorch -> torchaudio -> PyAV -> ffmpeg:

Torchaudio also supports PyAV as a backend for hardware decoding; no torchaudio changes are needed, but the cuvid decoder name must be specified explicitly:

import torch
import torchaudio

from torchaudio.io import StreamReader
from torchaudio.utils import ffmpeg_utils

s = StreamReader(src)
s.add_video_stream(int(s.get_src_stream_info(0).frame_rate), decoder="h264_cuvid") # must name the cuvid decoder explicitly; h264_cuvid, av1_cuvid, hevc_cuvid and vp9_cuvid are currently supported
s.fill_buffer()
(video,) = s.pop_chunks()

Note: torchvision.get_video_backend() returns the current backend, and torchvision.set_video_backend("pyav") sets torchvision's default backend. The stock torchvision install does not support the "cuda" backend; for that, torchvision must be built and installed manually.
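
The decoder-selection logic that the torchvision patch above adds can be mirrored in plain Python. A sketch (pick_decoder is an illustrative name):

```python
# codecs for which a cuvid hardware decoder is available on the PPU
CUVID_CAPABLE = {"h264", "hevc", "av1", "vp9"}

def pick_decoder(codec_name: str) -> str:
    """Mirror the patch: use <codec>_cuvid when supported, else fall back to software."""
    if codec_name in CUVID_CAPABLE:
        return f"{codec_name}_cuvid"
    return codec_name  # software decoding fallback

print(pick_decoder("hevc"))        # hevc_cuvid
print(pick_decoder("mpeg2video"))  # mpeg2video
```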

6.3. TorchCodec

Official docs: https://meta-pytorch.org/torchcodec/stable/index.html

Project repo: https://github.com/pytorch/torchcodec

Call path: pytorch -> torchcodec -> ffmpeg

The SAIL package repository ships pip packages for TorchCodec 0.2.1 and 0.4; installing TorchCodec directly from the SAIL repository is recommended.

TorchCodec is tied to the Torch version; the compatibility between TorchCodec and PyTorch versions is:

| TorchCodec | Torch | Python |
| --- | --- | --- |
| main / nightly | main / nightly | >=3.9, <=3.13 |
| 0.7 | 2.8 | >=3.9, <=3.13 |
| 0.6 | 2.8 | >=3.9, <=3.13 |
| 0.5 | 2.7 | >=3.9, <=3.13 |
| 0.4 | 2.7 | >=3.9, <=3.13 |
| 0.3 | 2.7 | >=3.9, <=3.13 |
| 0.2.1 | 2.6 | >=3.9, <=3.13 |
| 0.2.0 | 2.6 | >=3.9, <=3.13 |
| 0.1.1 | 2.5 | >=3.9, <=3.12 |
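
The compatibility table above can be checked programmatically before installing. A sketch (TORCHCODEC_TORCH and required_torch are illustrative names):

```python
# TorchCodec release -> Torch release it expects, from the table above
TORCHCODEC_TORCH = {
    "0.7": "2.8", "0.6": "2.8", "0.5": "2.7", "0.4": "2.7",
    "0.3": "2.7", "0.2.1": "2.6", "0.2.0": "2.6", "0.1.1": "2.5",
}

def required_torch(torchcodec_version: str) -> str:
    """Return the Torch release a given TorchCodec release expects."""
    if torchcodec_version not in TORCHCODEC_TORCH:
        raise ValueError(f"unknown TorchCodec release: {torchcodec_version}")
    return TORCHCODEC_TORCH[torchcodec_version]

def is_compatible(torchcodec_version: str, torch_version: str) -> bool:
    # e.g. torch "2.8.0" satisfies a required "2.8"
    return torch_version.startswith(required_torch(torchcodec_version))

print(required_torch("0.4"))  # 2.7
```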

(optional) To build and install TorchCodec from source:

  1. Install the dependency: FFmpeg (either FFmpeg 6.1.2 or FFmpeg 7.0.1 works).

  2. Build, package and install from source:

# release_tag: v0.7.0 v0.4.0 v0.2.1
git clone -b <release_tag> https://github.com/pytorch/torchcodec.git
cd torchcodec/
git apply torchcodec_<release_tag>.patch
ENABLE_CUDA=1 I_CONFIRM_THIS_IS_NOT_A_LICENSE_VIOLATION=1 pip install . --no-build-isolation

The patch files are as follows:

torchcodec_v0.7.0.patch

diff --git a/src/torchcodec/_core/SingleStreamDecoder.cpp b/src/torchcodec/_core/SingleStreamDecoder.cpp
index 865179a..b5788e7 100644
--- a/src/torchcodec/_core/SingleStreamDecoder.cpp
+++ b/src/torchcodec/_core/SingleStreamDecoder.cpp
@@ -27,6 +27,17 @@ int64_t secondsToClosestPts(double seconds, const AVRational& timeBase) {
       std::round(seconds * timeBase.den / timeBase.num));
 }
 
+inline char* find_codec(const char* input) {
+    const char* codecs[] = {"h264", "hevc", "av1", "vp9"};
+    size_t codec_len = sizeof(codecs) / sizeof(codecs[0]);
+    for (size_t i = 0; i < codec_len; ++i) {
+        if (strstr(input, codecs[i])) {
+            return (char*)codecs[i];
+        }
+    }
+    return NULL;
+}
+
 // Some videos aren't properly encoded and do not specify pts values for
 // packets, and thus for frames. Unset values correspond to INT64_MIN. When that
 // happens, we fallback to the dts value which hopefully exists and is correct.
@@ -425,9 +436,22 @@ void SingleStreamDecoder::addStream(
   // addStream() which is supposed to be generic
   if (mediaType == AVMEDIA_TYPE_VIDEO) {
     if (deviceInterface_) {
-      avCodec = makeAVCodecOnlyUseForCallingAVFindBestStream(
-          deviceInterface_->findCodec(streamInfo.stream->codecpar->codec_id)
-              .value_or(avCodec));
+      if (device.type() != torch::kCUDA) {
+        avCodec = makeAVCodecOnlyUseForCallingAVFindBestStream(
+            deviceInterface_->findCodec(streamInfo.stream->codecpar->codec_id)
+                .value_or(avCodec));
+      }
+      else {
+        const char* cuvid_suffix = "_cuvid";
+        char* codec_name = find_codec(avCodec->name);
+        size_t cuvid_length = std::strlen(codec_name) + std::strlen(cuvid_suffix) + 1;
+        char* cuvid_name = new char[cuvid_length];
+        std::strcpy(cuvid_name, codec_name);
+        std::strcat(cuvid_name, cuvid_suffix);
+        avCodec = avcodec_find_decoder_by_name(cuvid_name);
+        delete[] cuvid_name;
+        TORCH_CHECK(avCodec != nullptr);
+      }
     }
   }
 

torchcodec_v0.4.0.patch

diff --git a/src/torchcodec/_core/SingleStreamDecoder.cpp b/src/torchcodec/_core/SingleStreamDecoder.cpp
index b73f703..0222087 100644
--- a/src/torchcodec/_core/SingleStreamDecoder.cpp
+++ b/src/torchcodec/_core/SingleStreamDecoder.cpp
@@ -26,6 +26,17 @@ int64_t secondsToClosestPts(double seconds, const AVRational& timeBase) {
       std::round(seconds * timeBase.den / timeBase.num));
 }
 
+inline char* find_codec(const char* input) {
+    const char* codecs[] = {"h264", "hevc", "av1", "vp9"};
+    size_t codec_len = sizeof(codecs) / sizeof(codecs[0]);
+    for (size_t i = 0; i < codec_len; ++i) {
+        if (strstr(input, codecs[i])) {
+            return (char*)codecs[i];
+        }
+    }
+    return NULL;
+}
+
 // Some videos aren't properly encoded and do not specify pts values for
 // packets, and thus for frames. Unset values correspond to INT64_MIN. When that
 // happens, we fallback to the dts value which hopefully exists and is correct.
@@ -388,9 +399,22 @@ void SingleStreamDecoder::addStream(
   // addStream() which is supposed to be generic
   if (mediaType == AVMEDIA_TYPE_VIDEO) {
     if (deviceInterface_) {
-      avCodec = makeAVCodecOnlyUseForCallingAVFindBestStream(
-          deviceInterface_->findCodec(streamInfo.stream->codecpar->codec_id)
-              .value_or(avCodec));
+      if (device.type() != torch::kCUDA) {
+          avCodec = makeAVCodecOnlyUseForCallingAVFindBestStream(
+              deviceInterface_->findCodec(streamInfo.stream->codecpar->codec_id)
+                  .value_or(avCodec));
+      }
+      else {
+        const char* cuvid_suffix = "_cuvid";
+        char* codec_name = find_codec(avCodec->name);
+        size_t cuvid_length = std::strlen(codec_name) + std::strlen(cuvid_suffix) + 1;
+        char* cuvid_name = new char[cuvid_length];
+        std::strcpy(cuvid_name, codec_name);
+        std::strcat(cuvid_name, cuvid_suffix);
+        avCodec = avcodec_find_decoder_by_name(cuvid_name);
+        delete[] cuvid_name;
+        TORCH_CHECK(avCodec != nullptr);
+      }
     }
   }
 
@@ -417,6 +441,12 @@ void SingleStreamDecoder::addStream(
     throw std::invalid_argument(getFFMPEGErrorStringFromErrorCode(retVal));
   }
 
+  if (device.type() == torch::kCUDA) {
+    codecContext->hw_frames_ctx = av_hwframe_ctx_alloc(codecContext->hw_device_ctx);
+    AVHWFramesContext* hwframe_ctx = (AVHWFramesContext*)codecContext->hw_frames_ctx->data;
+    // in avcodec_open2, cuvid will not set sw_format until sequence callback in avcodec_send_packet()
+    hwframe_ctx->sw_format = codecContext->sw_pix_fmt;
+  }
   codecContext->time_base = streamInfo.stream->time_base;
   containerMetadata_.allStreamMetadata[activeStreamIndex_].codecName =
       std::string(avcodec_get_name(codecContext->codec_id));

torchcodec_v0.2.1.patch

diff --git a/src/torchcodec/decoders/_core/VideoDecoder.cpp b/src/torchcodec/decoders/_core/VideoDecoder.cpp
index 97214ce..c88c581 100644
--- a/src/torchcodec/decoders/_core/VideoDecoder.cpp
+++ b/src/torchcodec/decoders/_core/VideoDecoder.cpp
@@ -40,6 +40,17 @@ int64_t secondsToClosestPts(double seconds, const AVRational& timeBase) {
   return static_cast<int64_t>(std::round(seconds * timeBase.den));
 }
 
+inline char* find_codec(const char* input) {
+    const char* codecs[] = {"h264", "hevc", "av1", "vp9"};
+    size_t codec_len = sizeof(codecs) / sizeof(codecs[0]);
+    for (size_t i = 0; i < codec_len; ++i) {
+        if (strstr(input, codecs[i])) {
+            return (char*)codecs[i];
+        }
+    }
+    return NULL;
+}
+
 std::vector<std::string> splitStringWithDelimiters(
     const std::string& str,
     const std::string& delims) {
@@ -449,9 +460,15 @@ void VideoDecoder::addStream(
   // TODO_CODE_QUALITY it's pretty meh to have a video-specific logic within
   // addStream() which is supposed to be generic
   if (mediaType == AVMEDIA_TYPE_VIDEO && device.type() == torch::kCUDA) {
-    avCodec = makeAVCodecOnlyUseForCallingAVFindBestStream(
-        findCudaCodec(device, streamInfo.stream->codecpar->codec_id)
-            .value_or(avCodec));
+    const char* cuvid_suffix = "_cuvid";
+    char* codec_name = find_codec(avCodec->name);
+    size_t cuvid_length = std::strlen(codec_name) + std::strlen(cuvid_suffix) + 1;
+    char* cuvid_name = new char[cuvid_length];
+    std::strcpy(cuvid_name, codec_name);
+    std::strcat(cuvid_name, cuvid_suffix);
+    avCodec = avcodec_find_decoder_by_name(cuvid_name);
+    delete[] cuvid_name;
+    TORCH_CHECK(avCodec != nullptr);
   }
 
   AVCodecContext* codecContext = avcodec_alloc_context3(avCodec);
@@ -474,6 +491,12 @@ void VideoDecoder::addStream(
     throw std::invalid_argument(getFFMPEGErrorStringFromErrorCode(retVal));
   }
 
+  if (device.type() == torch::kCUDA) {
+    codecContext->hw_frames_ctx = av_hwframe_ctx_alloc(codecContext->hw_device_ctx);
+    AVHWFramesContext* hwframe_ctx = (AVHWFramesContext*)codecContext->hw_frames_ctx->data;
+    // in avcodec_open2, cuvid will not set sw_format until sequence callback in avcodec_send_packet()
+    hwframe_ctx->sw_format = codecContext->sw_pix_fmt;
+  }
   codecContext->time_base = streamInfo.stream->time_base;
   containerMetadata_.allStreamMetadata[activeStreamIndex_].codecName =
       std::string(avcodec_get_name(codecContext->codec_id));

Usage example:

python sample_torchcodec.py
import torch

print(f"{torch.__version__=}")
print(f"{torch.cuda.is_available()=}")
print(f"{torch.cuda.get_device_properties(0)=}")

import torchcodec
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("sample.mp4", device="cuda")
frame = decoder[0]

6.4. imageio-ffmpeg

imageio-ffmpeg is an imageio plugin that wraps FFmpeg. Like ffmpeg-python, it works by concatenating the input arguments and invoking the ffmpeg binary, so your runtime environment must have ffmpeg installed, or the directory containing its bin/lib must be configured. It is not recommended; the project itself states: "You should probably use PyAV instead; it is faster and offers more features".

imageio-ffmpeg: https://github.com/imageio/imageio-ffmpeg

Usage example:

import imageio

video_reader = imageio.get_reader('input.mp4', 'ffmpeg', ffmpeg_params=['-c:v', 'h264_cuvid'])
frame_list = []
for i, frame in enumerate(video_reader):
    frame_list.append(frame)
imageio.mimsave('out.mp4', frame_list)

6.5. Decord

The stock Decord package is the CPU build and does not support hardware acceleration; a hardware-enabled build must be compiled and installed manually.

Decord depends on FFmpeg, but only for its demuxing capability, so there is no need to build a hardware-accelerated FFmpeg for it.

Older ffmpeg versions have a bug that can deadlock decord; it was only fixed in ffmpeg 7, so build and install FFmpeg 7.0.1 or FFmpeg 7.1.1.

Download Decord

git clone -b v0.6.0 --depth 1 --recursive https://github.com/dmlc/decord
cd decord
git apply decord_ffmpeg7.patch

The patch mainly updates decord's FFmpeg dependency from 4.2 to FFmpeg 7.0.1 / FFmpeg 7.1.1:

decord_ffmpeg7.patch

diff --git a/src/audio/audio_reader.cc b/src/audio/audio_reader.cc
index be706f1..78c62fb 100644
--- a/src/audio/audio_reader.cc
+++ b/src/audio/audio_reader.cc
@@ -128,7 +128,11 @@ namespace decord {
                 pCodecParameters = tempCodecParameters;
                 originalSampleRate = tempCodecParameters->sample_rate;
                 if (targetSampleRate == -1) targetSampleRate = originalSampleRate;
+#if LIBAVUTIL_VERSION_INT >= AV_VERSION_INT(57, 28, 100)
+                numChannels = tempCodecParameters->ch_layout.nb_channels;
+#else
                 numChannels = tempCodecParameters->channels;
+#endif
                 break;
             }
         }
@@ -229,7 +233,11 @@ namespace decord {
         // allocate resample buffer
         float** outBuffer;
         int outLinesize = 0;
+#if LIBAVUTIL_VERSION_INT >= AV_VERSION_INT(57, 28, 100)
+        int outNumChannels = mono ? AV_CH_LAYOUT_MONO : pFrame->ch_layout.nb_channels;
+#else
         int outNumChannels = av_get_channel_layout_nb_channels(mono ? AV_CH_LAYOUT_MONO : pFrame->channel_layout);
+#endif
         numChannels = outNumChannels;
         int outNumSamples = av_rescale_rnd(pFrame->nb_samples,
                                            this->targetSampleRate, pFrame->sample_rate, AV_ROUND_UP);
@@ -281,11 +289,17 @@ namespace decord {
         if (!this->swr) {
             LOG(FATAL) << "ERROR Failed to allocate resample context";
         }
+#if LIBAVUTIL_VERSION_INT >= AV_VERSION_INT(57, 28, 100)
+        av_channel_layout_default(&pCodecContext->ch_layout, pCodecContext->ch_layout.nb_channels);
+        av_opt_set_chlayout(this->swr, "in_channel_layout",  &pCodecContext->ch_layout, 0);
+        av_opt_set_chlayout(this->swr, "out_channel_layout", &pCodecContext->ch_layout, 0);
+#else
         if (pCodecContext->channel_layout == 0) {
             pCodecContext->channel_layout = av_get_default_channel_layout( pCodecContext->channels );
         }
         av_opt_set_channel_layout(this->swr, "in_channel_layout",  pCodecContext->channel_layout, 0);
         av_opt_set_channel_layout(this->swr, "out_channel_layout", mono ? AV_CH_LAYOUT_MONO : pCodecContext->channel_layout,  0);
+#endif
         av_opt_set_int(this->swr, "in_sample_rate",     pCodecContext->sample_rate,                0);
         av_opt_set_int(this->swr, "out_sample_rate",    this->targetSampleRate,                0);
         av_opt_set_sample_fmt(this->swr, "in_sample_fmt",  pCodecContext->sample_fmt, 0);
diff --git a/src/video/ffmpeg/ffmpeg_common.h b/src/video/ffmpeg/ffmpeg_common.h
index b0b973f..f0f7316 100644
--- a/src/video/ffmpeg/ffmpeg_common.h
+++ b/src/video/ffmpeg/ffmpeg_common.h
@@ -21,6 +21,7 @@
 extern "C" {
 #endif
 #include <libavcodec/avcodec.h>
+#include <libavcodec/bsf.h>
 #include <libavformat/avformat.h>
 #include <libavformat/avio.h>
 #include <libavfilter/avfilter.h>
diff --git a/src/video/nvcodec/cuda_threaded_decoder.cc b/src/video/nvcodec/cuda_threaded_decoder.cc
index 62bc7ee..957a90d 100644
--- a/src/video/nvcodec/cuda_threaded_decoder.cc
+++ b/src/video/nvcodec/cuda_threaded_decoder.cc
@@ -17,7 +17,7 @@ namespace decord {
 namespace cuda {
 using namespace runtime;
 
-CUThreadedDecoder::CUThreadedDecoder(int device_id, AVCodecParameters *codecpar, AVInputFormat *iformat)
+CUThreadedDecoder::CUThreadedDecoder(int device_id, AVCodecParameters *codecpar, const AVInputFormat *iformat)
     : device_id_(device_id), stream_({device_id, false}), device_{}, ctx_{}, parser_{}, decoder_{},
     pkt_queue_{}, frame_queue_{},
     run_(false), frame_count_(0), draining_(false),
@@ -70,7 +70,7 @@ CUThreadedDecoder::CUThreadedDecoder(int device_id, AVCodecParameters *codecpar,
     }
 }
 
-void CUThreadedDecoder::InitBitStreamFilter(AVCodecParameters *codecpar, AVInputFormat *iformat) {
+void CUThreadedDecoder::InitBitStreamFilter(AVCodecParameters *codecpar, const AVInputFormat *iformat) {
     const char* bsf_name = nullptr;
     if (AV_CODEC_ID_H264 == codecpar->codec_id) {
         // H.264
diff --git a/src/video/nvcodec/cuda_threaded_decoder.h b/src/video/nvcodec/cuda_threaded_decoder.h
index d7e6fcd..61958a1 100644
--- a/src/video/nvcodec/cuda_threaded_decoder.h
+++ b/src/video/nvcodec/cuda_threaded_decoder.h
@@ -46,7 +46,7 @@ class CUThreadedDecoder final : public ThreadedDecoderInterface {
     using FrameOrderQueuePtr = std::unique_ptr<FrameOrderQueue>;
 
     public:
-        CUThreadedDecoder(int device_id, AVCodecParameters *codecpar, AVInputFormat *iformat);
+        CUThreadedDecoder(int device_id, AVCodecParameters *codecpar, const AVInputFormat *iformat);
         void SetCodecContext(AVCodecContext *dec_ctx, int width = -1, int height = -1, int rotation = 0);
         bool Initialized() const;
         void Start();
@@ -70,7 +70,7 @@ class CUThreadedDecoder final : public ThreadedDecoderInterface {
         void LaunchThreadImpl();
         void RecordInternalError(std::string message);
         void CheckErrorStatus();
-        void InitBitStreamFilter(AVCodecParameters *codecpar, AVInputFormat *iformat);
+        void InitBitStreamFilter(AVCodecParameters *codecpar, const AVInputFormat *iformat);

         int device_id_;
         CUStream stream_;
diff --git a/src/video/video_reader.cc b/src/video/video_reader.cc
index af4858d..99c9635 100644
--- a/src/video/video_reader.cc
+++ b/src/video/video_reader.cc
@@ -145,7 +145,7 @@ VideoReader::~VideoReader(){

 void VideoReader::SetVideoStream(int stream_nb) {
     if (!fmt_ctx_) return;
-    AVCodec *dec;
+    const AVCodec *dec;
     int st_nb = av_find_best_stream(fmt_ctx_.get(), AVMEDIA_TYPE_VIDEO, stream_nb, -1, &dec, 0);
     // LOG(INFO) << "find best stream: " << st_nb;
     CHECK_GE(st_nb, 0) << "ERROR cannot find video stream with wanted index: " << stream_nb;

Build Decord

mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
make

Install Decord:

cd ../python
python3 setup.py install --user

Usage example

import decord
import torch
from decord import gpu, cpu

video_path="input.mp4"
vr = decord.VideoReader(video_path, ctx=gpu(0))
nframes = 40
total_frames = 500
idx = torch.linspace(0, total_frames - 1, nframes).round().long().tolist()
video = vr.get_batch(idx).asnumpy()
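
The torch.linspace-based index sampling in the example above can be reproduced with the standard library alone; a sketch (sample_indices is an illustrative name; results may differ from torch at exact .5 ties because torch rounds in float32):

```python
def sample_indices(total_frames: int, nframes: int) -> list:
    # evenly spaced indices over [0, total_frames - 1], rounded to ints,
    # matching the spacing of torch.linspace(0, total_frames - 1, nframes).round()
    if nframes <= 1:
        return [0]
    step = (total_frames - 1) / (nframes - 1)
    return [round(i * step) for i in range(nframes)]

idx = sample_indices(500, 40)
print(len(idx), idx[0], idx[-1])  # 40 0 499
```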