Video FFmpeg User Guide_PG1 Alibaba Cloud Products - Alibaba Cloud Help Center

Note

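This document is based on the PPU SDK 1.4 release (not yet published as of 2024-12-05).
AI container images with PPU SDK 1.4 built in will be released later; contact the product team to obtain one.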

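FFmpeg is an open-source multimedia framework used mainly to process audio and video data. It can perform a wide range of multimedia operations, for example:

Format conversion: FFmpeg can convert between almost all audio and video formats. For example, it can convert a video from MP4 to AVI, or audio from WAV to MP3.
Recording and capture: FFmpeg can record audio and video, or capture live video streams from a camera or the screen.
Editing: FFmpeg supports video editing operations such as cutting, concatenating, and adding watermarks or subtitles, although its main uses are conversion and streaming.
Streaming: FFmpeg can be used for real-time streaming and can create and receive streams over various protocols, such as RTMP and HLS.
Decoding and encoding: FFmpeg provides efficient decoding and encoding and supports many codecs for compressing and decompressing audio and video.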

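The PPU SDK is compatible with the NVIDIA Video Codec SDK, so FFmpeg's existing cuvid, nvenc, and libnpp plugins work directly for hardware acceleration; no additional plugins need to be developed.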

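Verified FFmpeg versions:

FFmpeg 7.0.1

Uses nv-codec-headers 12.2.72.0

Requires Video Codec SDK support at version 12.2 or later

A recent official FFmpeg release that substantially improves the parallel performance of transcoding.

FFmpeg 6.1.2

Depends on nv-codec-headers 12.1.14.0

Requires Video Codec SDK support at version 12.1 or later

Frameworks such as PyAV depend on this version. It is also the most widely used version at present.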
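The version pairing above can be encoded in a small lookup, for example in a build script that chooses which nv-codec-headers tag to clone. This is a minimal Python sketch: the table holds only the two combinations verified in this guide, and the helper names headers_for_ffmpeg and sdk_is_sufficient are hypothetical.

```python
from typing import Tuple

# Only the combinations verified in this guide:
# FFmpeg release -> (nv-codec-headers git tag, minimum Video Codec SDK version)
VERIFIED = {
    "7.0.1": ("n12.2.72.0", (12, 2)),
    "6.1.2": ("n12.1.14.0", (12, 1)),
}


def headers_for_ffmpeg(ffmpeg_version: str) -> Tuple[str, Tuple[int, int]]:
    """Return (nv-codec-headers tag, minimum Video Codec SDK) for a verified release.

    Raises KeyError for any FFmpeg version this guide has not verified.
    """
    if ffmpeg_version not in VERIFIED:
        raise KeyError(f"FFmpeg {ffmpeg_version} is not verified; "
                       f"verified versions: {', '.join(sorted(VERIFIED))}")
    return VERIFIED[ffmpeg_version]


def sdk_is_sufficient(ffmpeg_version: str, sdk_version: str) -> bool:
    """Check a 'MAJOR.MINOR' Video Codec SDK version against the requirement."""
    _, minimum = headers_for_ffmpeg(ffmpeg_version)
    major, minor = (int(part) for part in sdk_version.split(".")[:2])
    return (major, minor) >= minimum
```

Failing loudly on unverified versions is deliberate: silently guessing a headers tag for another FFmpeg release is exactly the mismatch this section warns about.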

Building FFmpeg

FFmpeg：https://github.com/FFmpeg/FFmpeg

nv-codec-headers：https://github.com/FFmpeg/nv-codec-headers

Note that the FFmpeg version and the nv-codec-headers version must match each other.

Download:

## clone nv-codec-headers 12.2 to match FFmpeg 7.0
git clone --branch n12.2.72.0 --depth 1 https://github.com/FFmpeg/nv-codec-headers
cp nv-codec-headers/ffnvcodec.pc.in nv-codec-headers/ffnvcodec.pc
 
## download FFmpeg
wget https://ffmpeg.org/releases/ffmpeg-7.0.1.tar.gz --no-check-certificate
tar -xzvf ffmpeg-7.0.1.tar.gz

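Patch: edit nv-codec-headers/include/ffnvcodec/dynlink_loader.h to comment out the CUDA APIs that are not supported (and not needed), and to load the _v2 symbol names of several functions: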

    // tcuGLGetDevices_v2 *cuGLGetDevices;
    // tcuGraphicsGLRegisterImage *cuGraphicsGLRegisterImage;
    // tcuGraphicsUnregisterResource *cuGraphicsUnregisterResource;
    // tcuGraphicsMapResources *cuGraphicsMapResources;
    // tcuGraphicsUnmapResources *cuGraphicsUnmapResources;
    // tcuGraphicsSubResourceGetMappedArray *cuGraphicsSubResourceGetMappedArray;
    // tcuGraphicsResourceGetMappedPointer *cuGraphicsResourceGetMappedPointer;

    ... ...

    // tcuArrayCreate *cuArrayCreate;
    // tcuArray3DCreate *cuArray3DCreate;
    // tcuArrayDestroy *cuArrayDestroy;

    // tcuEGLStreamProducerConnect *cuEGLStreamProducerConnect;
    // tcuEGLStreamProducerDisconnect *cuEGLStreamProducerDisconnect;
    // tcuEGLStreamConsumerDisconnect *cuEGLStreamConsumerDisconnect;
    // tcuEGLStreamProducerPresentFrame *cuEGLStreamProducerPresentFrame;
    // tcuEGLStreamProducerReturnFrame *cuEGLStreamProducerReturnFrame;

    ... ...
    // LOAD_SYMBOL(cuDevicePrimaryCtxRelease, tcuDevicePrimaryCtxRelease, "cuDevicePrimaryCtxRelease");
    LOAD_SYMBOL(cuDevicePrimaryCtxRelease, tcuDevicePrimaryCtxRelease, "cuDevicePrimaryCtxRelease_v2");
    // LOAD_SYMBOL(cuDevicePrimaryCtxSetFlags, tcuDevicePrimaryCtxSetFlags, "cuDevicePrimaryCtxSetFlags");
    LOAD_SYMBOL(cuDevicePrimaryCtxSetFlags, tcuDevicePrimaryCtxSetFlags, "cuDevicePrimaryCtxSetFlags_v2");
    
    ... ...
    // LOAD_SYMBOL(cuDevicePrimaryCtxReset, tcuDevicePrimaryCtxReset, "cuDevicePrimaryCtxReset");
    LOAD_SYMBOL(cuDevicePrimaryCtxReset, tcuDevicePrimaryCtxReset, "cuDevicePrimaryCtxReset_v2");

    ... ... 

    // LOAD_SYMBOL(cuLinkCreate, tcuLinkCreate, "cuLinkCreate");
    LOAD_SYMBOL(cuLinkCreate, tcuLinkCreate, "cuLinkCreate_v2");
    // LOAD_SYMBOL(cuLinkAddData, tcuLinkAddData, "cuLinkAddData");
    LOAD_SYMBOL(cuLinkAddData, tcuLinkAddData, "cuLinkAddData_v2");

    ... ... 
        
    // LOAD_SYMBOL(cuModuleGetGlobal, tcuModuleGetGlobal, "cuModuleGetGlobal");
    LOAD_SYMBOL(cuModuleGetGlobal, tcuModuleGetGlobal, "cuModuleGetGlobal_v2");
    
    ... ...

    // LOAD_SYMBOL(cuGLGetDevices, tcuGLGetDevices_v2, "cuGLGetDevices_v2");
    // LOAD_SYMBOL(cuGraphicsGLRegisterImage, tcuGraphicsGLRegisterImage, "cuGraphicsGLRegisterImage");
    // LOAD_SYMBOL(cuGraphicsUnregisterResource, tcuGraphicsUnregisterResource, "cuGraphicsUnregisterResource");
    // LOAD_SYMBOL(cuGraphicsMapResources, tcuGraphicsMapResources, "cuGraphicsMapResources");
    // LOAD_SYMBOL(cuGraphicsUnmapResources, tcuGraphicsUnmapResources, "cuGraphicsUnmapResources");
    // LOAD_SYMBOL(cuGraphicsSubResourceGetMappedArray, tcuGraphicsSubResourceGetMappedArray, "cuGraphicsSubResourceGetMappedArray");
    // LOAD_SYMBOL(cuGraphicsResourceGetMappedPointer, tcuGraphicsResourceGetMappedPointer, "cuGraphicsResourceGetMappedPointer_v2");

    ... ...

    // LOAD_SYMBOL(cuArrayCreate, tcuArrayCreate, "cuArrayCreate_v2");
    // LOAD_SYMBOL(cuArray3DCreate, tcuArray3DCreate, "cuArray3DCreate_v2");
    // LOAD_SYMBOL(cuArrayDestroy, tcuArrayDestroy, "cuArrayDestroy");

    // LOAD_SYMBOL_OPT(cuEGLStreamProducerConnect, tcuEGLStreamProducerConnect, "cuEGLStreamProducerConnect");
    // LOAD_SYMBOL_OPT(cuEGLStreamProducerDisconnect, tcuEGLStreamProducerDisconnect, "cuEGLStreamProducerDisconnect");
    // LOAD_SYMBOL_OPT(cuEGLStreamConsumerDisconnect, tcuEGLStreamConsumerDisconnect, "cuEGLStreamConsumerDisconnect");
    // LOAD_SYMBOL_OPT(cuEGLStreamProducerPresentFrame, tcuEGLStreamProducerPresentFrame, "cuEGLStreamProducerPresentFrame");
    // LOAD_SYMBOL_OPT(cuEGLStreamProducerReturnFrame, tcuEGLStreamProducerReturnFrame, "cuEGLStreamProducerReturnFrame");
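Applying these edits by hand is error-prone. The Python sketch below performs the same two kinds of changes as the patch above: it comments out every declaration and LOAD_SYMBOL/LOAD_SYMBOL_OPT line that mentions an unsupported API, and rewrites the symbol string of the listed functions to its _v2 name. The symbol lists are copied from the patch above; the function names patch_dynlink_loader and patch_file are hypothetical, and the script assumes each declaration or LOAD_SYMBOL call sits on one line. Unlike the manual patch, it rewrites the _v2 calls in place instead of keeping the old call as a comment. Review the resulting diff before building.

```python
import re
from pathlib import Path

# CUDA driver APIs that this guide says PPU does not support (and FFmpeg does not need).
UNSUPPORTED = [
    "cuGLGetDevices", "cuGraphicsGLRegisterImage", "cuGraphicsUnregisterResource",
    "cuGraphicsMapResources", "cuGraphicsUnmapResources",
    "cuGraphicsSubResourceGetMappedArray", "cuGraphicsResourceGetMappedPointer",
    "cuArrayCreate", "cuArray3DCreate", "cuArrayDestroy",
    "cuEGLStreamProducerConnect", "cuEGLStreamProducerDisconnect",
    "cuEGLStreamConsumerDisconnect", "cuEGLStreamProducerPresentFrame",
    "cuEGLStreamProducerReturnFrame",
]

# Functions whose exported symbol must be loaded under its "_v2" name.
RENAME_TO_V2 = [
    "cuDevicePrimaryCtxRelease", "cuDevicePrimaryCtxSetFlags",
    "cuDevicePrimaryCtxReset", "cuLinkCreate", "cuLinkAddData",
    "cuModuleGetGlobal",
]

_NAMES = "|".join(map(re.escape, UNSUPPORTED))
# Matches a struct member ("*cuArrayCreate;") or a load call
# ("LOAD_SYMBOL(cuArrayCreate," or "LOAD_SYMBOL_OPT(cuEGLStream...,").
_UNSUPPORTED_RE = re.compile(
    rf"\*\s*(?:{_NAMES})\s*;|LOAD_SYMBOL(?:_OPT)?\(\s*(?:{_NAMES})\s*,"
)


def patch_dynlink_loader(text: str) -> str:
    """Return the header text with the PPU edits applied; safe to run twice."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        if body.startswith("//"):
            out.append(line)  # already commented out: leave untouched
        elif _UNSUPPORTED_RE.search(body):
            out.append(f"{indent}// {body}")  # comment out the unsupported API
        else:
            for name in RENAME_TO_V2:
                if re.match(rf"LOAD_SYMBOL\(\s*{name}\s*,", body):
                    # "cuLinkCreate" -> "cuLinkCreate_v2"; "..._v2" no longer matches
                    body = body.replace(f'"{name}"', f'"{name}_v2"')
            out.append(indent + body)
    return "".join(out)


def patch_file(path: str = "nv-codec-headers/include/ffnvcodec/dynlink_loader.h") -> None:
    """Patch the header file in place."""
    header = Path(path)
    header.write_text(patch_dynlink_loader(header.read_text()))
```

Run patch_file() from the directory that contains the nv-codec-headers checkout, after cloning and before configuring FFmpeg.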

Build:

mkdir output
cd ffmpeg-7.0.1
export PKG_CONFIG_PATH=$(pwd)/../nv-codec-headers:$PKG_CONFIG_PATH
./configure --enable-shared \
    --enable-nonfree --enable-gpl \
    --enable-cuvid --enable-nvenc --enable-libnpp \
    --extra-cflags="-I$(pwd)/../nv-codec-headers/include -I$CUDA_HOME/include" \
    --extra-ldflags="-L$CUDA_HOME/lib64" \
    --prefix=$(pwd)/../output
 
make -j && make install

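--prefix sets the installation directory used once the build succeeds. If it is not set, FFmpeg is installed to /usr/local by default.

After a successful build and installation, the bin, include, lib, and share directories are created under the installation directory.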
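Before running the examples, it can help to confirm that the build actually registered the hardware codecs. The Python sketch below runs the built ffmpeg with -decoders and -encoders and reports which of the cuvid/nvenc codecs listed in this guide are missing. It assumes it is run from the installation directory with LD_LIBRARY_PATH set as in the execution examples that follow; the function names are hypothetical, and the parser assumes the usual listing layout (a "------" separator line, then a flag column followed by the codec name).

```python
import subprocess

# Hardware codecs this guide lists as supported on PPU
EXPECTED_DECODERS = {"h264_cuvid", "hevc_cuvid", "av1_cuvid", "vp9_cuvid"}
EXPECTED_ENCODERS = {"h264_nvenc", "hevc_nvenc", "av1_nvenc"}


def parse_codec_list(listing: str) -> set:
    """Extract codec names from `ffmpeg -decoders` / `ffmpeg -encoders` output.

    Entries follow a '------' separator line; each entry is a flag column
    followed by the name, e.g. ' V....D h264_cuvid   Nvidia CUVID H264 decoder'.
    """
    names = set()
    after_separator = False
    for line in listing.splitlines():
        if not after_separator:
            after_separator = line.strip().startswith("------")
            continue
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])
    return names


def missing_hw_codecs(ffmpeg: str = "bin/ffmpeg") -> list:
    """Run the built ffmpeg and return the expected hw codecs it does not list."""
    def listing(flag: str) -> str:
        result = subprocess.run([ffmpeg, "-hide_banner", flag],
                                capture_output=True, text=True, check=True)
        return result.stdout

    missing = (EXPECTED_DECODERS - parse_codec_list(listing("-decoders"))) | \
              (EXPECTED_ENCODERS - parse_codec_list(listing("-encoders")))
    return sorted(missing)
```

An empty list from missing_hw_codecs() means every expected decoder and encoder was compiled in; it does not by itself prove that the PPU driver can open them at runtime.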

Examples:

cd ../output
export LD_LIBRARY_PATH=lib:$LD_LIBRARY_PATH

# ffmpeg decode test with cuvid
bin/ffmpeg -vcodec h264_cuvid -i 640x360_y8c8.h264 -frames 10 -y out-dec-h264.yuv

# ffmpeg transcode test with cuvid and nvenc
bin/ffmpeg -hwaccel_output_format cuda -vcodec h264_cuvid -i 640x360_y8c8.h264 -vcodec hevc_nvenc -r 30.0 -b:v 15000000 -preset p4 output.hevc

# ffmpeg transcode test with cuvid and nvenc, resize with npp support
bin/ffmpeg -hwaccel_output_format cuda -vcodec h264_cuvid -i input.h264 -vcodec hevc_nvenc -frames 300 -preset p7 -acodec copy -y output.h265
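The third example's comment mentions NPP resizing, but its command line contains no scaling filter. Resizing with libnpp is normally requested through FFmpeg's scale_npp video filter (-vf scale_npp=W:H). The Python sketch below assembles such a transcode command as an argument list for subprocess.run, following the option order of the examples above (decoder before -i). The function name, defaults, and the 1280x720 target are hypothetical, and whether scale_npp receives GPU frames from the cuvid decoder in this setup has not been verified here; test it in your PPU environment.

```python
from typing import List, Optional, Tuple


def build_transcode_cmd(src: str, dst: str,
                        decoder: str = "h264_cuvid",
                        encoder: str = "hevc_nvenc",
                        preset: str = "p4",
                        bitrate: Optional[str] = None,
                        resize: Optional[Tuple[int, int]] = None,
                        ffmpeg: str = "bin/ffmpeg") -> List[str]:
    """Build an ffmpeg argv for a cuvid-decode / nvenc-encode transcode."""
    cmd = [ffmpeg,
           "-hwaccel_output_format", "cuda",  # ask for decoded frames in GPU memory
           "-c:v", decoder,                   # input option: must precede -i
           "-i", src]
    if resize is not None:
        width, height = resize
        cmd += ["-vf", f"scale_npp={width}:{height}"]  # NPP-accelerated scaling
    cmd += ["-c:v", encoder, "-preset", preset]
    if bitrate is not None:
        cmd += ["-b:v", bitrate]
    cmd += ["-y", dst]
    return cmd
```

For example, build_transcode_cmd("input.h264", "output.h265", preset="p7", resize=(1280, 720)) can be passed to subprocess.run(cmd, check=True). Building a list rather than a shell string avoids quoting problems with file names.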

FAQ

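Q. Which hardware acceleration capabilities does FFmpeg currently support?

A. For decoding, the h264_cuvid, hevc_cuvid, av1_cuvid, and vp9_cuvid hardware decoders are supported (if you need avs2, you can request an additional patch package). For encoding, h264_nvenc, hevc_nvenc, and av1_nvenc hardware acceleration is supported. For image processing, NPP acceleration is supported for operations such as transpose, color conversion, and scaling.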
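The integration examples in Appendix B all choose a cuvid decoder by appending _cuvid to the stream's codec name and fall back to software decoding otherwise. A lookup table built from the list above makes that choice explicit. This Python sketch is illustrative; the names hw_decoder and hw_encoder are hypothetical, and avs2 is left out because it needs the extra patch package.

```python
from typing import Optional

# Hardware codecs listed above as available on PPU (avs2 omitted: needs an extra patch).
CUVID_DECODERS = {"h264": "h264_cuvid", "hevc": "hevc_cuvid",
                  "av1": "av1_cuvid", "vp9": "vp9_cuvid"}
NVENC_ENCODERS = {"h264": "h264_nvenc", "hevc": "hevc_nvenc", "av1": "av1_nvenc"}


def hw_decoder(codec_name: str) -> Optional[str]:
    """Map an FFmpeg codec name (as reported by ffprobe) to its cuvid decoder.

    None means no hardware decoder is listed, so the caller should fall back
    to FFmpeg's software decoder.
    """
    return CUVID_DECODERS.get(codec_name)


def hw_encoder(codec_name: str) -> Optional[str]:
    """Map a target codec name to its nvenc encoder, or None if not listed."""
    return NVENC_ENCODERS.get(codec_name)
```

Note that vp9 has a hardware decoder but no hardware encoder in this list, so the two tables are kept separate rather than derived from one set of codec names.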

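Appendix A: Installing and Using FFmpeg 6.1

FFmpeg 7 made major architectural changes to parallelize transcoding and multi-stream processing, which brings a clear performance improvement. For compatibility reasons, however, many frameworks still require FFmpeg 6.1.

Download the FFmpeg 6.1 source code and the matching nv-codec-headers: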

## clone nv-codec-headers 12.1 to match FFmpeg 6.1
git clone --branch n12.1.14.0 --depth 1 https://github.com/FFmpeg/nv-codec-headers
cp nv-codec-headers/ffnvcodec.pc.in nv-codec-headers/ffnvcodec.pc
    
## download FFmpeg
wget https://ffmpeg.org/releases/ffmpeg-6.1.2.tar.gz --no-check-certificate
tar -xzvf ffmpeg-6.1.2.tar.gz

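The subsequent patching, build, and execution steps are the same as for FFmpeg 7 above.

If you depend on an earlier FFmpeg version, we recommend upgrading to 6.1 or later.

Appendix B: The FFmpeg Ecosystem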

PyAV

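Torchvision supports three video backends: PyAV, video_reader, and cuda. The default backend is PyAV.

The video_reader backend decodes purely on the CPU, the cuda backend uses the GPU decoder, and the PyAV backend calls FFmpeg to decode.

PyAV therefore relies on the hardware acceleration that FFmpeg itself provides, so you need to build and install an FFmpeg with hardware acceleration enabled.

As of this writing, PyAV does not support FFmpeg 7.0; build the FFmpeg 6.1 version from Appendix A instead.

PyAV website: https://github.com/PyAV-Org/PyAV

Call path 1: pytorch -> torchvision -> PyAV -> FFmpeg:

Hardware decoding support (requires modifying torchvision's code):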

--- a/torchvision/io/video_reader.py
+++ b/torchvision/io/video_reader.py
@@ -280,6 +280,13 @@ class VideoReader:
         if self.backend == "pyav":
             stream_type = stream.split(":")[0]
             stream_id = 0 if len(stream.split(":")) == 1 else int(stream.split(":")[1])
+            video_stream = self.container.streams.video[stream_id]
+            # Setting up the codec with cuvid
+            if video_stream.codec.name in ('h264', 'hevc', 'av1', 'vp9'):
+                codec_name = f'{video_stream.codec.name}_cuvid'
+            else:
+                codec_name = video_stream.codec.name  # Fallback to software decoding
+            video_stream.codec_context = av.codec.CodecContext.create(codec_name, 'r')
             self.pyav_stream = {stream_type: stream_id}
             self._c = self.container.decode(**self.pyav_stream)

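The data output by PyAV is in TCHW format by default.

Hardware encoding support (no changes to torchvision's code are needed, but the caller must specify the nvenc encoder by name):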

from torchvision.io import write_video
write_video(save_path, x, fps=fps, video_codec="h264_nvenc")  # must be specified explicitly; currently h264_nvenc, av1_nvenc, and hevc_nvenc are supported

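Call path 2: pytorch -> torchaudio -> FFmpeg:

Torchaudio's StreamReader binds FFmpeg's libraries directly rather than going through PyAV, and it can also perform hardware decoding. No changes to torchaudio's code are needed, but the cuvid decoder must be specified by name: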

from torchaudio.io import StreamReader

src = "input.mp4"  # path to an H.264 input video

s = StreamReader(src)
# frames_per_chunk = one second of frames (assumes stream 0 is the video stream).
# The decoder must be named explicitly; currently h264_cuvid, av1_cuvid, hevc_cuvid, and vp9_cuvid are supported.
s.add_video_stream(int(s.get_src_stream_info(0).frame_rate), decoder="h264_cuvid")
s.fill_buffer()  # decode until a chunk is available
(video,) = s.pop_chunks()

ffmpeg-python

Website: https://github.com/kkroening/ffmpeg-python

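ffmpeg-python essentially assembles the arguments you pass into a command line and runs the ffmpeg executable, so the runtime environment must either have ffmpeg installed or have the bin and lib directories of your FFmpeg build configured (PATH and LD_LIBRARY_PATH).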
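Because ffmpeg-python launches ffmpeg as a subprocess, pointing it at the hardware-enabled build from this guide is a matter of environment setup. A minimal Python sketch, assuming the installation prefix passed to --prefix during the build; the helper name use_local_ffmpeg is hypothetical. Call it once before any ffmpeg.probe() or ffmpeg.run() call: assigning to os.environ updates the process environment, which child processes inherit.

```python
import os
from pathlib import Path


def use_local_ffmpeg(prefix: str, env=None):
    """Put a locally built FFmpeg first on PATH and LD_LIBRARY_PATH.

    prefix is the directory given to ./configure --prefix (it contains bin/ and lib/).
    With env=None the current process environment (os.environ) is modified.
    """
    env = os.environ if env is None else env
    root = Path(prefix).resolve()
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entries = [str(root / subdir)]
        if env.get(var):
            entries.append(env[var])  # keep any existing entries after ours
        env[var] = os.pathsep.join(entries)
    return env
```

For the layout used in this guide that would be use_local_ffmpeg("output"), called from the directory where the output folder was created.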

Because PPU video decoding only supports the cuvid decoding mode, setting only hwaccel='cuda' is not enough when decoding; the cuvid decoder must be specified.

Reference example:

import sys

import ffmpeg

probe = ffmpeg.probe('input.mp4')
video_stream = next((stream for stream in probe['streams'] if stream['codec_type'] == 'video'), None)

if video_stream is None or video_stream.get('codec_name') is None:
    sys.exit('no video stream found in input.mp4')

input_kwargs = {}
if video_stream['codec_name'] in ['h264', 'hevc', 'av1', 'vp9']:
    # Hardware decoding with the matching cuvid decoder, e.g. h264 -> h264_cuvid
    input_kwargs['vcodec'] = video_stream['codec_name'] + '_cuvid'
# Any other codec falls back to FFmpeg's software decoder

out, _ = (
    ffmpeg.input('input.mp4', **input_kwargs)
    .output('output.rgb', format='rawvideo', pix_fmt='rgb24')
    .run()
)

imageio-ffmpeg

imageio-ffmpeg is a plugin for imageio that wraps FFmpeg.

imageio: https://github.com/imageio/imageio

imageio-ffmpeg: https://github.com/imageio/imageio-ffmpeg

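imageio-ffmpeg relies on FFmpeg for hardware acceleration, so an FFmpeg built with hardware acceleration must be installed. The imageio-ffmpeg pip wheels ship their own ffmpeg binary; set the IMAGEIO_FFMPEG_EXE environment variable to the path of your hardware-enabled ffmpeg so that it is used instead.

Example: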

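import imageio

# ffmpeg_params selects the h264_cuvid hardware decoder for reading
video_reader = imageio.get_reader('input.mp4', 'ffmpeg', ffmpeg_params=['-c:v', 'h264_cuvid'])
fps = video_reader.get_meta_data()['fps']
frame_list = [frame for frame in video_reader]  # holds every decoded frame in memory
video_reader.close()
imageio.mimsave('out.mp4', frame_list, fps=fps)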
