Best practices - Alibaba Cloud Model Studio - Alibaba Cloud Help Center

Pre-process video files to improve file transcription efficiency (for audio file recognition scenarios)

The Paraformer speech recognition API accepts video files, but video files are typically large and slow to transfer. Speech recognition needs only the audio, so pre-process each video by extracting its audio track and compressing it. This significantly reduces file size and improves transcription throughput. Use ffmpeg for the pre-processing.

Prerequisites

Install ffmpeg from ffmpeg.org. The build must include the libopus encoder, which the command in this section uses.
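Before converting anything, it can help to confirm that ffmpeg is on the PATH and that the build includes libopus. The following is a minimal sketch; the function name check_ffmpeg and its messages are illustrative and not part of ffmpeg or Model Studio.

```shell
# check_ffmpeg is a hypothetical helper name used only for this example.
# It reports whether ffmpeg is installed and whether libopus is available.
check_ffmpeg() {
  if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg not found on PATH"
    return 1
  fi
  # "ffmpeg -encoders" lists every encoder compiled into this build.
  if ! ffmpeg -hide_banner -encoders 2>/dev/null | grep -q libopus; then
    echo "ffmpeg found, but the libopus encoder is missing"
    return 2
  fi
  echo "ffmpeg with libopus is available"
}

check_ffmpeg || true
```

If the encoder is reported missing, install an ffmpeg build that includes libopus before continuing.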

Pre-process video files

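Use ffmpeg to extract the audio track, downmix it to mono (-ac 1), resample it to 16 kHz (-ar 16000), and encode it with Opus (-acodec libopus). If the video has several audio tracks, ffmpeg by default selects the one with the most channels; add -map 0:a:0 to take the first track explicitly.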

Shell

ffmpeg -i input-video-file -ac 1 -ar 16000 -acodec libopus output-audio-file.opus

The output audio file will be significantly smaller than the input video. Submit the audio file (via URL) to the file transcription API for speech recognition results.
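When many videos need the same treatment, the single command can be wrapped in a small loop. The sketch below assumes the inputs are .mp4 files in one directory; the function names (opus_name, convert_one, convert_dir) and the directory layout are hypothetical, chosen only for illustration. It adds -vn to drop video streams explicitly, -y to overwrite existing outputs, and -nostdin so that ffmpeg does not read from the terminal inside the loop.

```shell
#!/bin/sh
# Hypothetical helpers for batch conversion; the names are illustrative only.

# Derive the output file name: drop the directory and the last extension,
# then append .opus.
opus_name() {
  base=$(basename "$1")
  printf '%s.opus\n' "${base%.*}"
}

# Convert one video ($1) to mono 16 kHz Opus inside the directory $2.
convert_one() {
  ffmpeg -nostdin -hide_banner -loglevel error -y \
    -i "$1" -vn -ac 1 -ar 16000 -acodec libopus \
    "$2/$(opus_name "$1")"
}

# Convert every .mp4 file in directory $1, writing results to directory $2.
convert_dir() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.mp4; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    convert_one "$f" "$out_dir" || echo "failed: $f" >&2
  done
}
```

For example, convert_dir ./videos ./audio would write ./audio/NAME.opus for each ./videos/NAME.mp4. A failed conversion is reported on standard error and the loop continues with the next file.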