Run the CogVideo Model on PAI
Base environment overview
Instance type: 真武810E, 16 cards
Storage: 智算CPFS
Image: dsw-registry-vpc.cn-wulanchabu.cr.aliyuncs.com/pai/training-xpu-pytorch:pai-ppu-llm-1.0
Product component: PAI-DSW
Environment setup
Create a DSW instance: build a DSW instance from the image listed above. For detailed steps, see Develop and Train Models with PPU on PAI.

Clone the code repository: log in to the DSW instance and fetch the CogVideo source code with git clone.
Install dependencies
Edit the requirements.txt that ships with the repository: remove the transformers, torch, and torchvision entries so that the versions bundled in the PPU image are used. The resulting dependency list is:
diffusers>=0.30.3
accelerate>=0.34.2
numpy==1.26.0
sentencepiece>=0.2.0
SwissArmyTransformer>=0.4.12
gradio>=4.44.0
imageio>=2.35.1
imageio-ffmpeg>=0.5.1
openai>=1.45.0
moviepy>=1.0.3
pillow==9.5.0
scikit-video

Then install them:

pip install -r requirements.txt
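The edit above can also be scripted. The following is a minimal sketch (the requirements.txt path is an assumption; point it at the file in your clone of the repository):

```python
# Drop the packages the PPU image already ships (transformers, torch,
# torchvision) from a requirements list so pip does not reinstall them.
EXCLUDED = {"transformers", "torch", "torchvision"}

def filter_requirements(lines):
    """Return only the requirement lines whose package name is not excluded."""
    kept = []
    for line in lines:
        # Strip version specifiers such as ">=0.30.3" or "==1.26.0".
        name = line.split("==")[0].split(">=")[0].strip()
        if name and name not in EXCLUDED:
            kept.append(line)
    return kept

if __name__ == "__main__":
    import os
    # Hypothetical location; adjust to where requirements.txt actually lives.
    if os.path.exists("requirements.txt"):
        with open("requirements.txt") as f:
            print("\n".join(filter_requirements(f.read().splitlines())))
```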
Download pretrained model weights
Download the pretrained model weights from ModelScope in preparation for the inference and finetuning steps that follow.
cd /path/to/your/models/sora/thu/CogVideo/models  # replace with your actual path
pip install modelscope
modelscope download --model ZhipuAI/CogVideoX-5b --local_dir ./CogVideoX-5b
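After the download finishes, it can be worth confirming that the target directory looks like a complete diffusers-format pipeline. The expected entry names below are an assumption based on the standard diffusers layout (model_index.json plus per-component subfolders), not an authoritative listing of the CogVideoX-5b checkpoint:

```python
# Check a downloaded pipeline directory for the entries a diffusers-format
# checkpoint normally contains (assumed layout, not an official spec).
EXPECTED = ("model_index.json", "scheduler", "text_encoder",
            "tokenizer", "transformer", "vae")

def missing_entries(present, expected=EXPECTED):
    """Return the expected entries that are absent from `present`."""
    present = set(present)
    return [e for e in expected if e not in present]

if __name__ == "__main__":
    import os
    model_dir = "./CogVideoX-5b"  # hypothetical path; use --local_dir from above
    if os.path.isdir(model_dir):
        print(missing_entries(os.listdir(model_dir)))
```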
Model training
Download the training data
The Wild-Heart/Disney-VideoGeneration-Dataset is used.
cd /path/to/your/models/sora/thu/CogVideo/datasets  # replace with your actual path
git lfs install
git clone https://hf-mirror.com/datasets/Wild-Heart/Disney-VideoGeneration-Dataset

Download the baseline model
cd /path/to/your/models/sora/thu/CogVideo/models  # replace with your actual path
modelscope download --model ZhipuAI/CogVideoX-2b --local_dir ./CogVideoX-2b

Adapt the training code
The source code itself needs no changes; only a few lines in the training script and its configuration file have to be edited, so the adaptation effort is minimal.
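Before launching, it can also help to verify the dataset layout that the training script's --caption_column prompt.txt and --video_column videos.txt flags assume: two text files with one caption and one video path per line, in matching order. A minimal sketch (file names taken from the training flags; the dataset path is an assumption):

```python
# Sanity-check the paired caption/video list files used by the LoRA trainer:
# prompt.txt and videos.txt must have the same number of non-empty lines.
def check_pairing(prompts, videos):
    """Return (ok, message) for two lists of caption / video-path lines."""
    prompts = [p for p in prompts if p.strip()]
    videos = [v for v in videos if v.strip()]
    if len(prompts) != len(videos):
        return False, f"{len(prompts)} captions vs {len(videos)} video paths"
    return True, f"{len(prompts)} caption/video pairs"

if __name__ == "__main__":
    import os
    # Hypothetical path; use your DATASET_PATH from the training script.
    root = "/path/to/your/models/sora/thu/CogVideo/datasets/Disney-VideoGeneration-Dataset"
    if os.path.isdir(root):
        with open(os.path.join(root, "prompt.txt")) as p, \
             open(os.path.join(root, "videos.txt")) as v:
            print(check_pairing(p.read().splitlines(), v.read().splitlines()))
```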
finetune_single_rank.sh:

#!/bin/bash
# Adjust these environment variables as needed. Offline mode is used because
# access to Hugging Face can be unstable.
# Replace with your actual path
export MODEL_PATH="/path/to/your/models/sora/thu/CogVideo/models/CogVideoX-2b"
export CACHE_PATH="~/.cache"
# Replace with your actual path
export DATASET_PATH="/path/to/your/models/sora/thu/CogVideo/datasets/Disney-VideoGeneration-Dataset/"
export OUTPUT_PATH="cogvideox-lora-single-node"
# This line must be commented out for now
# export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# If you are not using 8 GPUs, change num_processes in
# accelerate_config_machine_single.yaml to match your GPU count.
accelerate launch --config_file accelerate_config_machine_single.yaml --multi_gpu \
  train_cogvideox_lora.py \
  --gradient_checkpointing \
  --pretrained_model_name_or_path $MODEL_PATH \
  --cache_dir $CACHE_PATH \
  --enable_tiling \
  --enable_slicing \
  --instance_data_root $DATASET_PATH \
  --caption_column prompt.txt \
  --video_column videos.txt \
  --validation_prompt "DISNEY A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance" \
  --validation_prompt_separator ::: \
  --num_validation_videos 1 \
  --validation_epochs 100 \
  --seed 42 \
  --rank 128 \
  --lora_alpha 64 \
  --mixed_precision bf16 \
  --output_dir $OUTPUT_PATH \
  --height 480 \
  --width 720 \
  --fps 8 \
  --max_num_frames 49 \
  --skip_frames_start 0 \
  --skip_frames_end 0 \
  --train_batch_size 1 \
  --num_train_epochs 30 \
  --checkpointing_steps 1000 \
  --gradient_accumulation_steps 1 \
  --learning_rate 1e-3 \
  --lr_scheduler cosine_with_restarts \
  --lr_warmup_steps 200 \
  --lr_num_cycles 1 \
  --optimizer AdamW \
  --adam_beta1 0.9 \
  --adam_beta2 0.95 \
  --max_grad_norm 1.0 \
  --allow_tf32 \
  --report_to wandb

accelerate_config_machine_single.yaml (num_processes is set to 16 to match the 16-card instance):

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
dynamo_backend: 'no'
mixed_precision: 'no'
num_machines: 1
num_processes: 16
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

Run the training
cd /path/to/your/models/sora/thu/CogVideo/finetune  # replace with your actual path
sh finetune_single_rank.sh
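With the configuration above, each optimizer step consumes a global batch of train_batch_size × num_processes × gradient_accumulation_steps = 1 × 16 × 1 = 16 samples. The following sketch shows how the total number of optimizer steps follows from that; the sample count is a hypothetical placeholder, so substitute the size of your copy of the dataset:

```python
import math

def total_optimizer_steps(num_samples, num_epochs, per_device_batch,
                          num_processes, grad_accum):
    """Optimizer steps for a run: ceil(samples / global batch) per epoch."""
    global_batch = per_device_batch * num_processes * grad_accum
    return math.ceil(num_samples / global_batch) * num_epochs

# Values from finetune_single_rank.sh and accelerate_config_machine_single.yaml;
# num_samples=100 is a hypothetical dataset size.
print(total_optimizer_steps(num_samples=100, num_epochs=30,
                            per_device_batch=1, num_processes=16, grad_accum=1))
```

Note that with a small dataset the run may finish in fewer total steps than --checkpointing_steps 1000, in which case only the final weights are saved.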

Model inference
Run the following command to perform model inference.
python cli_demo.py \
  --prompt "A serene night scene in a forested area. The first frame shows a tranquil lake reflecting the star-filled sky above. The second frame reveals a beautiful sunset, casting a warm glow over the landscape. The third frame showcases the night sky, filled with stars and a vibrant Milky Way galaxy. The video is a time-lapse, capturing the transition from day to night, with the lake and forest serving as a constant backdrop. The style of the video is naturalistic, emphasizing the beauty of the night sky and the peacefulness of the forest." \
  --model_path ../models/CogVideoX-5b/ \
  --generate_type "t2v" \
  --output_path ./test1.mp4 \
  --num_inference_steps 20

A generated video file will be written to the path given by --output_path.
