Qwen Image Release Notes (Alibaba Cloud Linux, Alinux)

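The Qwen series large-model images are an out-of-the-box container deployment service from AC2. Each container image includes everything needed to run Qwen series models (the Python runtime, the deep learning framework, and dependency libraries), so the models can be deployed and served efficiently and stably across environments. The images do not include model weight files; users must download the weights themselves or use the download capability that the image provides. The images serve the model through a Web Demo, and they can also be used as base images to build other forms of service.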

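Image list

Qwen series images come in two kinds, "one-click deployment images" and "runtime environment images", which differ as follows:

Runtime environment image: contains only the software environment needed to run the model, including system components and Python dependencies.
One-click deployment image: contains the runtime environment, a startup script, and a Web Demo script.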

Image type	Supported models	CPU/GPU	Image address
Runtime environment image	Qwen 1.8B-72B (quantized models supported)	GPU	ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:runtime-pytorch2.2.0.1-cuda12.1.1-alinux3.2304
Runtime environment image	Qwen 1.8B-72B	CPU	ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:runtime-pytorch2.2.0.1-alinux3.2304
One-click deployment image	Qwen-Chat-7B	GPU	ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:7b-pytorch2.2.0.1-cuda12.1.1-alinux3.2304
One-click deployment image	Qwen-Chat-7B	CPU	ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen:7b-pytorch2.2.0.1-alinux3.2304
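
As a minimal sketch of how the table maps onto an image address, the shell helper below builds the address from the image kind and the device type. The function name `ac2_qwen_image` and its two arguments are hypothetical and are not shipped with the images; only the registry path and tags come from the table.

```shell
# Hypothetical helper (not part of the images): map (kind, device) to an
# image address taken from the image list.
#   kind:   runtime | 7b   (runtime environment vs. one-click deployment)
#   device: gpu | cpu
ac2_qwen_image() {
    kind="$1"
    device="$2"
    repo="ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/qwen"

    # Only the two kinds listed in the table are accepted.
    case "$kind" in
        runtime|7b) ;;
        *) echo "unknown image kind: $kind" >&2; return 1 ;;
    esac

    # GPU tags carry the CUDA version; CPU tags do not.
    case "$device" in
        gpu) echo "${repo}:${kind}-pytorch2.2.0.1-cuda12.1.1-alinux3.2304" ;;
        cpu) echo "${repo}:${kind}-pytorch2.2.0.1-alinux3.2304" ;;
        *) echo "unknown device: $device" >&2; return 1 ;;
    esac
}
```

With Docker installed, the result could then be passed to a pull, for example `docker pull "$(ac2_qwen_image 7b gpu)"`.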

Image contents

qwen:runtime-pytorch2.2.0.1-cuda12.1.1-alinux3.2304
- gradio: 3.41.0
- optimum: 1.19.2
- auto-gptq: 0.7.1
- flash-attn: 2.5.8
- tiktoken: 0.5.2
- accelerate: 0.26.1
- transformers: 4.36.2
- PyTorch: 2.2.0.1
- CUDA: 12.1.1
- Python: 3.10.13
- BaseOS: Alinux 3.2304
qwen:runtime-pytorch2.2.0.1-alinux3.2304
- gradio: 3.41.0
- tiktoken: 0.5.2
- accelerate: 0.26.1
- transformers: 4.36.2
- PyTorch: 2.2.0.1
- Python: 3.10.13
- BaseOS: Alinux 3.2304
qwen:7b-pytorch2.2.0.1-cuda12.1.1-alinux3.2304
Components are inherited from qwen:runtime-pytorch2.2.0.1-cuda12.1.1-alinux3.2304
qwen:7b-pytorch2.2.0.1-alinux3.2304
Components are inherited from qwen:runtime-pytorch2.2.0.1-alinux3.2304

Image runtime requirements

qwen:runtime-pytorch2.2.0.1-alinux3.2304 and qwen:7b-pytorch2.2.0.1-alinux3.2304 are CPU images and have no driver requirement.
qwen:runtime-pytorch2.2.0.1-cuda12.1.1-alinux3.2304 and qwen:7b-pytorch2.2.0.1-cuda12.1.1-alinux3.2304 are GPU images that include CUDA 12.1.1. They require nvidia-driver >= 530 and are also compatible with nvidia-driver R470 and R525.
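
The driver requirement for the GPU images can be sketched as a small check on the driver's major version. The function name `driver_supports_cuda_12_1` is hypothetical, and treating R470 and R525 as exact major versions 470 and 525 is an assumption about how the compatibility statement should be read.

```shell
# Hypothetical helper: check a driver version string (e.g. "535.129.03")
# against the requirement stated for the GPU images.
# Returns 0 if acceptable, 1 if not, 2 if the version cannot be parsed.
driver_supports_cuda_12_1() {
    major="${1%%.*}"                  # keep the part before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;      # empty or non-numeric: cannot decide
    esac
    if [ "$major" -ge 530 ]; then
        return 0                      # meets nvidia-driver >= 530
    fi
    case "$major" in
        470|525) return 0 ;;          # branches listed as compatible (assumed mapping)
        *) return 1 ;;
    esac
}
```

On a GPU host, the version string could be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed to the function as its first argument.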

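GPU compatibility notes

The GPU images integrate FlashAttention-2, which has compatibility requirements on the GPU architecture. The table below summarizes FlashAttention-2 support for different GPU architectures, together with the GPU models of the heterogeneous computing instances currently sold by Alibaba Cloud.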

GPU architecture	Sold on Alibaba Cloud	FlashAttention-2
Ampere	A10	Supported
Turing	T4	Not supported
Volta	V100	Not supported
Pascal	P100, P4	Not supported

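Running a Qwen model on an unsupported GPU architecture may produce the error "FlashAttention only supports Ampere GPUs or newer". To prevent the model from trying to use FlashAttention-2 acceleration on an unsupported GPU, remove the FlashAttention-2 component inside the running container with the following command.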

pip uninstall -y flash-attn
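
As a rough sketch, the decision to remove flash-attn can be tied to the GPU's CUDA compute capability, since Ampere starts at compute capability 8.0 (A10 is 8.6, T4 is 7.5, V100 is 7.0). The function name `flash_attn_supported` is hypothetical and is not provided by the image.

```shell
# Hypothetical helper: decide from a compute capability string (e.g. "8.6")
# whether the GPU is Ampere or newer, which is what FlashAttention-2 needs.
# Returns 0 if supported, 1 if not, 2 if the value cannot be parsed.
flash_attn_supported() {
    major="${1%%.*}"                  # keep the part before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;      # empty or non-numeric: cannot decide
    esac
    [ "$major" -ge 8 ]                # Ampere and newer report 8.x or higher
}
```

Assuming the installed driver's `nvidia-smi` supports the `compute_cap` query field, the capability could be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, and `pip uninstall -y flash-attn` run only when the function returns 1.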

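Key features

GPU images come with optimum, auto-gptq, and flash-attn preinstalled and support quantized models.
Deployment images provide one-click deployment with a built-in run script (script source).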

Release history

2024.06
Released the Qwen runtime environment images
2024.07
Released the Qwen one-click deployment images