Model overview
On December 1, 2025, DeepSeek open-sourced two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which offer leading inference capabilities.
The two models target different use cases. DeepSeek-V3.2 aims to balance inference capabilities with output length, making it suitable for common use cases such as question answering and general agent tasks. This is the official release following the experimental V3.2-Exp version launched at the end of September. In public inference benchmarks, V3.2 achieves performance comparable to GPT-5 and is only slightly behind Google's Gemini3 Pro.
The goal of DeepSeek-V3.2-Speciale is to "push the inference capabilities of open-source models to the extreme and explore the boundaries of model capabilities." Speciale is described as an enhanced version of V3.2 with long-thought capabilities, combined with the theorem-proving abilities of DeepSeek-Math-V2. This model features excellent instruction-following, rigorous mathematical proof, and logical verification abilities. According to data released by DeepSeek, Speciale surpasses Google's most advanced Gemini3 Pro on several inference benchmarks.
With 671 billion parameters, deploying the DeepSeek-V3.2 model locally is challenging. This makes cloud deployment the preferred option for enterprise users and developers.
Alibaba Cloud PAI-Model Gallery now supports the DeepSeek-V3.2 and DeepSeek-V3.2-Speciale models and provides enterprise-grade deployment solutions.
Procedure
-
To get started, find the DeepSeek-V3.2 model in PAI-Model Gallery, which you can access directly by using this link: https://pai.console.aliyun.com/#/quick-start/models/DeepSeek-V3.2/intro. In the PAI console, go to the Model Gallery page and select the Model Gallery tab. Find your target model, such as DeepSeek-V3.2 or DeepSeek-V3.2-Speciale, and click its card to go to the model details page.
-
On the model details page, click Deploy in the upper-right corner. The platform supports SGLang and vLLM deployment frameworks. The platform provides multiple deployment templates with default configurations. To deploy the model with one click, select a template and the desired computing resources.
NoteThe resource requirements for deployment are as follows:
-
Deployment with a public resource
When using a public resource, you are limited to the listed specifications. In a Lingjun distributed deployment, the service uses the same resources to deploy multiple nodes. -
Deployment with an EAS resource group or resource quota
A standalone deployment requires a computing resource with eight GPUs, each providing 141 GB of GPU memory. For a Lingjun distributed deployment, the required number of nodes is pre-configured, and each node must be configured with eight GPUs.
Set Inference Engine to SGLang and Deployment Template to standalone. Keep Enable Dynamic Scheduling disabled and set Enable PD Caching to Do not enable. After you confirm the resource costs, click Deploy.
-
-
After the deployment succeeds, go to the service page and click View Invocation Information to get the service endpoint and token. For service invocation instructions, click the pre-trained model link to return to the model details page.
-
You can call the model service from your local environment or other clients, use the online debugging feature provided by PAI, or interact with the model through the PAI-provided WebUI. For more information, see Invoke Model.
Support for more models
PAI-Model Gallery enables rapid deployment, fine-tuning, distillation, and evaluation for popular models from the open-source community. Supported models include popular open-source models such as Qwen, Wan, DeepSeek, Kimi, MiniMax, and GLM. It also offers PAI-optimized model versions, such as Qwen3-235B-A22B-PAI-optimized, Qwen3-Next-80B-A3B-Instruct-FP8-PAI-optimized, and DeepSeek-R1-0528-PAI-optimized. These versions come with built-in templates, such as the PAI-optimized EP+PD Caching deployment template, for improved performance.