FunModel is a full lifecycle AI model management platform that uses heterogeneous computing virtualization, load-aware scheduling, and an automated toolchain for efficient resource use and fast model deployment.-Cloud Application Platform(CAP)-阿里云帮助中心

FunModel is a full lifecycle management platform for AI model development, deployment, and Operations and Maintenance (O&M). You can provide a model file, such as one from a model repository like ModelScope or Hugging Face. Then, you can use FunModel's automated tools to quickly package and deploy the model service. You will obtain an inference API that you can call directly. The platform is designed to improve resource efficiency and simplify the development and deployment process.

Core capabilities and implementation

Heterogeneous computing power virtualization
FunModel uses heterogeneous computing power virtualization to centrally manage and schedule compute resources such as CPUs and GPUs in a data center. Its core mechanisms include the following:
- GPU slicing technology: Virtualizes a single physical GPU into multiple independent computing units. This allows multiple models or instances of different sizes to share the same GPU while ensuring resource isolation.
- Resource pooling management: Manages heterogeneous computing power, such as CPUs and GPUs, in a data center to form a unified resource pool. It dynamically schedules and allocates resources based on the actual payload.
This architecture aims to increase the overall utilization of compute resources such as GPUs. This helps you optimize computing power costs.
Load-aware scheduling and elastic scaling
To handle common traffic fluctuations in AI inference services, FunModel provides a scheduling and instance recovery mechanism. This ensures service responsiveness and stability.
- Three-level response mechanism:
  - Active instance priority: Requests are routed to active instances first for the lowest latency.
  - Shallow hibernation (formerly idle) instance wakeup: When there are not enough active instances, the system wakes up shallow hibernation instances from a "frozen" state using snapshot technology.
  - Cold start as a fallback: When no instances are available, the system performs a cold start to create a new instance.
- Snapshot and state recovery: FunModel uses snapshot technology to freeze and store the complete state of an instance, including its GPU memory. When an instance needs to be woken up, the system can load the snapshot to recover the instance state in seconds. This greatly reduces the waiting time from instance creation to readiness.
- Elastic scale-out in seconds: By combining the resource pool and snapshot recovery technology, FunModel can schedule and start new instances in seconds to handle sudden traffic peaks.
- Elastic billing: To balance cost and response speed, the compute resources for shallow hibernation (formerly idle) instances in a "frozen" state are billed at a lower rate. For more information, see Billing overview.
Integrated development toolchain: Accelerate model iteration and deployment
FunModel provides a series of automated tools. These tools allow developers to focus on model development itself, rather than on complex deployment and O&M tasks.
- DevPod integrated development environment: Provides a cloud-based development environment pre-configured with common AI frameworks and libraries. Developers can code and debug directly through a web-based VSCode, JupyterLab, or SSH terminal. This removes the need to configure complex development environments locally.
- One-click build and deployment: After a developer completes model development and local testing in DevPod, they can use the platform's tools to trigger a one-click process. This process builds the image, pushes it to an image repository, and automatically deploys it to the target environment. The entire process from code completion to service launch is clear and efficient, significantly shortening the iteration cycle.
- Built-in acceleration frameworks: The platform integrates mainstream industry inference acceleration frameworks such as vLLM and SGLang. You can choose to enable them during deployment. You can typically use these frameworks to improve model inference performance without modifying your code.

Technical advantages

Feature	FunModel implementation	Description
Resource utilization	Uses GPU virtualization and resource pooling technology.	This design allows multiple tasks to share underlying hardware resources. It aims to improve the overall efficiency of compute resource usage.
Instance readiness time	State recovery mechanism based on snapshot technology.	When an instance starts, it recovers its running state from a snapshot. This reduces the time from creation to readiness to a few seconds.
Elastic scale-out response	Combines a pre-warmed resource pool with rapid instance recovery capabilities.	When the payload increases, the system can quickly schedule and start new instances from the pre-warmed resource pool. This enables horizontal scaling in seconds.
Automated deployment time	Provides a one-click build and deployment process.	A standard deployment process, from code commit to service launch, is typically completed within 10 minutes.

Quick Start

Deploy your first model service—Quick Start
Advanced deployment solution—Custom model deployment
Cloud-based AI development environment—DevPod development environment
DeepSeek-OCR Quick Start guide

Core capabilities and implementation

Heterogeneous computing power virtualization

Load-aware scheduling and elastic scaling

Integrated development toolchain: Accelerate model iteration and deployment

Technical advantages

Quick Start