What is FunModel

更新时间:
复制 MD 格式

FunModel is a full-lifecycle management platform for AI model development, deployment, and Operations and Maintenance (O&M). You can provide a model file from a community repository, such as ModelScope or Hugging Face, and use FunModel's automated tools to quickly package and deploy the model service. This provides an inference API that you can call directly. The platform is designed to improve resource efficiency and simplify the development and deployment process.

Core capabilities and principles

  1. Heterogeneous computing power virtualization

    FunModel uses heterogeneous computing power virtualization to manage and schedule compute resources, such as CPUs and GPUs, in a data center. Its core mechanisms include the following:

    • GPU splitting: Virtualizes a single physical GPU into multiple independent compute units. This allows multiple models or instances of different sizes to share the same GPU card while ensuring resource isolation.

    • Resource pooling: Manages heterogeneous computing power, such as CPUs and GPUs, in a unified resource pool. The platform dynamically schedules and allocates resources based on the actual load.

    This architecture improves the overall utilization of compute resources, such as GPUs, and helps you optimize your computing power costs.

  2. Load-aware scheduling and elastic scaling

    FunModel has a scheduling and instance recovery mechanism to handle common traffic fluctuations in AI inference services. This mechanism ensures service responsiveness and stability.

    • Three-level response mechanism:

      • Prioritize active instances: Requests are routed to active instances first for the lowest latency.

      • Wake up shallow-hibernation instances: If there are not enough active instances, the system uses snapshot technology to wake up instances from a 'frozen' shallow-hibernation state.

      • Cold start as a fallback: If no instances are available, the system performs a cold start to create a new instance.

    • Snapshots and state recovery: FunModel uses snapshot technology to freeze and store the complete state of an instance, including its GPU memory. When an instance needs to be woken up, the system loads the snapshot to restore its state within seconds. This process greatly reduces the waiting time for an instance to become ready.

    • Elastic scaling in seconds: By combining resource pools with snapshot recovery, FunModel can schedule and start new instances in seconds to handle sudden traffic spikes.

    • Elastic billing: To balance cost and response speed, compute resources for shallow-hibernation instances in a 'frozen' state are billed at a lower rate. For more information, see Billing overview.

  3. Integrated development toolchain: Accelerate model iteration and deployment

    FunModel provides a series of automated tools. These tools allow developers to focus on model development instead of complex deployment and O&M tasks.

    • DevPod integrated development environment: Provides a cloud-based development environment with pre-installed AI frameworks and libraries. You can code and debug directly through a web-based VSCode, JupyterLab, or SSH terminal. This eliminates the need to configure a complex local development environment.

    • One-click build and deployment: After you finish developing and testing your model in DevPod, you can use the platform's tools to trigger a one-click process. This process builds the image, pushes it to the image repository, and automatically deploys it to the target environment. The entire flow from code completion to a published service is clear and efficient, which significantly shortens the iteration cycle.

    • Built-in acceleration frameworks: The platform integrates mainstream inference acceleration frameworks, such as vLLM and SGLang. You can enable these frameworks during deployment to improve model inference performance, typically without changing your code.

Technical advantages

Feature

FunModel implementation

Description

Resource utilization

Uses GPU virtualization and resource pooling technologies.

This design allows multiple tasks to share the underlying hardware, which improves the overall efficiency of compute resources.

Instance readiness time

A state recovery mechanism based on snapshot technology.

When an instance starts, its state can be restored from a snapshot in milliseconds. This process reduces the time from creation to readiness to just a few seconds.

Elastic scaling response

Combines a pre-warmed resource pool with rapid instance recovery capabilities.

When the load increases, the system can quickly schedule and start new instances from a pre-warmed resource pool. This provides a horizontal scaling response within seconds.

Automated deployment time

Provides a one-click build and deployment process.

A standard deployment process, from code commit to a published service, is typically completed within 10 minutes.

Quick start

  1. Deploy your first model service: Quick Start

  2. Advanced deployment solution: Custom model deployment

  3. Cloud AI development environment: DevPod development environment

  4. DeepSeek-OCR Quick Start guide