FunModel model service

更新时间:
复制 MD 格式

FunModel is a full lifecycle management platform for AI model development, deployment, and Operations and Maintenance (O&M). You can provide a model file, such as one from a model repository like ModelScope or Hugging Face. Then, you can use FunModel's automated tools to quickly package and deploy the model service. You will obtain an inference API that you can call directly. The platform is designed to improve resource efficiency and simplify the development and deployment process.

Core capabilities and implementation

  1. Heterogeneous computing power virtualization

    FunModel uses heterogeneous computing power virtualization to centrally manage and schedule compute resources such as CPUs and GPUs in a data center. Its core mechanisms include the following:

    • GPU slicing technology: Virtualizes a single physical GPU into multiple independent computing units. This allows multiple models or instances of different sizes to share the same GPU while ensuring resource isolation.

    • Resource pooling management: Manages heterogeneous computing power, such as CPUs and GPUs, in a data center to form a unified resource pool. It dynamically schedules and allocates resources based on the actual payload.

    This architecture aims to increase the overall utilization of compute resources such as GPUs. This helps you optimize computing power costs.

  2. Load-aware scheduling and elastic scaling

    To handle common traffic fluctuations in AI inference services, FunModel provides a scheduling and instance recovery mechanism. This ensures service responsiveness and stability.

    • Three-level response mechanism:

      • Active instance priority: Requests are routed to active instances first for the lowest latency.

      • Shallow hibernation (formerly idle) instance wakeup: When there are not enough active instances, the system wakes up shallow hibernation instances from a "frozen" state using snapshot technology.

      • Cold start as a fallback: When no instances are available, the system performs a cold start to create a new instance.

    • Snapshot and state recovery: FunModel uses snapshot technology to freeze and store the complete state of an instance, including its GPU memory. When an instance needs to be woken up, the system can load the snapshot to recover the instance state in seconds. This greatly reduces the waiting time from instance creation to readiness.

    • Elastic scale-out in seconds: By combining the resource pool and snapshot recovery technology, FunModel can schedule and start new instances in seconds to handle sudden traffic peaks.

    • Elastic billing: To balance cost and response speed, the compute resources for shallow hibernation (formerly idle) instances in a "frozen" state are billed at a lower rate. For more information, see Billing overview.

  3. Integrated development toolchain: Accelerate model iteration and deployment

    FunModel provides a series of automated tools. These tools allow developers to focus on model development itself, rather than on complex deployment and O&M tasks.

    • DevPod integrated development environment: Provides a cloud-based development environment pre-configured with common AI frameworks and libraries. Developers can code and debug directly through a web-based VSCode, JupyterLab, or SSH terminal. This removes the need to configure complex development environments locally.

    • One-click build and deployment: After a developer completes model development and local testing in DevPod, they can use the platform's tools to trigger a one-click process. This process builds the image, pushes it to an image repository, and automatically deploys it to the target environment. The entire process from code completion to service launch is clear and efficient, significantly shortening the iteration cycle.

    • Built-in acceleration frameworks: The platform integrates mainstream industry inference acceleration frameworks such as vLLM and SGLang. You can choose to enable them during deployment. You can typically use these frameworks to improve model inference performance without modifying your code.

Technical advantages

Feature

FunModel implementation

Description

Resource utilization

Uses GPU virtualization and resource pooling technology.

This design allows multiple tasks to share underlying hardware resources. It aims to improve the overall efficiency of compute resource usage.

Instance readiness time

State recovery mechanism based on snapshot technology.

When an instance starts, it recovers its running state from a snapshot. This reduces the time from creation to readiness to a few seconds.

Elastic scale-out response

Combines a pre-warmed resource pool with rapid instance recovery capabilities.

When the payload increases, the system can quickly schedule and start new instances from the pre-warmed resource pool. This enables horizontal scaling in seconds.

Automated deployment time

Provides a one-click build and deployment process.

A standard deployment process, from code commit to service launch, is typically completed within 10 minutes.

Quick Start

  1. Deploy your first model service—Quick Start

  2. Advanced deployment solution—Custom model deployment

  3. Cloud-based AI development environment—DevPod development environment

  4. DeepSeek-OCR Quick Start guide