Accelerate model deployment with OSS Connector for AI/ML

OSS Connector for AI/ML accelerates model loading from OSS through `LD_PRELOAD`-based direct reads with prefetching and caching. It requires no code changes and is compatible with containers and mainstream inference frameworks.

High performance

OSS Connector for AI/ML significantly improves large model loading from OSS. With sufficient bandwidth, throughput can exceed 10 GB/s. For measured results, see Performance testing.

How it works

OSS Connector for AI/ML addresses performance bottlenecks when loading large models from OSS.

  • FUSE-based mount solutions often cannot fully utilize OSS bandwidth, causing slow model loading. OSS Connector intercepts I/O requests from the inference framework and converts them directly into HTTP(S) requests to OSS.

  • Using `LD_PRELOAD`, it prefetches and caches model data in memory, so no code changes to your inference application are required.
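The connector's internals are not described here, so the following is only a minimal sketch of the general idea behind concurrent prefetching: split an object into byte ranges, fetch the ranges in parallel, and reassemble them in order. The names `split_ranges`, `prefetch`, and `fake_fetch` are hypothetical, and `fake_fetch` stands in for an HTTP range request to OSS.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, chunk):
    """Return inclusive (start, end) byte ranges covering [0, size)."""
    return [(start, min(start + chunk, size) - 1) for start in range(0, size, chunk)]


def prefetch(fetch_range, size, chunk, workers):
    """Fetch all ranges concurrently and join them in their original order."""
    ranges = split_ranges(size, chunk)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so the parts line up.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


# Hypothetical in-memory "object" used instead of a real OSS object.
blob = bytes(range(256)) * 4


def fake_fetch(start, end):
    """Stand-in for a ranged GET: return bytes start..end inclusive."""
    return blob[start:end + 1]


data = prefetch(fake_fetch, len(blob), chunk=100, workers=8)
print(len(data), data == blob)
```

More workers only help while the network and the storage service still have bandwidth to spare, which is why the prefetch concurrency is configurable.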

Deployment environment

  • Operating system: Linux x86-64

  • glibc: >=2.17
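As a quick pre-installation check, the requirements above can be tested from Python's standard `platform` module. This is a minimal sketch; the helper names `parse_version` and `meets_requirements` are hypothetical and not part of OSS Connector.

```python
import platform


def parse_version(text):
    """Turn '2.17' into (2, 17) so that versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_requirements(system, machine, libc_name, libc_version):
    """Check for Linux x86-64 with glibc 2.17 or later."""
    return (
        system == "Linux"
        and machine == "x86_64"
        and libc_name == "glibc"
        and parse_version(libc_version) >= (2, 17)
    )


# platform.libc_ver() returns ('', '') when glibc cannot be detected.
libc_name, libc_version = platform.libc_ver()
ok = meets_requirements(platform.system(), platform.machine(), libc_name, libc_version)
print("supported" if ok else "not supported")
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.9" would sort after "2.17".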

Install OSS Connector

  1. Download the installation package.

    • oss-connector-lib-1.1.0rc7.x86_64.rpm: For Red Hat-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm
    • oss-connector-lib-1.1.0rc7.x86_64.deb: For Debian-based Linux distributions

      https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
  2. Install OSS Connector.

    Install using the downloaded .rpm or .deb package. The library `libossc_preload.so` is installed to /usr/local/lib/.

    • Install oss-connector-lib-1.1.0rc7.x86_64.rpm

      yum install -y oss-connector-lib-1.1.0rc7.x86_64.rpm
    • Install oss-connector-lib-1.1.0rc7.x86_64.deb

      dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
  3. Verify that `/usr/local/lib/libossc_preload.so` exists and the version is correct.

    nm -D /usr/local/lib/libossc_preload.so | grep version

Configure OSS Connector

  • Configuration file

    The configuration file controls log output, cache policy, and prefetch concurrency.

    Default configuration at /etc/oss-connector/config.json:

    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
    

    | Parameter | Description |
    | --- | --- |
    | logLevel | Controls the detail level of log output. |
    | logPath | Runtime log output path. |
    | auditPath | Audit log path for security and compliance tracking. |
    | expireTimeSec | Cache file release delay in seconds after all references are closed. Default: 120. |
    | prefetch.vcpus | Number of vCPUs for prefetching. Default: 16. |
    | prefetch.workers | Number of workers per vCPU for concurrency. Default: 16. |

  • Configure environment variables

    Environment variable KEY

    Description

    OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET

    AccessKey pair (AccessKey ID and AccessKey secret) of an Alibaba Cloud account or RAM user.

    When using a temporary access token, set these to the temporary credential's AccessKey pair.

    OSS Connector requires the `oss:ListObjects` permission on the target bucket directory. For anonymous-access buckets, leave `OSS_ACCESS_KEY_ID` and `OSS_ACCESS_KEY_SECRET` unset or empty.

    OSS_SESSION_TOKEN

    Temporary access token. Required when using STS temporary credentials to access OSS.

    Set to an empty string when using permanent AccessKey credentials.

    OSS_ENDPOINT

    OSS endpoint. Example: http://oss-cn-beijing-internal.aliyuncs.com. Defaults to HTTPS if no protocol is specified. Use HTTP on internal networks for better performance.

    OSS_REGION

    OSS region ID. Example: cn-beijing. Authentication may fail if not specified.

    OSS_PATH

    OSS model path. Format: `oss://bucketname/path/`. Example: oss://examplebucket/qwen/Qwen3-8B/.

    MODEL_DIR

    Local model directory for the inference framework. Empty the directory before use. Temporary data downloaded during loading can be deleted afterward.

    Note
    • The `MODEL_DIR` path must match the inference framework's model path (`--model` for vllm, `--model-path` for sglang).

    • `MODEL_DIR` requires read/write permissions. Its directory structure mirrors `OSS_PATH`.

    • Model files are prefetched and cached in memory during loading. The cache releases after 120 seconds by default (configurable via `expireTimeSec`).

    • Use this directory exclusively for connector model loading.

    • Do not create this directory on an existing OSS mount (such as ossfs).

    LD_PRELOAD

    Path to the preloaded library: /usr/local/lib/libossc_preload.so. Set as a temporary environment variable. Example: LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp

    ENABLE_CONNECTOR

    OSS Connector process role. Set as a temporary environment variable.

    • `ENABLE_CONNECTOR=1`: Primary connector role.

    • `ENABLE_CONNECTOR=2`: Secondary connector role.

    Each instance allows only one primary connector process. Assign the primary role to the main process (entrypoint). All other connector processes must use the secondary role. For a multi-node setup, see the ray+vllm example under Multi-node startup.
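Before launching a model service, it can help to sanity-check the configuration file and environment variables described above. The sketch below is one way to do that; `prefetch_concurrency` and `check_env` are hypothetical helpers, not part of OSS Connector. It assumes that the total prefetch concurrency is `vcpus` multiplied by `workers` (read from the "workers per vCPU" description) and that `OSS_PATH` always ends with a slash, as in the documented format.

```python
import json

# Copy of the default /etc/oss-connector/config.json shown earlier.
DEFAULT_CONFIG = """
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "expireTimeSec": 120,
    "prefetch": {"vcpus": 16, "workers": 16}
}
"""


def prefetch_concurrency(config):
    """Assumed total prefetch concurrency: vCPUs times workers per vCPU."""
    prefetch = config["prefetch"]
    return prefetch["vcpus"] * prefetch["workers"]


def check_env(env):
    """Return a list of likely problems in a connector environment mapping."""
    problems = []
    for key in ("OSS_ENDPOINT", "OSS_REGION", "OSS_PATH", "MODEL_DIR"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    oss_path = env.get("OSS_PATH", "")
    if oss_path and not (oss_path.startswith("oss://") and oss_path.endswith("/")):
        problems.append("OSS_PATH should look like oss://bucketname/path/")
    # The AccessKey ID and secret form a pair; both empty means anonymous access.
    if bool(env.get("OSS_ACCESS_KEY_ID")) != bool(env.get("OSS_ACCESS_KEY_SECRET")):
        problems.append("set both OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET, or neither")
    if env.get("ENABLE_CONNECTOR") not in ("1", "2"):
        problems.append("ENABLE_CONNECTOR must be 1 (primary) or 2 (secondary)")
    return problems


config = json.loads(DEFAULT_CONFIG)
print("prefetch concurrency:", prefetch_concurrency(config))

example_env = {
    "OSS_ENDPOINT": "http://oss-cn-beijing-internal.aliyuncs.com",
    "OSS_REGION": "cn-beijing",
    "OSS_PATH": "oss://examplebucket/qwen/Qwen3-8B/",
    "MODEL_DIR": "/tmp/model",
    "OSS_ACCESS_KEY_ID": "example-id",
    "OSS_ACCESS_KEY_SECRET": "example-secret",
    "ENABLE_CONNECTOR": "1",
}
print(check_env(example_env))
```

To check a real shell environment, pass `os.environ` to `check_env` instead of the example mapping.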

Start the model service

Single-node startup

vllm API Server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce

sglang API Server

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 

Multi-node startup

ray+vllm

Common environment variables:

export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
Important

The `OSS_PATH` and `MODEL_DIR` variables must correspond. For example, if the model path on OSS is `oss://examplebucket/qwen/Qwen2___5-72B/`, the local model directory is `/tmp/models/qwen/Qwen2___5-72B/`.
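The correspondence between `OSS_PATH` and `MODEL_DIR` can be expressed as a small path mapping. This is a sketch of the rule stated above, using a hypothetical helper named `local_model_dir`; it only computes the path and does not touch OSS or the file system.

```python
from pathlib import PurePosixPath


def local_model_dir(oss_path, model_dir, model_oss_path):
    """Map a model's OSS location to its local directory under MODEL_DIR."""
    prefix = oss_path.rstrip("/") + "/"
    if not model_oss_path.startswith(prefix):
        raise ValueError(f"{model_oss_path} is not under {prefix}")
    # The part of the model path below OSS_PATH is mirrored under MODEL_DIR.
    relative = model_oss_path[len(prefix):].strip("/")
    return str(PurePosixPath(model_dir) / relative) + "/"


path = local_model_dir(
    "oss://examplebucket/",
    "/tmp/models",
    "oss://examplebucket/qwen/Qwen2___5-72B/",
)
print(path)
```

The returned directory is the value to pass to the inference framework's model option, such as `--model` for vllm.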

Pod A starts the ray head:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block

Pod B starts ray and joins the cluster:

# 172.24.176.137 is the pod IP. Change this to the IP address of the head pod. The command to join the cluster is provided in the output after you run `ray start` on Pod A.
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block

Start the vllm API Server:

LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32

sglang

Configure environment variables for the sglang process on each node.

Primary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0 

Secondary node startup:

LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1 

Kubernetes deployment

Build an image with the connector installed and push it to a repository. Example pod deployment YAML:

apiVersion: v1
kind: ConfigMap
metadata:
  name: connector-config
data:
  config.json: |
    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-connector-deployment
spec:
  selector:
    matchLabels:
      app: model-connector
  template:
    metadata:
      labels:
        app: model-connector
    spec:
      imagePullSecrets:
        - name: acr-credential-beijing
      hostNetwork: true
      containers:
      - name: container-name
        image: {IMAGE_ADDRESS}
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "24"
            memory: "700Gi"
          limits:
            cpu: "128"
            memory: "900Gi"
        command: 
          - bash
          - -c
          - ENABLE_CONNECTOR=1 python3 -m vllm.entrypoints.openai.api_server --model /var/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
        env:
        - name: LD_PRELOAD
          value: "/usr/local/lib/libossc_preload.so"
        - name: OSS_ENDPOINT
          value: "oss-cn-beijing-internal.aliyuncs.com"
        - name: OSS_REGION
          value: "cn-beijing"
        - name: OSS_PATH
          value: "oss://examplebucket/qwen/Qwen1.5-7B-Chat/"
        - name: MODEL_DIR
          value: "/var/model/"
        - name: OSS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: key
        - name: OSS_ACCESS_KEY_SECRET
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: secret
        volumeMounts:
          - name: connector-config
            mountPath: /etc/oss-connector/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: connector-config
        configMap:
          name: connector-config

Performance testing

Single-node model loading test

Test environment

| Item | Description |
| --- | --- |
| OSS | Beijing, internal network download bandwidth 250 Gbps |
| Test node | ecs.g7nex.32xlarge, network bandwidth 160 Gbps (80 Gbps × 2) |

Statistical metrics

| Metric | Description |
| --- | --- |
| Model download | Time to download model files via the connector. |
| End-to-end | Time for the vllm API server (CPU version) to start and become ready. |

Test results

| Model name | Model size (GB) | Model download time (seconds) | End-to-end time (seconds) |
| --- | --- | --- | --- |
| Qwen2.5-14B | 27.522 | 1.7721 | 20.48 |
| Qwen2.5-72B | 135.437 | 10.57 | 30.09 |
| Qwen3-8B | 15.271 | 0.97 | 18.88 |
| Qwen3-32B | 61.039 | 3.99 | 22.97 |
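The test results above can be converted into effective download throughput by dividing model size by download time. The short sketch below does that arithmetic with the figures from the results; the function name `throughput_gb_per_s` is hypothetical.

```python
# (model size in GB, model download time in seconds), taken from the test results.
results = {
    "Qwen2.5-14B": (27.522, 1.7721),
    "Qwen2.5-72B": (135.437, 10.57),
    "Qwen3-8B": (15.271, 0.97),
    "Qwen3-32B": (61.039, 3.99),
}


def throughput_gb_per_s(size_gb, seconds):
    """Effective download throughput in GB/s."""
    return size_gb / seconds


for name, (size_gb, seconds) in results.items():
    print(f"{name}: {throughput_gb_per_s(size_gb, seconds):.1f} GB/s")
```

In this test, every model downloads at more than 10 GB/s, which is consistent with the throughput figure quoted at the start of this topic.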