OSS Connector for AI/ML: accelerate model loading from Object Storage Service (OSS) with LD_PRELOAD

High performance

OSS Connector for AI/ML significantly speeds up loading large models from OSS. With sufficient network bandwidth, throughput can exceed 10 GB/s. For measured results, see Performance testing.

How it works

OSS Connector for AI/ML addresses performance bottlenecks when loading large models from OSS.

FUSE-based mount solutions often cannot fully use the available OSS bandwidth, which slows model loading. OSS Connector instead intercepts I/O requests from the inference framework and converts them directly into HTTP(S) requests to OSS.
Because the connector is injected with `LD_PRELOAD`, it can prefetch model data and cache it in memory without any code changes to your inference application.
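Because the connector is injected at process startup rather than linked into the application, a quick way to confirm that it is active is to check whether `libossc_preload.so` is mapped into the running inference process. The sketch below uses the standard Linux `/proc/<pid>/maps` file; `is_preloaded` is a hypothetical helper name, not part of OSS Connector.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with OSS Connector): report whether a
# running process has libossc_preload.so mapped into its address space.
# Exit status: 0 = mapped, 1 = not mapped, 2 = maps file not readable.
is_preloaded() {
  local pid="$1"
  local maps="/proc/${pid}/maps"
  # Reading another user's maps file requires matching privileges.
  if [[ ! -r "$maps" ]]; then
    echo "cannot read $maps" >&2
    return 2
  fi
  grep -q 'libossc_preload\.so' "$maps"
}
```

For example, `is_preloaded "$(pgrep -f vllm.entrypoints.openai.api_server | head -n 1)" && echo "connector loaded"` checks the first matching vllm process.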

Deployment environment

Operating system: Linux x86-64
glibc: >=2.17
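The following pre-flight check is a minimal sketch of the two requirements above. It assumes GNU coreutils (`sort -V`) and a glibc-based system on which `getconf GNU_LIBC_VERSION` prints a string such as `glibc 2.28`; the function names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: verify Linux x86-64 and glibc >= 2.17 before installing.

# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [[ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

check_platform() {
  local arch glibc
  arch="$(uname -m)"
  if [[ "$arch" != "x86_64" ]]; then
    echo "unsupported architecture: $arch (x86_64 required)" >&2
    return 1
  fi
  # getconf prints e.g. "glibc 2.28"; keep only the version number.
  glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  if [[ -z "$glibc" ]]; then
    echo "could not detect glibc version (non-glibc system?)" >&2
    return 1
  fi
  if ! version_ge "$glibc" 2.17; then
    echo "glibc $glibc is older than 2.17" >&2
    return 1
  fi
  echo "OK: $arch, glibc $glibc"
}
```

Run `check_platform` on the target host; a nonzero exit status means the host does not meet the requirements.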

Install OSS Connector

Download the installation package.
- oss-connector-lib-1.1.0rc7.x86_64.rpm: For Red Hat-based Linux distributions
```
https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm
```
- oss-connector-lib-1.1.0rc7.x86_64.deb: For Debian-based Linux distributions
```
https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
```
Install OSS Connector.

Install the downloaded .rpm or .deb package. The library `libossc_preload.so` is installed to `/usr/local/lib/`.
- Install oss-connector-lib-1.1.0rc7.x86_64.rpm
```
yum install -y oss-connector-lib-1.1.0rc7.x86_64.rpm
```
- Install oss-connector-lib-1.1.0rc7.x86_64.deb
```
dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
```
Verify that `/usr/local/lib/libossc_preload.so` exists and the version is correct.
```
nm -D /usr/local/lib/libossc_preload.so | grep version
```
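The download, install, and verification steps above can be combined into one script. This is a sketch that assumes `curl` is available and guesses the distribution family from whether `dpkg` (Debian-based) or `rpm` (Red Hat-based) is installed; the version string and URLs are the ones listed above, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download and install the OSS Connector package that matches
# the host's package manager, then run the documented version check.
CONNECTOR_VERSION="1.1.0rc7"
CONNECTOR_BASE_URL="https://gosspublic.alicdn.com/oss-connector"

# Print the package URL for a given format ("rpm" or "deb").
connector_pkg_url() {
  case "$1" in
    rpm|deb) echo "${CONNECTOR_BASE_URL}/oss-connector-lib-${CONNECTOR_VERSION}.x86_64.$1" ;;
    *) echo "unknown package format: $1" >&2; return 1 ;;
  esac
}

# Guess the package format: dpkg suggests Debian-based, rpm suggests Red Hat-based.
detect_pkg_format() {
  if command -v dpkg >/dev/null 2>&1; then echo deb
  elif command -v rpm >/dev/null 2>&1; then echo rpm
  else return 1
  fi
}

install_connector() {
  local fmt url pkg
  fmt="$(detect_pkg_format)" || { echo "neither dpkg nor rpm found" >&2; return 1; }
  url="$(connector_pkg_url "$fmt")" || return 1
  pkg="$(basename "$url")"
  curl -fL -o "$pkg" "$url" || return 1
  if [[ "$fmt" == rpm ]]; then
    yum install -y "$pkg"
  else
    dpkg -i "$pkg"
  fi || return 1
  # Same check as above: the library should export version information.
  nm -D /usr/local/lib/libossc_preload.so | grep version
}
```

Call `install_connector` as root, because `yum` and `dpkg` need root privileges to install packages.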

Configure OSS Connector

Configuration file

The configuration file controls log output, cache policy, and prefetch concurrency.

Default configuration at /etc/oss-connector/config.json:

```
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "expireTimeSec": 120,
    "prefetch": {
        "vcpus": 16,
        "workers": 16
    }
}
```

Parameter	Description
logLevel	Controls the detail level of log output.
logPath	Runtime log output path.
auditPath	Audit log path for security and compliance tracking.
expireTimeSec	Cache file release delay in seconds after all references are closed. Default: 120.
prefetch.vcpus	Number of vCPUs for prefetching. Default: 16.
prefetch.workers	Number of workers per vCPU for concurrency. Default: 16.
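To change the prefetch settings, you can regenerate the file with the other fields at their documented defaults. This is a minimal sketch; `write_connector_config` is a hypothetical helper, and the printed concurrency figure assumes that `workers` is a per-vCPU count, as the table describes, so that total prefetch concurrency is `vcpus × workers` (256 with the defaults).

```shell
#!/usr/bin/env bash
# Sketch: write a connector config with custom prefetch settings.
# Other fields keep the documented defaults. Writing to
# /etc/oss-connector/config.json normally requires root.
write_connector_config() {
  local path="$1" vcpus="${2:-16}" workers="${3:-16}"
  mkdir -p "$(dirname "$path")"
  cat > "$path" <<EOF
{
    "logLevel": 1,
    "logPath": "/var/log/oss-connector/connector.log",
    "auditPath": "/var/log/oss-connector/audit.log",
    "expireTimeSec": 120,
    "prefetch": {
        "vcpus": ${vcpus},
        "workers": ${workers}
    }
}
EOF
  # Assumes workers is per vCPU, so total concurrency is the product.
  echo "prefetch concurrency: $((vcpus * workers))"
}
```

For example, `write_connector_config /etc/oss-connector/config.json 8 16` writes a configuration with 8 prefetch vCPUs and prints `prefetch concurrency: 128`.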

Configure environment variables

Environment variable	Description
OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET	AccessKey ID and AccessKey secret of an Alibaba Cloud account or RAM user. When using STS temporary credentials, set these to the temporary AccessKey ID and secret. OSS Connector requires the `oss:ListObjects` permission on the target bucket directory. For buckets that allow anonymous access, leave both variables unset or empty.
OSS_SESSION_TOKEN	Temporary access token. Required when using STS temporary credentials to access OSS. Set to an empty string when using permanent AccessKey credentials.
OSS_ENDPOINT	OSS endpoint. Example: `http://oss-cn-beijing-internal.aliyuncs.com`. Defaults to HTTPS if no protocol is specified. Use HTTP on internal networks for better performance.
OSS_REGION	OSS region ID. Example: cn-beijing. Authentication may fail if not specified.
OSS_PATH	OSS model path. Format: `oss://bucketname/path/`. Example: `oss://examplebucket/qwen/Qwen3-8B/`.
MODEL_DIR	Local model directory used by the inference framework. Note: `MODEL_DIR` must match the model path passed to the inference framework (`--model` for vllm, `--model-path` for sglang). The directory must be empty before use and must have read/write permissions. Use it exclusively for connector model loading, and do not create it on an existing OSS mount (such as ossfs). Its directory structure mirrors `OSS_PATH`. During loading, model files are prefetched and cached in memory; by default, the cache is released 120 seconds after all references are closed (configurable via `expireTimeSec`). Temporary data downloaded during loading can be deleted afterward.
LD_PRELOAD	Path to the preloaded library: `/usr/local/lib/libossc_preload.so`. Set it as a temporary environment variable on the command that starts the process, not as a global export. Example: `LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp`
ENABLE_CONNECTOR	OSS Connector process role. Set it as a temporary environment variable. `ENABLE_CONNECTOR=1`: primary connector role. `ENABLE_CONNECTOR=2`: secondary connector role. Each instance allows only one primary connector process: assign the primary role to the main process (entrypoint) and the secondary role to all other connector processes. For multi-node startup, see the ray+vllm example.
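Several of the requirements in this table can be checked before the inference server starts. The sketch below is a hypothetical pre-flight helper, not part of OSS Connector. It skips `LD_PRELOAD` and `ENABLE_CONNECTOR` because those are set per command, and it assumes `findmnt` (util-linux) is available to detect a FUSE mount such as ossfs; if `findmnt` is missing, that check is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables in the table above.
check_connector_env() {
  local errors=0 var
  for var in OSS_ENDPOINT OSS_REGION OSS_PATH MODEL_DIR; do
    if [[ -z "${!var:-}" ]]; then
      echo "missing: $var" >&2
      errors=$((errors + 1))
    fi
  done
  # AccessKey ID and secret go together; leave both empty for anonymous buckets.
  if [[ -n "${OSS_ACCESS_KEY_ID:-}" && -z "${OSS_ACCESS_KEY_SECRET:-}" ]] ||
     [[ -z "${OSS_ACCESS_KEY_ID:-}" && -n "${OSS_ACCESS_KEY_SECRET:-}" ]]; then
    echo "set both OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET, or neither" >&2
    errors=$((errors + 1))
  fi
  # Documented format: oss://bucketname/path/ (note the trailing slash).
  if [[ -n "${OSS_PATH:-}" && "$OSS_PATH" != oss://?*/ ]]; then
    echo "OSS_PATH should look like oss://bucketname/path/, got: $OSS_PATH" >&2
    errors=$((errors + 1))
  fi
  if [[ -n "${MODEL_DIR:-}" ]]; then
    if [[ ! -d "$MODEL_DIR" || ! -w "$MODEL_DIR" ]]; then
      echo "MODEL_DIR must be an existing, writable directory: $MODEL_DIR" >&2
      errors=$((errors + 1))
    elif [[ -n "$(ls -A "$MODEL_DIR")" ]]; then
      echo "MODEL_DIR must be empty: $MODEL_DIR" >&2
      errors=$((errors + 1))
    elif command -v findmnt >/dev/null 2>&1 &&
         findmnt -n -o FSTYPE -T "$MODEL_DIR" 2>/dev/null | grep -q '^fuse'; then
      # ossfs mounts report a FUSE file system type such as fuse.ossfs.
      echo "MODEL_DIR is on a FUSE mount (for example ossfs): $MODEL_DIR" >&2
      errors=$((errors + 1))
    fi
  fi
  (( errors == 0 ))
}
```

Running `check_connector_env && echo ok` in the same shell that launches the server reports every problem it finds on stderr before returning a nonzero status.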

Start the model service

Single-node startup

vllm API Server

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
```

sglang API Server

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000
```
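To know when a single-node server has finished loading the model, you can poll its health endpoint. This sketch assumes the server listens on port 8000 (the vllm OpenAI-compatible server's default, and the port set explicitly in the sglang command above) and that `curl` is installed; both vllm and sglang serve `GET /health`. The function name `wait_ready` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: poll the inference server's /health endpoint until it answers
# with HTTP 2xx, or give up after a timeout (in seconds).
wait_ready() {
  local url="${1:-http://127.0.0.1:8000/health}" timeout="${2:-600}"
  local start=$SECONDS
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if (( SECONDS - start >= timeout )); then
      echo "not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 2
  done
  echo "ready after $((SECONDS - start))s"
}
```

Run `wait_ready` from a second terminal on the same host after starting either command above; it exits with status 0 once the server responds.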

Multi-node startup

ray+vllm

Common environment variables:

```
export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
```

Important

The `OSS_PATH` and `MODEL_DIR` variables must correspond. For example, if the model path on OSS is `oss://examplebucket/qwen/Qwen2___5-72B/`, the local model directory is `/tmp/models/qwen/Qwen2___5-72B/`.
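The correspondence rule above is a plain prefix substitution: strip `OSS_PATH` from the object's OSS URI and append the remainder to `MODEL_DIR`. The sketch below encodes that rule; `oss_to_local` is a hypothetical helper, and it assumes `OSS_PATH` ends with a slash, as in the documented format.

```shell
#!/usr/bin/env bash
# Sketch: map an OSS URI under OSS_PATH to the matching path under MODEL_DIR.
# oss_to_local is a hypothetical helper, not part of OSS Connector.
oss_to_local() {
  local oss_uri="$1" oss_root="${2:-$OSS_PATH}" model_dir="${3:-$MODEL_DIR}"
  case "$oss_uri" in
    "$oss_root"*) ;;
    *) echo "$oss_uri is not under $oss_root" >&2; return 1 ;;
  esac
  # Strip the OSS_PATH prefix and append the remainder to MODEL_DIR.
  echo "${model_dir%/}/${oss_uri#"$oss_root"}"
}
```

With the variables exported above, `oss_to_local oss://examplebucket/qwen/Qwen2___5-72B/` prints `/tmp/models/qwen/Qwen2___5-72B/`, which is the path passed to `--model` in the vllm command.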

Pod A starts the ray head:

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block
```

Pod B starts Ray and joins the cluster. Replace `172.24.176.137` with the IP address of the head pod (Pod A); the exact join command is printed in the output of `ray start --head` on Pod A:

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block
```

Start the vllm API Server:

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32
```
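In this example the API server uses `ENABLE_CONNECTOR=2`, presumably because it runs where a Ray process already holds the primary role. If you script pod startup, one hypothetical way to choose the role automatically is to look for an existing primary process on the same host. This heuristic is not an OSS Connector feature. It assumes that "instance" means the host or pod, that GNU `grep -z` is available, and that the calling shell does not itself export `ENABLE_CONNECTOR`. It only sees processes whose `/proc/<pid>/environ` the current user can read, and it is not race-free if two processes start at the same moment.

```shell
#!/usr/bin/env bash
# Heuristic sketch (not an OSS Connector feature): print 2 if a process on
# this host already runs with ENABLE_CONNECTOR=1, otherwise print 1.
connector_role() {
  local env_file pid
  for env_file in /proc/[0-9]*/environ; do
    pid="${env_file#/proc/}"
    pid="${pid%/environ}"
    # Skip this shell and the subshell that may be running this function.
    [[ "$pid" == "$$" || "$pid" == "$BASHPID" ]] && continue
    # environ is NUL-separated; -z makes grep treat each entry as a line.
    if grep -qxz 'ENABLE_CONNECTOR=1' "$env_file" 2>/dev/null; then
      echo 2   # a primary already runs here, so this process is secondary
      return 0
    fi
  done
  echo 1       # no primary found, so this process can be the primary
}
```

For example: `LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR="$(connector_role)" python3 -m vllm.entrypoints.openai.api_server ...`. When in doubt, assign roles explicitly as in the commands above.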

sglang

Configure environment variables for the sglang process on each node.

Primary node startup (`--dist-init-addr` points to the primary node; replace `192.168.1.1` with its IP address):

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0
```

Secondary node startup:

```
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1
```
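The two commands above differ only in `--node-rank`. Each node runs its own sglang process, so each process is the primary connector on its node (`ENABLE_CONNECTOR=1`). The sketch below prints the per-node command for any node count. It assumes that the `OSS_*` variables are exported on every node. `DIST_INIT_ADDR`, `NNODES`, and `sglang_node_cmd` are hypothetical names, and the default values are the placeholders from the example above.

```shell
#!/usr/bin/env bash
# Sketch: print the sglang launch command for each node rank.
DIST_INIT_ADDR="${DIST_INIT_ADDR:-192.168.1.1:20000}"
NNODES="${NNODES:-2}"

sglang_node_cmd() {
  local rank="$1"
  if (( rank < 0 || rank >= NNODES )); then
    echo "node rank must be in [0, $((NNODES - 1))]" >&2
    return 1
  fi
  # Every node's sglang process is the primary connector on that node.
  printf '%s ' \
    LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 \
    python3 -m sglang.launch_server --model-path "${MODEL_DIR:-/tmp/model}" \
    --port 8000 --dist-init-addr "$DIST_INIT_ADDR" \
    --nnodes "$NNODES" --node-rank "$rank"
  echo
}

for rank in $(seq 0 $((NNODES - 1))); do
  echo "# node $rank:"
  sglang_node_cmd "$rank"
done
```

Setting `NNODES=4` before running the script prints commands for ranks 0 through 3, all pointing at the same `--dist-init-addr`.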

Kubernetes deployment

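Build an image with OSS Connector installed and push it to an image repository. The following example YAML defines a ConfigMap that provides the connector configuration file and a Deployment that runs the vllm API server. Replace `{IMAGE_ADDRESS}` with your image address. The Deployment reads the AccessKey pair from a Secret named `oss-access-key-connector` (keys `key` and `secret`), which must already exist in the namespace: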

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: connector-config
data:
  config.json: |
    {
        "logLevel": 1,
        "logPath": "/var/log/oss-connector/connector.log",
        "auditPath": "/var/log/oss-connector/audit.log",
        "expireTimeSec": 120,
        "prefetch": {
            "vcpus": 16,
            "workers": 16
        }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-connector-deployment
spec:
  selector:
    matchLabels:
      app: model-connector
  template:
    metadata:
      labels:
        app: model-connector
    spec:
      imagePullSecrets:
        - name: acr-credential-beijing
      hostNetwork: true
      containers:
      - name: container-name
        image: {IMAGE_ADDRESS}
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "24"
            memory: "700Gi"
          limits:
            cpu: "128"
            memory: "900Gi"
        command:
          - bash
          - -c
          - ENABLE_CONNECTOR=1 python3 -m vllm.entrypoints.openai.api_server --model /var/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
        env:
        - name: LD_PRELOAD
          value: "/usr/local/lib/libossc_preload.so"
        - name: OSS_ENDPOINT
          value: "oss-cn-beijing-internal.aliyuncs.com"
        - name: OSS_REGION
          value: "cn-beijing"
        - name: OSS_PATH
          value: "oss://examplebucket/qwen/Qwen1.5-7B-Chat/"
        - name: MODEL_DIR
          value: "/var/model/"
        - name: OSS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: key
        - name: OSS_ACCESS_KEY_SECRET
          valueFrom:
            secretKeyRef:
              name: oss-access-key-connector
              key: secret
        volumeMounts:
          - name: connector-config
            mountPath: /etc/oss-connector/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: connector-config
        configMap:
          name: connector-config
```
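The Deployment references a Secret that the YAML does not define. As a sketch, the following hypothetical helper prints a matching Secret manifest. It uses the Kubernetes `stringData` field so the API server handles base64 encoding, and it assumes the AccessKey values contain no double quotes or backslashes, since they are inserted into double-quoted YAML strings.

```shell
#!/usr/bin/env bash
# Sketch: print the Secret read by the Deployment's secretKeyRef entries
# (name: oss-access-key-connector, keys: key and secret).
oss_secret_manifest() {
  local id="$1" secret="$2"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: oss-access-key-connector
type: Opaque
stringData:
  key: "${id}"
  secret: "${secret}"
EOF
}
```

Apply it with `oss_secret_manifest "$OSS_ACCESS_KEY_ID" "$OSS_ACCESS_KEY_SECRET" | kubectl apply -f -` before creating the Deployment. `kubectl create secret generic oss-access-key-connector --from-literal=key=... --from-literal=secret=...` creates an equivalent Secret without a manifest.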

Performance testing

Single-node model loading test

Test environment

Component	Specification
OSS	Beijing, internal network download bandwidth 250 Gbps
Test node	ecs.g7nex.32xlarge, network bandwidth 160 Gbps (80 Gbps × 2)

Statistical metrics

Metric	Description
Model download	Time to download model files via the connector.
End-to-end	Time for the vllm API server (CPU version) to start and become ready.

Test results

Model name	Model size (GB)	Model download time (seconds)	End-to-end time (seconds)
Qwen2.5-14B	27.522	1.7721	20.48
Qwen2.5-72B	135.437	10.57	30.09
Qwen3-8B	15.271	0.97	18.88
Qwen3-32B	61.039	3.99	22.97
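Dividing model size by download time gives the effective download throughput for each run. The short script below does this arithmetic on the values from the results table, in the table's own units (GB and seconds). If GB means 10^9 bytes, the test node's 160 Gbps network bandwidth corresponds to 20 GB/s (160 / 8).

```shell
#!/usr/bin/env bash
# Effective download throughput = model size / download time,
# using the values from the results table above.
results() {
  cat <<'EOF'
Qwen2.5-14B 27.522 1.7721
Qwen2.5-72B 135.437 10.57
Qwen3-8B 15.271 0.97
Qwen3-32B 61.039 3.99
EOF
}

# Each input line: model name, size in GB, download time in seconds.
throughput() {
  awk '{ printf "%s %.2f GB/s\n", $1, $2 / $3 }'
}

results | throughput
```

This prints 15.53, 12.81, 15.74, and 15.30 GB/s respectively. All four runs exceed the 10 GB/s cited at the top of this page.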