OSS Connector for AI/ML accelerates loading models from OSS. It is loaded through `LD_PRELOAD` and reads model data directly from OSS with prefetching and caching, so no code changes are required. It works in containers and with mainstream inference frameworks.
High performance
OSS Connector for AI/ML significantly speeds up loading large models from OSS. With sufficient bandwidth, throughput can exceed 10 GB/s. For test data, see Performance testing.
How it works
OSS Connector for AI/ML addresses performance bottlenecks when loading large models from OSS:
- FUSE-based mount solutions often cannot fully utilize OSS bandwidth, which slows model loading. OSS Connector instead intercepts I/O requests from the inference framework and converts them directly into HTTP(S) requests to OSS.
- Because it is loaded through `LD_PRELOAD`, it prefetches and caches model data in memory without any code changes to your inference application.
Deployment environment
- Operating system: Linux x86-64
- glibc: 2.17 or later
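You can check both requirements on a target host before installing the package. This is a minimal sketch, not a tool shipped with OSS Connector. The function names `version_ge` and `check_connector_prereqs` are hypothetical. The sketch assumes GNU `sort -V` is available and that `getconf GNU_LIBC_VERSION` prints a string such as `glibc 2.17`, as it does on glibc systems.

```shell
#!/usr/bin/env bash
# Hypothetical pre-installation check for the OSS Connector deployment environment.

# version_ge A B: succeed if version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_connector_prereqs: require Linux x86-64 and glibc >= 2.17.
check_connector_prereqs() {
  local arch glibc
  arch="$(uname -m)"
  if [ "$(uname -s)" != "Linux" ] || [ "$arch" != "x86_64" ]; then
    echo "unsupported platform: $(uname -s) $arch (need Linux x86-64)" >&2
    return 1
  fi
  # getconf prints e.g. "glibc 2.17"; keep only the version field.
  glibc="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"
  if [ -z "$glibc" ] || ! version_ge "$glibc" 2.17; then
    echo "glibc ${glibc:-unknown} found; need 2.17 or later" >&2
    return 1
  fi
  echo "OK: Linux x86-64, glibc $glibc"
}
```

Source the file and run `check_connector_prereqs`. If either requirement is not met, the function prints the reason and returns a non-zero status.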
Install OSS Connector
- Download the installation package for your distribution:
  - oss-connector-lib-1.1.0rc7.x86_64.rpm, for Red Hat-based Linux distributions: https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.rpm
  - oss-connector-lib-1.1.0rc7.x86_64.deb, for Debian-based Linux distributions: https://gosspublic.alicdn.com/oss-connector/oss-connector-lib-1.1.0rc7.x86_64.deb
- Install OSS Connector from the downloaded .rpm or .deb package. The library `libossc_preload.so` is installed to `/usr/local/lib/`.
  - Install oss-connector-lib-1.1.0rc7.x86_64.rpm:
    ```shell
    yum install -y oss-connector-lib-1.1.0rc7.x86_64.rpm
    ```
  - Install oss-connector-lib-1.1.0rc7.x86_64.deb:
    ```shell
    dpkg -i oss-connector-lib-1.1.0rc7.x86_64.deb
    ```
- Verify that `/usr/local/lib/libossc_preload.so` exists and that its version is correct:
  ```shell
  nm -D /usr/local/lib/libossc_preload.so | grep version
  ```
Configure OSS Connector
Configuration file
The configuration file controls log output, cache policy, and prefetch concurrency.
The default configuration file is `/etc/oss-connector/config.json`:

```json
{
  "logLevel": 1,
  "logPath": "/var/log/oss-connector/connector.log",
  "auditPath": "/var/log/oss-connector/audit.log",
  "expireTimeSec": 120,
  "prefetch": {
    "vcpus": 16,
    "workers": 16
  }
}
```

| Parameter | Description |
| --- | --- |
| logLevel | Detail level of log output. |
| logPath | Path of the runtime log. |
| auditPath | Path of the audit log, used for security and compliance tracking. |
| expireTimeSec | Delay, in seconds, before a cached file is released after all references to it are closed. Default: 120. |
| prefetch.vcpus | Number of vCPUs used for prefetching. Default: 16. |
| prefetch.workers | Number of prefetch workers per vCPU. Default: 16. |
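To change the cache policy or prefetch concurrency, write a config.json with the same keys. The helper below is a hypothetical sketch, not part of the package. It assumes the connector reads `/etc/oss-connector/config.json` when the inference process starts, so restart the process after you edit the file. If `prefetch.workers` is counted per vCPU as described above, the defaults give 16 × 16 = 256 prefetch workers in total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an OSS Connector config.json with custom values.
# Usage: write_connector_config DIR [EXPIRE_SEC] [VCPUS] [WORKERS]
write_connector_config() {
  local dir="$1" expire="${2:-120}" vcpus="${3:-16}" workers="${4:-16}"
  # Reject non-numeric values so the generated file stays valid JSON.
  case "${expire}${vcpus}${workers}" in
    *[!0-9]*) echo "expected non-negative integers" >&2; return 1 ;;
  esac
  mkdir -p "$dir" || return 1
  # logLevel and the log paths keep the values from the default configuration.
  cat > "$dir/config.json" <<EOF
{
  "logLevel": 1,
  "logPath": "/var/log/oss-connector/connector.log",
  "auditPath": "/var/log/oss-connector/audit.log",
  "expireTimeSec": ${expire},
  "prefetch": {
    "vcpus": ${vcpus},
    "workers": ${workers}
  }
}
EOF
}
```

For example, `write_connector_config /etc/oss-connector 300` sets a 300-second cache release delay. This requires write permission on `/etc/oss-connector`.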
Configure environment variables
OSS Connector reads the following environment variables:
- `OSS_ACCESS_KEY_ID` and `OSS_ACCESS_KEY_SECRET`: The AccessKey pair of an Alibaba Cloud account or RAM user. When you use a temporary access token, set them to the AccessKey pair of the temporary credentials. OSS Connector requires the `oss:ListObjects` permission on the target bucket directory. For buckets that allow anonymous access, leave both variables unset or empty.
- `OSS_SESSION_TOKEN`: The temporary access token. It is required when you use STS temporary credentials to access OSS. Set it to an empty string when you use permanent AccessKey credentials.
- `OSS_ENDPOINT`: The OSS endpoint, for example `http://oss-cn-beijing-internal.aliyuncs.com`. If no protocol is specified, HTTPS is used. On internal networks, use HTTP for better performance.
- `OSS_REGION`: The OSS region ID, for example `cn-beijing`. If it is not specified, authentication may fail.
- `OSS_PATH`: The OSS path of the model, in the format `oss://bucketname/path/`, for example `oss://examplebucket/qwen/Qwen3-8B/`.
- `MODEL_DIR`: The local model directory used by the inference framework. The directory must be empty before use. Temporary data downloaded to it during loading can be deleted afterward. Note:
  - The `MODEL_DIR` path must match the model path passed to the inference framework (`--model` for vllm, `--model-path` for sglang).
  - `MODEL_DIR` requires read/write permissions. Its directory structure mirrors `OSS_PATH`.
  - Model files are prefetched and cached in memory during loading. By default, a cached file is released 120 seconds after all references to it are closed. You can change this delay with `expireTimeSec`.
  - Use this directory only for loading models through the connector.
  - Do not create this directory on an existing OSS mount, such as ossfs.
- `LD_PRELOAD`: The path of the preloaded library, `/usr/local/lib/libossc_preload.so`. Set it as a temporary environment variable, that is, as a prefix to the launch command, for example: `LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ./myapp`
- `ENABLE_CONNECTOR`: The role of the OSS Connector process. Set it as a temporary environment variable.
  - `ENABLE_CONNECTOR=1`: primary connector role.
  - `ENABLE_CONNECTOR=2`: secondary connector role.
  Each instance allows only one primary connector process. Assign the primary role to the main (entrypoint) process and the secondary role to all other connector processes. For multi-node startup, see the ray+vllm example.
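You can check the rules above in the shell before you launch the inference framework. This is a hypothetical sketch: `check_connector_vars` is not part of OSS Connector. It treats the trailing slash in `oss://bucketname/path/` as required. It accepts empty credentials because anonymous-access buckets are allowed, but it rejects setting only one half of the AccessKey pair.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check for the OSS Connector environment variables.
check_connector_vars() {
  local v val missing=0
  # Endpoint, region, OSS path, and local model directory must be set.
  for v in OSS_ENDPOINT OSS_REGION OSS_PATH MODEL_DIR; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # The AccessKey ID and secret are a pair: set both, or leave both empty
  # (anonymous-access buckets).
  if [ -n "${OSS_ACCESS_KEY_ID:-}" ] && [ -z "${OSS_ACCESS_KEY_SECRET:-}" ]; then
    echo "OSS_ACCESS_KEY_SECRET is empty" >&2; return 1
  fi
  if [ -z "${OSS_ACCESS_KEY_ID:-}" ] && [ -n "${OSS_ACCESS_KEY_SECRET:-}" ]; then
    echo "OSS_ACCESS_KEY_ID is empty" >&2; return 1
  fi

  # OSS_PATH must follow oss://bucketname/path/ (trailing slash treated as required).
  case "$OSS_PATH" in
    oss://?*/) ;;
    *) echo "OSS_PATH must look like oss://bucketname/path/: $OSS_PATH" >&2; return 1 ;;
  esac

  # MODEL_DIR must be an existing, writable, empty directory.
  if [ ! -d "$MODEL_DIR" ] || [ ! -w "$MODEL_DIR" ]; then
    echo "MODEL_DIR is not a writable directory: $MODEL_DIR" >&2; return 1
  fi
  if [ -n "$(ls -A "$MODEL_DIR")" ]; then
    echo "MODEL_DIR is not empty: $MODEL_DIR" >&2; return 1
  fi
}
```

After you export the variables, run `check_connector_vars` before the launch command, for example `check_connector_vars && <launch command>`.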
Start the model service
Single-node startup
vllm API Server
```shell
# --model must point to the same directory as MODEL_DIR.
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m vllm.entrypoints.openai.api_server --model /tmp/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
```
sglang API Server
```shell
# --model-path must point to the same directory as MODEL_DIR.
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000
```
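Because `LD_PRELOAD` and `ENABLE_CONNECTOR` should be set as temporary environment variables, a small wrapper can scope them to a single command. This is a hypothetical helper, not part of OSS Connector. It assumes the `OSS_*` variables and `MODEL_DIR` are already exported in the calling shell, and it sets the hypothetical variable `OSSC_LIB` to the documented library path.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one command with OSS Connector preloaded.
# LD_PRELOAD and ENABLE_CONNECTOR are passed as command prefixes, so they apply
# to the launched process (and its children) but are not set in this shell.
OSSC_LIB=/usr/local/lib/libossc_preload.so

# Usage: run_with_connector ROLE COMMAND [ARGS...]
#   ROLE 1 = primary connector (entrypoint), 2 = secondary connector.
run_with_connector() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_with_connector ROLE COMMAND [ARGS...]" >&2
    return 2
  fi
  local role="$1"
  shift
  case "$role" in
    1|2) ;;
    *) echo "ENABLE_CONNECTOR must be 1 or 2, got: $role" >&2
       return 2 ;;
  esac
  LD_PRELOAD="$OSSC_LIB" ENABLE_CONNECTOR="$role" "$@"
}
```

For example, after you export the `OSS_*` variables and `MODEL_DIR=/tmp/model`, `run_with_connector 1 python3 -m sglang.launch_server --model-path /tmp/model --port 8000` starts the same process as the sglang command above.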
Multi-node startup
ray+vllm
Common environment variables:
```shell
export OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID}
export OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET}
export OSS_ENDPOINT=${OSS_ENDPOINT}
export OSS_REGION=${OSS_REGION}
export OSS_PATH=oss://examplebucket/
export MODEL_DIR=/tmp/models
```
The `OSS_PATH` and `MODEL_DIR` variables must correspond. For example, if the model path on OSS is `oss://examplebucket/qwen/Qwen2___5-72B/`, the local model directory is `/tmp/models/qwen/Qwen2___5-72B/`.
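The correspondence between `OSS_PATH` and `MODEL_DIR` is a prefix substitution: strip `OSS_PATH` from the model's OSS URL and append the remainder to `MODEL_DIR`. The following hypothetical helper computes the local path to pass to `--model`. It assumes both paths use the formats shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an OSS URL under OSS_PATH to the mirrored local path
# under MODEL_DIR.
# Usage: local_model_path OSS_PATH MODEL_DIR OSS_URL
local_model_path() {
  # Normalize: OSS_PATH ends with exactly one "/", MODEL_DIR has no trailing "/".
  local oss_path="${1%/}/" model_dir="${2%/}" url="$3"
  case "$url" in
    "$oss_path"*) ;;
    *) echo "not under $oss_path: $url" >&2; return 1 ;;
  esac
  # Strip the OSS_PATH prefix and append the remainder to MODEL_DIR.
  printf '%s/%s\n' "$model_dir" "${url#"$oss_path"}"
}
```

With the variables above, `--model "$(local_model_path "$OSS_PATH" "$MODEL_DIR" oss://examplebucket/qwen/Qwen2___5-72B/)"` expands to `--model /tmp/models/qwen/Qwen2___5-72B/`.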
Pod A starts the ray head:
```shell
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --head --dashboard-host 0.0.0.0 --block
```
Pod B starts ray and joins the cluster. `172.24.176.137` is the IP address of the head pod in this example; replace it with the IP address of Pod A. The command to join the cluster is also printed in the output after you run `ray start` on Pod A:
```shell
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=1 ray start --address='172.24.176.137:6379' --block
```
Start the vllm API Server:
```shell
LD_PRELOAD=/usr/local/lib/libossc_preload.so ENABLE_CONNECTOR=2 python3 -m vllm.entrypoints.openai.api_server --model ${MODEL_DIR}/qwen/Qwen2___5-72B/ --trust-remote-code --served-model-name ds --max-model-len 2048 --gpu-memory-utilization 0.98 --tensor-parallel-size 32
```
sglang
Configure environment variables for the sglang process on each node.
Primary node startup:
```shell
# --dist-init-addr is the address of the primary node (node rank 0); 192.168.1.1 is an example.
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 0
```
Secondary node startup:
```shell
# Same command as the primary node, except --node-rank 1.
LD_PRELOAD=/usr/local/lib/libossc_preload.so \
ENABLE_CONNECTOR=1 \
OSS_ACCESS_KEY_ID=${OSS_ACCESS_KEY_ID} \
OSS_ACCESS_KEY_SECRET=${OSS_ACCESS_KEY_SECRET} \
OSS_ENDPOINT=${OSS_ENDPOINT} \
OSS_REGION=${OSS_REGION} \
OSS_PATH=${OSS_PATH} \
MODEL_DIR=/tmp/model \
python3 -m sglang.launch_server --model-path /tmp/model --port 8000 --dist-init-addr 192.168.1.1:20000 --nnodes 2 --node-rank 1
```
Kubernetes deployment
Build an image with the connector installed and push it to a repository. Example pod deployment YAML:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: connector-config
data:
  config.json: |
    {
      "logLevel": 1,
      "logPath": "/var/log/oss-connector/connector.log",
      "auditPath": "/var/log/oss-connector/audit.log",
      "expireTimeSec": 120,
      "prefetch": {
        "vcpus": 16,
        "workers": 16
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-connector-deployment
spec:
  selector:
    matchLabels:
      app: model-connector
  template:
    metadata:
      labels:
        app: model-connector
    spec:
      imagePullSecrets:
        - name: acr-credential-beijing
      hostNetwork: true
      containers:
        - name: container-name
          image: "{IMAGE_ADDRESS}"  # Replace with the address of your image.
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "24"
              memory: "700Gi"
            limits:
              cpu: "128"
              memory: "900Gi"
          command:
            - bash
            - -c
            - ENABLE_CONNECTOR=1 python3 -m vllm.entrypoints.openai.api_server --model /var/model --trust-remote-code --tensor-parallel-size 1 --disable-custom-all-reduce
          env:
            - name: LD_PRELOAD
              value: "/usr/local/lib/libossc_preload.so"
            - name: OSS_ENDPOINT
              value: "oss-cn-beijing-internal.aliyuncs.com"
            - name: OSS_REGION
              value: "cn-beijing"
            - name: OSS_PATH
              value: "oss://examplebucket/qwen/Qwen1.5-7B-Chat/"
            - name: MODEL_DIR
              value: "/var/model/"
            - name: OSS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: oss-access-key-connector
                  key: key
            - name: OSS_ACCESS_KEY_SECRET
              valueFrom:
                secretKeyRef:
                  name: oss-access-key-connector
                  key: secret
          volumeMounts:
            - name: connector-config
              mountPath: /etc/oss-connector/
      terminationGracePeriodSeconds: 10
      volumes:
        - name: connector-config
          configMap:
            name: connector-config
```
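The Deployment reads the AccessKey pair from a Secret named `oss-access-key-connector` with the keys `key` and `secret`. This Secret must exist in the same namespace before the pod starts. The following hypothetical helper prints a matching Secret manifest. Alternatively, `kubectl create secret generic oss-access-key-connector --from-literal=key=<ACCESS_KEY_ID> --from-literal=secret=<ACCESS_KEY_SECRET>` creates an equivalent Secret directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Secret manifest matching the secretKeyRef entries
# in the Deployment (name oss-access-key-connector, keys "key" and "secret").
# Usage: oss_secret_manifest ACCESS_KEY_ID ACCESS_KEY_SECRET | kubectl apply -f -
oss_secret_manifest() {
  local id secret
  # Values are emitted as single-quoted YAML scalars; embedded single quotes
  # are escaped by doubling them.
  id="$(printf '%s' "$1" | sed "s/'/''/g")"
  secret="$(printf '%s' "$2" | sed "s/'/''/g")"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: oss-access-key-connector
type: Opaque
stringData:
  key: '${id}'
  secret: '${secret}'
EOF
}
```

`stringData` accepts plain-text values, and the API server stores them base64-encoded under `data`.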
Performance testing
Single-node model loading test
Test environment
| Item | Description |
| --- | --- |
| OSS | Beijing region, internal network download bandwidth 250 Gbps |
| Test node | ecs.g7nex.32xlarge, network bandwidth 160 Gbps (80 Gbps × 2) |
Statistical metrics
| Metric | Description |
| --- | --- |
| Model download | Time to download the model files through the connector. |
| End-to-end | Time for the vllm API server (CPU version) to start and become ready. |
Test results
| Model | Model size (GB) | Model download time (s) | End-to-end time (s) |
| --- | ---: | ---: | ---: |
| Qwen2.5-14B | 27.522 | 1.7721 | 20.48 |
| Qwen2.5-72B | 135.437 | 10.57 | 30.09 |
| Qwen3-8B | 15.271 | 0.97 | 18.88 |
| Qwen3-32B | 61.039 | 3.99 | 22.97 |
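Dividing model size by download time gives the effective download throughput, which you can compare with the 10 GB/s figure under High performance. This is a minimal awk sketch. The four rows are copied from the table above, and the calculation assumes the units given in the column headers (GB and seconds).

```shell
#!/usr/bin/env bash
# Effective download throughput (GB/s) = model size (GB) / model download time (s),
# using the rows from the test results table.
throughput_table() {
  awk '{ printf "%s %.2f\n", $1, $2 / $3 }' <<'EOF'
Qwen2.5-14B 27.522 1.7721
Qwen2.5-72B 135.437 10.57
Qwen3-8B 15.271 0.97
Qwen3-32B 61.039 3.99
EOF
}

throughput_table
# Qwen2.5-14B 15.53
# Qwen2.5-72B 12.81
# Qwen3-8B 15.74
# Qwen3-32B 15.30
```

All four models download at more than 10 GB/s. End-to-end time also includes the startup time of the inference framework.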