OSS accelerator caches frequently accessed objects on high-performance NVMe SSDs in the same Availability Zone as your compute resources, delivering millisecond-level latency and high QPS for AI, data warehousing, and big data analytics workloads.
OSS accelerator is not supported for buckets that do not have a region attribute.
Benefits
- Low latency: NVMe SSDs deliver millisecond-level download latency, improving performance for data warehouse queries and model inference downloads.
- High IOPS: The accelerator sustains high IOPS for small-object reads and absorbs bursts of read requests for hot data.
- Increased throughput: Bandwidth scales linearly with cache capacity, providing burst throughput of up to hundreds of Gbit/s.
- Elastic scaling: Scale cache capacity between 50 GB and 100 TB without service interruption to match periodic workload demands. Backed by the massive storage of OSS, the accelerator can cache multiple data warehouse tables or partitions directly.
- Decoupled storage and computing: The accelerator runs independently of your compute servers, so you can adjust cache capacity and performance online without service interruption.
- Data consistency: Unlike conventional caches, the OSS accelerator ensures data consistency. When objects in OSS are updated, the accelerator automatically detects the change and caches the latest versions, so compute engines always read current data.
- Multiple warmup policies: The accelerator automatically detects updated objects and provides the following warmup policies:
  - Warmup during read: On a cache miss, the accelerator retrieves the data from the source bucket and caches it automatically.
  - Synchronous warmup: Data is cached on the accelerator when it is written.
  - Asynchronous warmup: Data is batch-cached from OSS to the accelerator on a configured schedule.
  Note:
  - Warmup during read is always enabled and cannot be disabled.
  - Synchronous warmup and asynchronous warmup must be enabled manually. You can enable both at the same time.
How it works
Each accelerator has an internal accelerated endpoint dedicated to its region. Access is limited to the internal network. For example, the endpoint for the China (Beijing) region is http://cn-beijing-internal.oss-data-acc.aliyuncs.com. Clients in the same VPC can use this endpoint to access cached resources.
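The China (Beijing) endpoint above follows a visible naming pattern. The sketch below builds an endpoint string from a region ID, assuming (this page does not confirm it) that other regions use the same `<region-id>-internal.oss-data-acc.aliyuncs.com` form. `accelerated_endpoint` is a hypothetical helper, not an OSS SDK function; check the official endpoint list for your region before relying on the result.

```python
# Hypothetical helper, not part of any OSS SDK.
# ASSUMPTION: every region follows the naming pattern of the documented
# China (Beijing) endpoint; verify the real endpoint for your region.


def accelerated_endpoint(region_id: str, scheme: str = "http") -> str:
    """Build the internal accelerated endpoint for a region ID such as "cn-beijing"."""
    region_id = region_id.strip()
    if not region_id:
        raise ValueError("region_id must not be empty")
    return f"{scheme}://{region_id}-internal.oss-data-acc.aliyuncs.com"


print(accelerated_endpoint("cn-beijing"))
# http://cn-beijing-internal.oss-data-acc.aliyuncs.com
```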
Write requests
- Warmup during read only: Write requests sent to the accelerated endpoint are forwarded to the OSS bucket, just as they would be through a standard OSS endpoint.
- Synchronous warmup: Write requests are forwarded to both the OSS bucket and the accelerator.
- Asynchronous warmup: Write requests are forwarded to the OSS bucket, and hot data is pre-loaded into the accelerator before read requests arrive.
- Synchronous warmup + asynchronous warmup: Write requests are forwarded to both the OSS bucket and the accelerator, and hot data is also pre-loaded before read requests arrive.
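To make the write routing concrete, here is a minimal Python model of the four cases above. `Warmup`, `write_targets`, and `hot_data_preloaded` are hypothetical names for illustration only; the model restates the rules in code and does not call any OSS API.

```python
from enum import Flag, auto
from typing import List


class Warmup(Flag):
    """Optional warmup policies; warmup during read is always on and is not modeled as a flag."""
    NONE = 0
    SYNC = auto()   # synchronous warmup
    ASYNC = auto()  # asynchronous warmup


def write_targets(enabled: Warmup) -> List[str]:
    """Where a write sent to the accelerated endpoint is forwarded."""
    targets = ["oss_bucket"]            # every write reaches the OSS bucket
    if Warmup.SYNC in enabled:
        targets.append("accelerator")   # synchronous warmup also caches the data at write time
    return targets


def hot_data_preloaded(enabled: Warmup) -> bool:
    """Asynchronous warmup loads hot data into the accelerator before reads arrive."""
    return Warmup.ASYNC in enabled


for policy in (Warmup.NONE, Warmup.SYNC, Warmup.ASYNC, Warmup.SYNC | Warmup.ASYNC):
    print(policy, write_targets(policy), hot_data_preloaded(policy))
```

Modeling the policies as flags mirrors the fact that synchronous and asynchronous warmup can be enabled independently or together.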
Read requests
Note: Read requests follow the same path regardless of the warmup policy.
- Read requests sent to the accelerated endpoint are forwarded to the accelerator.
- The accelerator looks up the requested objects in its cache:
  - Cache hit: The objects are returned directly to the client.
  - Cache miss: The accelerator retrieves the objects from the mapped OSS bucket, caches them, and returns them to the client.
  - Cache full: The accelerator evicts the least recently accessed objects to make room for hot data.
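The read path above behaves like a read-through cache. The following toy model is not the accelerator's implementation: it sizes the cache in bytes and uses LRU eviction, as described in the Metrics table. `ReadThroughLRUCache` and `fetch_from_bucket` are hypothetical names, and a plain dict stands in for the OSS bucket.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    """Toy model of the read path: hit, miss with fill from the bucket, LRU eviction when full.

    capacity_bytes and fetch_from_bucket are illustrative stand-ins, not OSS APIs.
    """

    def __init__(self, capacity_bytes: int, fetch_from_bucket: Callable[[str], bytes]) -> None:
        self.capacity = capacity_bytes
        self.fetch = fetch_from_bucket
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()  # oldest access first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:              # cache hit: return the object directly
            self.entries.move_to_end(key)    # mark it as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1                     # cache miss: read from the mapped bucket
        data = self.fetch(key)
        self._admit(key, data)               # cache it, then return it to the client
        return data

    def _admit(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity:        # larger than the whole cache: serve uncached
            return
        while self.used + len(data) > self.capacity:         # cache full
            _, evicted = self.entries.popitem(last=False)     # evict least recently used
            self.used -= len(evicted)
        self.entries[key] = data
        self.used += len(data)


bucket = {"a": b"x" * 40, "b": b"y" * 40, "c": b"z" * 40}
cache = ReadThroughLRUCache(100, bucket.__getitem__)
for key in ("a", "b", "a", "c"):   # "a" is re-read, so "b" becomes the LRU entry
    cache.get(key)
print(list(cache.entries), cache.hits, cache.misses)  # ['a', 'c'] 1 3
```

The same behavior explains the big data scenario: once one reader's objects are cached, a second reader requesting the same keys gets cache hits.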
Scenarios
OSS accelerator suits scenarios that require high bandwidth and repeated data reads:
Low-latency data sharing
- Background information: A customer scans goods in a vending machine, takes a picture, and uploads it through a mobile app. The backend stores the picture, and then subsystems perform content moderation and barcode recognition on it. The results must be returned within milliseconds so that the fee can be deducted.
- Solution: Use synchronous warmup to reduce image loading latency and shorten the transaction chain for this latency-sensitive, repeated-read workload. Because each picture is read by several subsystems right after it is uploaded, caching it at write time means the first read already hits the cache.
Model inference
- Background information: AI inference servers pull and load model objects for AIGC tasks. During debugging, servers frequently switch between models. As model sizes grow, pull and load times increase significantly.
- Solution: If you can determine the hot model objects in advance, use asynchronous warmup and pre-load the known models with the accelerator SDK; otherwise, rely on warmup during read. In both cases, the accelerator caches the models on NVMe media so that subsequent reads are faster, and you can adjust cache capacity as your model set grows. If your inference servers access OSS through a local directory, mount the bucket with ossfs.
Big data analysis
- Background information: Business data is partitioned by day and stored in OSS. Analysts use engines such as Hive or Spark for ad hoc queries without knowing the exact data range in advance, and need faster query turnaround.
- Solution: Use warmup during read for offline analytics with uncertain query ranges. For example, once the partitions queried by Analyst A are cached, Analyst B's queries over the same partitions are served from the accelerator automatically.
Multi-level acceleration
- Background information: Client-side caches and the server-side accelerator can be combined without conflict to provide multi-level acceleration.
- Solution: Deploy Alluxio alongside your compute clusters as a client-side cache. On an Alluxio cache miss, reads fall through to the OSS accelerator (warmup during read). Because client hosts have limited capacity, Alluxio evicts data based on a TTL. The OSS accelerator holds far more data (up to hundreds of TB) and retains it after the Alluxio TTL expires, so subsequent Alluxio misses are served directly from the accelerator instead of the bucket, achieving two-level acceleration.
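A minimal sketch of the two-level read path, assuming a client-side cache that expires entries after a fixed TTL (Alluxio's role here) in front of a server-side cache with no TTL (the accelerator's role). `TwoLevelReader` and its parameters are hypothetical and are not Alluxio or OSS APIs; the model ignores capacity limits and uses an injectable clock so that expiry can be shown deterministically.

```python
import time
from typing import Callable, Dict, List, Tuple


class TwoLevelReader:
    """Toy model: a client-side TTL cache (Alluxio's role) over a server-side cache
    (the accelerator's role) over the OSS bucket. Real eviction in each product differs.
    """

    def __init__(self, bucket: Dict[str, bytes], client_ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bucket = bucket
        self.ttl = client_ttl_s
        self.clock = clock
        self.client: Dict[str, Tuple[float, bytes]] = {}  # key -> (expiry time, data)
        self.accelerator: Dict[str, bytes] = {}           # no TTL: outlives client entries
        self.served_from: List[str] = []

    def read(self, key: str) -> bytes:
        now = self.clock()
        entry = self.client.get(key)
        if entry is not None and entry[0] > now:          # client-side hit within TTL
            self.served_from.append("client_cache")
            return entry[1]
        if key in self.accelerator:                       # client miss, accelerator hit
            data = self.accelerator[key]
            self.served_from.append("accelerator")
        else:                                             # both miss: warmup during read
            data = self.bucket[key]
            self.accelerator[key] = data
            self.served_from.append("oss_bucket")
        self.client[key] = (now + self.ttl, data)         # refill the client-side cache
        return data


now = [0.0]
reader = TwoLevelReader({"part-2024-01-01": b"rows"}, client_ttl_s=60, clock=lambda: now[0])
for t in (0.0, 10.0, 100.0):      # the client entry expires at t=60
    now[0] = t
    reader.read("part-2024-01-01")
print(reader.served_from)          # ['oss_bucket', 'client_cache', 'accelerator']
```

The third read shows the benefit described above: after the client-side entry expires, the data comes from the accelerator rather than the bucket.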
Metrics
| Metric | Description |
| --- | --- |
| Cache capacity | 50 GB to 100 TB. If your business requires a greater cache capacity, submit a ticket. |
| Accelerator throughput | Throughput scales with cache capacity: up to 2.4 Gbit/s per TB of cache. This throughput is independent of, and in addition to, the standard OSS bandwidth limits. For more information about the standard bandwidth limits of OSS, see Limits and performance metrics. For example, if OSS provides 100 Gbit/s of standard bandwidth in China (Shenzhen) and you create a 10 TB accelerator, you gain an additional 24 Gbit/s of low-latency throughput through the accelerated endpoint. Use the standard OSS internal endpoint for batch offline computing (100 Gbit/s of concurrent block reads) and the accelerated endpoint for hot data queries (an additional 24 Gbit/s of low-latency access from NVMe SSDs). |
| Peak read bandwidth | Formula: MAX[600, 300 × Cache capacity (TB)] MB/s. Example: A 2 TB (2,048 GB) accelerator provides MAX[600, 600] = 600 MB/s of read bandwidth. |
| Maximum read bandwidth | 40 GB/s. If your business requires a greater read bandwidth, submit a ticket. |
| Minimum latency for reading 128 KB of data in a single request | < 10 ms |
| Scale-up or scale-down interval | At most once per hour |
| Scale-up or scale-down method | Manual, in the OSS console |
| Cache eviction policy | Least Recently Used (LRU): recently accessed data is retained, and the least recently accessed data is evicted first to maximize cache utilization. |
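The throughput and peak read bandwidth rows reduce to simple arithmetic on cache capacity. The sketch below encodes them, assuming 1 TB = 1,024 GB to match the table's "2 TB (2,048 GB)" example and the 50 GB to 100 TB capacity range; the function names are hypothetical.

```python
GB_PER_TB = 1024                      # matches the table's "2 TB (2,048 GB)" example
MIN_CAPACITY_GB = 50
MAX_CAPACITY_GB = 100 * GB_PER_TB     # 100 TB; larger capacities require a ticket


def capacity_tb(capacity_gb: float) -> float:
    if not MIN_CAPACITY_GB <= capacity_gb <= MAX_CAPACITY_GB:
        raise ValueError(f"{capacity_gb} GB is outside the 50 GB to 100 TB range")
    return capacity_gb / GB_PER_TB


def accelerator_throughput_gbit_s(capacity_gb: float) -> float:
    """Accelerated throughput: up to 2.4 Gbit/s per TB of cache."""
    return 2.4 * capacity_tb(capacity_gb)


def peak_read_bandwidth_mb_s(capacity_gb: float) -> float:
    """Peak read bandwidth: MAX[600, 300 x cache capacity (TB)] MB/s."""
    return max(600.0, 300.0 * capacity_tb(capacity_gb))


for gb in (100, 2048, 10 * 1024):
    print(gb, accelerator_throughput_gbit_s(gb), peak_read_bandwidth_mb_s(gb))
```

Across the 50 GB to 100 TB range the peak formula tops out at 30,000 MB/s, below the 40 GB/s maximum read bandwidth, so the sketch does not apply that cap. Note that 300 MB/s per TB is the same rate as 2.4 Gbit/s per TB (300 × 8 = 2,400 Mbit/s).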
Billing rules
- OSS accelerator is in public preview. During the public preview, up to 100 GB of cache capacity is free. After the public preview ends, you are charged for the provisioned cache capacity on a pay-as-you-go basis.
- When you read or write data through the accelerated endpoint, OSS API request fees apply even if no back-to-origin requests are sent to the bucket.
| Billable item | Billing rule | Billing cycle | Billing method |
| --- | --- | --- | --- |
| OSS accelerator (AcceleratorCacheSize) | You are charged based on the provisioned capacity of the OSS accelerator and the duration of usage. Important: You are charged for the capacity you provision, regardless of the volume of data actually stored. For example, if you provision 100 GB of accelerator capacity and prefetch 50 GB of data, you are still charged for the full 100 GB. | Hourly. Bills are generally generated after each billing cycle ends; the exact generation time is determined by the system. | Pay-as-you-go |
For more information about how to query OSS billing data generated on an hourly basis, see Query hourly data of OSS and Query bills.
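The key point of the billing rule is that the hourly charge depends on provisioned capacity, not on how much data is cached. The sketch below shows that arithmetic; `HourlyUsage`, `hourly_charges`, and `price_per_gb_hour` are hypothetical, the price is a placeholder rather than a published OSS rate, and the public-preview free allowance is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HourlyUsage:
    provisioned_gb: float  # accelerator capacity you requested during this hour
    cached_gb: float       # data actually held in the cache; does not change the charge


def hourly_charges(usage: List[HourlyUsage], price_per_gb_hour: float) -> List[float]:
    """One charge per hourly billing cycle, computed from provisioned capacity only.

    price_per_gb_hour is a hypothetical placeholder, not a published OSS rate.
    """
    return [hour.provisioned_gb * price_per_gb_hour for hour in usage]


# The table's example: 100 GB provisioned, only 50 GB prefetched, billed as 100 GB.
charges = hourly_charges([HourlyUsage(100, 50), HourlyUsage(100, 100)], price_per_gb_hour=0.01)
print(charges)
```

Both hours produce the same charge even though the first hour caches only half as much data.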
Next steps
- To create an accelerator or modify its cache capacity, see Create, modify, and delete accelerators.
- To configure and use OSS accelerator with OSS tools and OSS SDKs, see Use OSS accelerator.
- To compare the performance of an OSS internal endpoint and an accelerator in specific business scenarios, see Performance metrics.