How to use the streaming data tunnel service (MaxCompute Streaming Tunnel)

MaxCompute Streaming Tunnel lets you write data to MaxCompute in streaming mode using a dedicated set of APIs and backend services. These APIs significantly reduce the cost of developing distributed data synchronization services. They also address the limitations of MaxCompute Tunnel in high-concurrency and high-QPS (queries per second) scenarios, such as partition locking conflicts, small-file fragmentation, and complex synchronization code.

MaxCompute Streaming Tunnel has been in public preview since January 1, 2021, and is free of charge during the preview period. Follow Service notices to stay informed about future billing changes.

When to use Streaming Tunnel

MaxCompute Streaming Tunnel complements MaxCompute Tunnel rather than replacing it. Use this table to decide which channel fits your workload:

Dimension	MaxCompute Streaming Tunnel	MaxCompute Tunnel
Data form	Streaming rows	Batched files
Concurrency	High concurrency supported; no partition locking contention	Concurrent writes can cause partition locking conflicts
Write throughput	Optimized for high QPS; prevents small-file fragmentation	Small `batch size` at high QPS generates many small files
Incremental data	Asynchronously merged in the background without service interruption	No built-in async merge; data is written as-is
Partitioning	Automatic partitioning across concurrent jobs	Manual partition management required
Best for	Real-time log ingestion, stream processing results, message queue sync	Large-batch ETL, periodic bulk loads
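The Write throughput row says that a small `batch size` at high QPS generates many small files. The following sketch puts rough numbers on that. It uses a deliberately simplified, hypothetical model in which every committed batch becomes exactly one file in the target partition. Actual file counts in MaxCompute Tunnel depend on session and block behavior that this article does not describe.

```python
def files_per_hour(rows_per_second: int, batch_size: int) -> int:
    """Estimate how many files batch commits create per hour.

    Hypothetical simplified model: each committed batch of `batch_size`
    rows lands as exactly one new file in the target partition.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rows_per_hour = rows_per_second * 3600
    # Ceiling division: a final partial batch still produces a file.
    return -(-rows_per_hour // batch_size)


# 10,000 rows/s committed in batches of 100 rows vs. 100,000 rows.
print(files_per_hour(10_000, 100))      # 360000
print(files_per_hour(10_000, 100_000))  # 360
```

Under this model, a batch size 1,000 times larger produces 1,000 times fewer files. Pipelines that cannot buffer that much data before committing are the ones that benefit most from Streaming Tunnel's background merging.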

Key capabilities

Streaming semantic APIs: Simplify the development of distributed data synchronization services and reduce development costs.
Automatic partitioning: Eliminates concurrent partition locking when multiple synchronization jobs write to the same table simultaneously.
Asynchronous data merging: Merges incremental data in the background without interrupting active write operations, improving storage efficiency and preventing small-file accumulation.
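Asynchronous merging is a form of background compaction. The following sketch shows the general pattern with a hypothetical in-memory `FragmentStore`. Writers register small files while holding a lock only briefly. A compactor merges a snapshot of the existing files outside the lock, so writes are not paused. This is a generic compaction pattern written for illustration, not the merge algorithm that MaxCompute uses internally. The greedy merge rule, the `target_mb` threshold, and the MB units are assumptions of the sketch.

```python
import threading


def merge_fragments(fragment_sizes: list[int], target_size: int) -> list[int]:
    """Greedily combine consecutive small files until each reaches target_size."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    merged: list[int] = []
    current = 0
    for size in fragment_sizes:
        current += size
        if current >= target_size:
            merged.append(current)
            current = 0
    if current:  # combine any leftover files into one final, smaller file
        merged.append(current)
    return merged


class FragmentStore:
    """Hypothetical in-memory model of the data files in one partition."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._files: list[int] = []  # file sizes in MB, in write order

    def append(self, size_mb: int) -> None:
        # Writers hold the lock only long enough to register a new file.
        with self._lock:
            self._files.append(size_mb)

    def files(self) -> list[int]:
        with self._lock:
            return list(self._files)

    def compact(self, target_mb: int) -> None:
        # Snapshot the current files, then merge outside the lock so that
        # concurrent append() calls are not blocked during the merge.
        with self._lock:
            snapshot = list(self._files)
        merged = merge_fragments(snapshot, target_mb)
        # Swap in the merged files and keep anything appended after the
        # snapshot. Assumes a single compactor and append-only writers.
        with self._lock:
            newer = self._files[len(snapshot):]
            self._files = merged + newer


store = FragmentStore()
for _ in range(25):
    store.append(10)  # 25 small 10 MB files from high-frequency writes
store.compact(target_mb=64)
print(store.files())  # [70, 70, 70, 40]
```

The compactor replaces only the files it saw in its snapshot. Data that arrives during a merge is therefore never dropped, and it is picked up by the next compaction pass.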

Use cases

Scenario	Description
Real-time event log ingestion	Write log data directly into MaxCompute for downstream batch processing. No intermediate storage service is needed, which reduces pipeline costs.
Stream processing result storage	Persist Flink or other stream computing results into MaxCompute without concurrency or `batch size` limits, avoiding the small-file accumulation caused by high-frequency writes. Because Streaming Tunnel avoids partition locking conflicts, streaming services stay available in high-concurrency scenarios.
Message queue synchronization	Sync data from DataHub or ApsaraMQ for Kafka into MaxCompute at high concurrency and large batch volumes, replacing workarounds previously needed with the Simple Message Queue connector.
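In log ingestion and similar scenarios, a client typically buffers events briefly and hands each small batch to the streaming write path. Background merging handles the resulting small writes, so the client does not need to accumulate large files. The following sketch models that client-side pattern with a hypothetical `BufferedStreamWriter`. The caller-supplied `sink` function stands in for the real Streaming Tunnel SDK call, which is not shown here, and `max_rows` is an assumed tuning knob.

```python
from typing import Callable


class BufferedStreamWriter:
    """Hypothetical client-side buffer in front of a streaming write call.

    `sink` stands in for whatever call actually sends rows through
    Streaming Tunnel; it is an assumption of this sketch.
    """

    def __init__(self, sink: Callable[[list[dict]], None], max_rows: int = 500) -> None:
        if max_rows <= 0:
            raise ValueError("max_rows must be positive")
        self._sink = sink
        self._max_rows = max_rows
        self._buffer: list[dict] = []

    def write(self, event: dict) -> None:
        self._buffer.append(event)
        # Send a small batch as soon as it is full.
        if len(self._buffer) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            # Clear only after the sink accepts the batch, so a failed
            # send (an exception from sink) keeps the buffered rows.
            self._sink(self._buffer)
            self._buffer = []


sent: list[list[dict]] = []
writer = BufferedStreamWriter(sent.append, max_rows=3)
for i in range(7):
    writer.write({"event_id": i, "level": "INFO"})
writer.flush()  # flush the final partial batch, for example on shutdown
print([len(batch) for batch in sent])  # [3, 3, 1]
```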

Integrate with upstream services

By default, Realtime Compute for Apache Flink, DataWorks, and ApsaraMQ for Kafka write to MaxCompute via MaxCompute Tunnel. To switch to Streaming Tunnel:

Service	How to enable Streaming Tunnel
Realtime Compute for Apache Flink	Use the built-in Streaming Tunnel plug-in provided by Realtime Compute for Apache Flink.
DataWorks	Contact the DataWorks engineer on duty to enable Streaming Tunnel in the background.
ApsaraMQ for Kafka	Contact the Kafka engineer on duty to enable Streaming Tunnel in the background.

Logstash log collector: To write collected logs through Streaming Tunnel, see Logstash (stream).

Limitations

Table or partition locking during writes

During a streaming write, the MaxCompute Tunnel service locks the target table or partition for the duration of the write. DML operations that modify data in that table or partition, such as `INSERT INTO` and `INSERT OVERWRITE`, cannot run until the write completes and the lock is released.
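The following sketch models these locking semantics in-process with hypothetical names (`PartitionLocks`, `streaming_write`, `try_dml`). It does not call the Tunnel service. A streaming write holds the lock on its target for its whole duration, and a DML statement on the same target is refused instead of run. In this model, targets other than the locked one are unaffected.

```python
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class PartitionLocks:
    """Hypothetical in-process model of table/partition write locks."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def _lock_for(self, target: str) -> threading.Lock:
        # One lock per table or partition, created on first use.
        with self._guard:
            return self._locks.setdefault(target, threading.Lock())

    @contextmanager
    def streaming_write(self, target: str) -> Iterator[None]:
        # The streaming write holds the lock for its whole duration.
        with self._lock_for(target):
            yield

    def try_dml(self, target: str, statement: Callable[[], None]) -> bool:
        # DML such as INSERT INTO / INSERT OVERWRITE runs only if no
        # streaming write currently holds the lock on the same target.
        lock = self._lock_for(target)
        if not lock.acquire(blocking=False):
            return False
        try:
            statement()
            return True
        finally:
            lock.release()


locks = PartitionLocks()
with locks.streaming_write("logs/ds=20210101"):
    print(locks.try_dml("logs/ds=20210101", lambda: None))  # False: locked
    print(locks.try_dml("logs/ds=20210102", lambda: None))  # True: other target
print(locks.try_dml("logs/ds=20210101", lambda: None))      # True: lock released
```

In practice, this means scheduled `INSERT OVERWRITE` jobs should target tables or partitions that are not currently receiving streaming writes, or run after those writes have finished.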

Schema modification not supported

If the schema of the target table is modified while Streaming Tunnel is active, streaming data cannot be written to the table.
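A client can detect this case by recording the schema when it starts writing and comparing it before each write. The following sketch models that check with hypothetical types (`Table`, `StreamSession`, `SchemaChangedError`). It only simulates the behavior described above and is not the SDK's own error handling.

```python
from dataclasses import dataclass


class SchemaChangedError(RuntimeError):
    """Raised when the table schema no longer matches the open session."""


@dataclass
class Table:
    # Hypothetical stand-in for table metadata: ordered column names.
    columns: tuple[str, ...]


class StreamSession:
    """Hypothetical streaming session that pins the schema seen at open time."""

    def __init__(self, table: Table) -> None:
        self._table = table
        self._schema = table.columns  # schema captured when the session opens
        self.rows: list[tuple] = []

    def write(self, record: dict) -> None:
        # Refuse the write if the table's schema has changed since open.
        if self._table.columns != self._schema:
            raise SchemaChangedError(
                f"schema changed from {self._schema} to {self._table.columns}"
            )
        self.rows.append(tuple(record[c] for c in self._schema))


table = Table(columns=("event_id", "message"))
session = StreamSession(table)
session.write({"event_id": 1, "message": "started"})

table.columns = ("event_id", "message", "level")  # schema modified mid-stream
try:
    session.write({"event_id": 2, "message": "stopped", "level": "INFO"})
except SchemaChangedError as err:
    print("write rejected:", err)
```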

Temporary storage overhead for hot data

When asynchronous data merging or ZORDER BY is enabled, Streaming Tunnel retains two copies of data written within the previous hour: the original ingested data and the asynchronously merged copy. This redundant storage is automatically cleaned up after the default retention period of 1 hour.

Plan storage capacity accordingly if your workload has a high ingestion rate during the merge window.
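As a rough planning aid, the peak extra storage is about the amount of data ingested during the retention window. The following sketch makes two assumptions: `table_size_gb` already counts every ingested row once, and the merged copy is about the same size as the original data. Compression and merge effects could change the real ratio. The function name and parameters are hypothetical.

```python
def peak_storage_gb(
    table_size_gb: float,
    ingest_gb_per_hour: float,
    retention_hours: float = 1.0,
) -> float:
    """Rough upper estimate of storage while merged hot data is retained.

    Assumptions (hypothetical model): `table_size_gb` already counts every
    ingested row once, and the merged copy of hot data is about the same
    size as the original ingested data.
    """
    if min(table_size_gb, ingest_gb_per_hour, retention_hours) < 0:
        raise ValueError("inputs must be non-negative")
    # Data written within the retention window is stored twice; the extra
    # copy cannot exceed the data that actually exists in the table.
    duplicated_hot_gb = min(ingest_gb_per_hour * retention_hours, table_size_gb)
    return table_size_gb + duplicated_hot_gb


# A 500 GB table receiving 50 GB/hour with the default 1-hour retention.
print(peak_storage_gb(500, 50))  # 550.0
```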