Feature overview

DataHub is a real-time data streaming service for ingesting, caching, and routing high-throughput data to downstream analytics and storage systems.

Benefits

  • Stability: DataHub is built on Alibaba's internal real-time data transfer infrastructure and has proven stable and reliable at scale, including during the annual Double 11 shopping festival.

  • High throughput: A single topic can ingest terabytes of data per day, and each shard can handle hundreds of gigabytes per day.

  • Pay-as-you-go pricing: DataHub is billed on demand, so you pay only for what you use.

  • Ecosystem integration: Built on the Apsara distributed system, DataHub integrates with the Alibaba Cloud big data ecosystem, including MaxCompute, Realtime Compute for Apache Flink, and Hologres, to form a unified data architecture.
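Because throughput is quoted per shard, a rough way to size a topic is to divide the expected daily write volume by what one shard can absorb. The sketch below is a hypothetical planning helper, not part of any DataHub SDK. It assumes topic write capacity scales roughly linearly with shard count and uses an illustrative 200 GB/day per shard, a value picked only because it falls within "hundreds of gigabytes"; it is not a published DataHub limit.

```python
import math

# Hypothetical capacity-planning helper, not part of any DataHub SDK.
# Assumes topic write capacity scales roughly linearly with shard count.
# The per-shard figure below is illustrative only, not a published DataHub limit.
ASSUMED_SHARD_CAPACITY_GB_PER_DAY = 200.0


def estimate_shards(daily_volume_gb: float,
                    shard_capacity_gb: float = ASSUMED_SHARD_CAPACITY_GB_PER_DAY,
                    headroom: float = 0.2) -> int:
    """Return the shard count needed to absorb daily_volume_gb of writes.

    headroom is the fraction of each shard's capacity reserved for bursts,
    so each shard is planned at shard_capacity_gb * (1 - headroom).
    """
    if daily_volume_gb < 0:
        raise ValueError("daily_volume_gb must be non-negative")
    if shard_capacity_gb <= 0:
        raise ValueError("shard_capacity_gb must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_shard = shard_capacity_gb * (1 - headroom)
    # Round up: a partial shard's worth of traffic still needs a whole shard.
    return max(1, math.ceil(daily_volume_gb / usable_per_shard))


if __name__ == "__main__":
    # 2 TB/day = 2048 GB; usable per shard = 200 * 0.8 = 160 GB;
    # 2048 / 160 = 12.8, rounded up to 13 shards.
    print(estimate_shards(2048))  # 13
```

Reserving headroom is a design choice: planning each shard below its nominal capacity leaves room for traffic spikes without adding shards right away.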

Features

  • Data ingestion: Ingest data into DataHub using SDKs, APIs, or third-party connectors such as Flume and Logstash.

  • Data shipping: With minimal configuration, the DataConnector module syncs ingested data in real time to downstream storage and analytics systems, including MaxCompute, Object Storage Service (OSS), and Tablestore.

  • Data caching: Data is retained for a configurable period, so multiple independent consumers can read and replay the same stream. For example, one application can compute real-time aggregates while another archives the raw data, with both reading from the same topic at the same time. DataHub automatically keeps multiple replicas of the data to ensure reliability.

  • Multiple interfaces: Access DataHub through the web console for quick operations, or use APIs and SDKs for programmatic access.
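The caching behavior described above, where several consumers read one retained stream at their own pace, can be illustrated with a small in-memory toy. This is a minimal sketch under stated assumptions: the `Topic` class and its `append`, `expire`, `subscribe`, `read`, and `seek` methods are hypothetical and are not the DataHub SDK or API. The toy models only a time-based retention window and per-consumer read offsets; it ignores shards, persistence, and replication.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Conceptual, in-memory model of a retained stream read by independent consumers.
# Hypothetical: Topic and its methods are NOT the DataHub SDK or API, and this
# toy ignores shards, persistence, and replication.


@dataclass
class Topic:
    retention_seconds: int
    # Retained records as (timestamp, payload); records[0] has offset base_offset.
    records: List[Tuple[int, str]] = field(default_factory=list)
    base_offset: int = 0
    # Each consumer's next offset to read; consumers never affect one another.
    cursors: Dict[str, int] = field(default_factory=dict)

    def append(self, timestamp: int, payload: str) -> None:
        self.records.append((timestamp, payload))

    def expire(self, now: int) -> None:
        """Drop records older than the retention period (list.pop(0) is fine for a toy)."""
        while self.records and now - self.records[0][0] > self.retention_seconds:
            self.records.pop(0)
            self.base_offset += 1

    def subscribe(self, consumer: str) -> None:
        """Register a consumer that starts from the oldest retained record."""
        self.cursors.setdefault(consumer, self.base_offset)

    def read(self, consumer: str, max_records: int = 10) -> List[str]:
        """Return up to max_records for this consumer and advance only its cursor."""
        # If the cursor points at expired data, skip to the oldest retained record.
        start = max(self.cursors[consumer], self.base_offset)
        idx = start - self.base_offset
        batch = self.records[idx:idx + max_records]
        self.cursors[consumer] = start + len(batch)
        return [payload for _, payload in batch]

    def seek(self, consumer: str, offset: int) -> None:
        """Move a consumer's cursor, e.g. back to replay; clamped to retained data."""
        self.cursors[consumer] = max(offset, self.base_offset)


if __name__ == "__main__":
    topic = Topic(retention_seconds=3600)
    for ts, event in enumerate(["a", "b", "c"]):
        topic.append(ts, event)
    topic.subscribe("aggregator")  # hypothetical consumer names
    topic.subscribe("archiver")
    print(topic.read("aggregator"))   # ['a', 'b', 'c']
    print(topic.read("archiver", 2))  # ['a', 'b'], independent of aggregator
    topic.seek("aggregator", 0)
    print(topic.read("aggregator"))   # ['a', 'b', 'c'] again (replay)
```

Keeping a separate cursor per consumer is what lets the two readers progress at different speeds, while the retention window bounds how far back a replay can reach.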