Real-time data consumption

The real-time data consumption feature has two components: a high-speed data import API and real-time data source consumption. This feature lets you write data directly to segments (shards), which avoids the high write load on the master node caused by COPY or INSERT statements. You can also use a single AnalyticDB for PostgreSQL instance to directly consume data from Kafka.

Features

High-speed data import API

The high-speed data import API is a client API that combines the gRPC protocol with segment-direct-write technology for efficient data transfer. gRPC is a high-performance, open-source, general-purpose Remote Procedure Call (RPC) framework developed by Google. The API gives you flexible control over how data is written and updated. For more information, see High-speed data import API.
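To make the idea of segment-direct writes concrete, here is a minimal toy model in Python. It is not the actual client API: the names `segment_for`, `route_rows`, `Segment`, and `NUM_SEGMENTS`, and the CRC32-based hashing, are all assumptions made for illustration. The sketch only shows the general pattern of grouping rows by a distribution key on the client side and applying each batch as an upsert on its target segment, so that no single node sees every row.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical segment count, chosen for the example


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment index with a stable hash.

    CRC32 is an illustrative stand-in, not the database's real hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def route_rows(rows, key_field, num_segments=NUM_SEGMENTS):
    """Group rows into per-segment batches, preserving input order per batch."""
    batches = defaultdict(list)
    for row in rows:
        batches[segment_for(str(row[key_field]), num_segments)].append(row)
    return dict(batches)


class Segment:
    """Toy in-memory segment that stores rows keyed by primary key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, batch, key_field):
        # Insert new keys; for existing keys, later values overwrite earlier ones.
        for row in batch:
            key = row[key_field]
            self.rows[key] = {**self.rows.get(key, {}), **row}


rows = [
    {"id": "a1", "amount": 10},
    {"id": "b2", "amount": 20},
    {"id": "a1", "amount": 15},  # same key again: treated as an update
]

segments = [Segment() for _ in range(NUM_SEGMENTS)]
for seg_id, batch in route_rows(rows, "id").items():
    # Each batch goes straight to its own segment; nothing funnels through one node.
    segments[seg_id].upsert(batch, "id")
```

Because rows with the same key always hash to the same segment, the second `a1` row overwrites the first one locally on that segment, which is the upsert behavior the sketch is meant to illustrate.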

Real-time data source consumption

Kafka data integration is the other component of the real-time data consumption feature. It lets an AnalyticDB for PostgreSQL database consume data from Kafka topics in real time, which supports analytics workloads that combine stream processing and batch processing. For more information, see Consume data from Kafka in real time.
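As a rough mental model of consuming a topic in real time, the following Python sketch simulates reading one topic partition in small batches and tracking a committed offset. It does not use Kafka or AnalyticDB for PostgreSQL at all: the partition is a plain list, the table is a dictionary, and `consume_batch`, `run_consumer`, and `max_batch` are hypothetical names. How the product actually manages offsets and batches is not described here, so treat this only as an illustration of the general pattern.

```python
import json


def consume_batch(partition_log, committed_offset, max_batch):
    """Return (records, next_offset) for messages starting at committed_offset."""
    chunk = partition_log[committed_offset:committed_offset + max_batch]
    records = [json.loads(message) for message in chunk]
    return records, committed_offset + len(chunk)


def run_consumer(partition_log, table, key_field, committed_offset=0, max_batch=2):
    """Apply all unread messages to the table and return the new committed offset."""
    while committed_offset < len(partition_log):
        records, next_offset = consume_batch(partition_log, committed_offset, max_batch)
        for record in records:
            # Upsert by key: a later message for the same key replaces the earlier row.
            table[record[key_field]] = record
        # Advance the offset only after the whole batch has been applied.
        committed_offset = next_offset
    return committed_offset


# Simulated topic partition holding JSON-encoded messages.
partition_log = [
    json.dumps({"id": 1, "status": "created"}),
    json.dumps({"id": 2, "status": "created"}),
    json.dumps({"id": 1, "status": "paid"}),
]

table = {}
offset = run_consumer(partition_log, table, "id")

# New messages arrive later; the consumer resumes from the committed offset.
partition_log.append(json.dumps({"id": 3, "status": "created"}))
offset = run_consumer(partition_log, table, "id", committed_offset=offset)
```

Resuming from the stored offset means already-applied messages are not read again, which is the property an external integration tool would otherwise have to provide.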

Scenarios

  • Build real-time analytics reports and other real-time analytics applications.

  • Build a unified real-time data warehouse for both stream and batch processing using incremental real-time materialized views.

  • Use the high-speed data import API to achieve higher UPSERT throughput when SQL UPSERT or UPDATE statements cause poor write performance.

  • Import data from Kafka topics into AnalyticDB for PostgreSQL without using other data integration tools.

Benefits

  • Direct consumption of Kafka data reduces dependencies on other real-time processing components.

  • Writing data directly to compute nodes (segments), instead of routing every write through the master node, significantly increases write throughput.

  • The high-speed data import API provides higher write performance than SQL UPSERT or UPDATE statements.

  • The write process generates minimal load on the master node.