Sync Kafka data
Tablestore Sink Connector batch imports data from Apache Kafka into a data table or time series table in Tablestore.
Background information
Apache Kafka is a distributed message queue system. Kafka Connect is a framework for streaming data between Apache Kafka and other data systems: it imports data into Kafka and exports data from Kafka.
Tablestore Sink Connector is built on Kafka Connect. It polls the Kafka topics that it subscribes to, parses the message records, and batch imports the data into Tablestore. Writing in batches makes the import more efficient, and the connector's behavior can be customized through configuration.
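The poll, parse, and batch import flow can be pictured with the following plain-Python simulation. It is a sketch, not the connector's code: it assumes JSON-encoded message values, and `import_polled_records` and the `write_batch` callback are hypothetical stand-ins for the connector's internal logic and for one batch write request to Tablestore.

```python
import json
from typing import Callable, Iterable, Iterator, List


def parse_record(raw_value: bytes) -> dict:
    """Deserialize one JSON-encoded message value into a field -> value dict."""
    return json.loads(raw_value.decode("utf-8"))


def batches(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group parsed rows into lists of at most batch_size rows."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def import_polled_records(
    polled: List[bytes],
    batch_size: int,
    write_batch: Callable[[List[dict]], None],
) -> int:
    """Parse every polled message and pass the rows to write_batch in batches.

    Returns the number of rows written.
    """
    written = 0
    for batch in batches((parse_record(v) for v in polled), batch_size):
        write_batch(batch)  # hypothetical: one batch write request per batch
        written += len(batch)
    return written


if __name__ == "__main__":
    polled = [json.dumps({"id": i, "name": f"user{i}"}).encode("utf-8") for i in range(5)]
    sent: List[List[dict]] = []
    count = import_polled_records(polled, batch_size=2, write_batch=sent.append)
    print(count, [len(b) for b in sent])  # prints: 5 [2, 2, 1]
```

Five polled messages with a batch size of 2 produce three write requests instead of five, which is the efficiency gain that batching is meant to provide.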
Tablestore is a multi-model data storage service developed by Alibaba Cloud. It stores large amounts of structured data and supports multiple data models, including the Wide Column model and the TimeSeries model. You can synchronize data from Apache Kafka to a data table (Wide Column model) or time series table (TimeSeries model) in Tablestore. For more information, see Sync Kafka data to a data table and Stream Kafka data to a time series table.
Features
Tablestore Sink Connector supports the following features:
- At-least-once delivery
Ensures that message records are delivered from Kafka topics to Tablestore at least once.
- Data mapping
Uses a converter to deserialize data in Kafka topics. To choose a converter, set the key.converter and value.converter properties in the Kafka Connect worker or connector configuration. You can use the built-in JsonConverter, a third-party converter, or a custom converter.
- Automatic creation of destination tables in Tablestore
If the destination table does not exist in Tablestore, the connector creates it based on the primary key columns and the attribute column whitelist that you specify. If you do not specify a whitelist, all fields in the values of the Kafka message records are used as attribute columns.
- Error handling policy
Errors may occur when message records are parsed or written to Tablestore during a batch import. For such errors, you can choose to terminate the task, ignore the error, or log the failed message record and the error details to Kafka or Tablestore.
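At-least-once delivery follows from the order of operations: a record's offset is committed only after the write that contains it succeeds, so a failed write leads to redelivery rather than data loss. The simulation below is a sketch under stated assumptions, not the connector's implementation: the Kafka log is a Python list, the destination table is a dict keyed by primary key (assuming that a write with an existing primary key overwrites that row), and `FlakyTable` and `consume` are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, dict]  # (primary key, row data)


class FlakyTable:
    """Hypothetical table that stores each batch but reports failure on the first write."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self.write_calls = 0

    def write(self, batch: List[Record]) -> None:
        self.write_calls += 1
        for pk, data in batch:
            self.rows[pk] = data  # assumption: same primary key overwrites the row
        if self.write_calls == 1:
            # Simulate a timeout after the data landed but before the acknowledgment.
            raise IOError("simulated timeout")


def consume(log: List[Record], table: FlakyTable, batch_size: int, max_attempts: int = 10) -> int:
    """Write the log in batches and return the last committed offset."""
    committed = -1  # offset of the last record known to be persisted
    attempts = 0
    while committed < len(log) - 1 and attempts < max_attempts:
        attempts += 1
        start = committed + 1
        batch = log[start:start + batch_size]
        try:
            table.write(batch)
        except IOError:
            continue  # offset not committed, so the same records are polled again
        committed = start + len(batch) - 1  # commit only after a successful write
    return committed


if __name__ == "__main__":
    log = [(i, {"v": i}) for i in range(4)]
    table = FlakyTable()
    print(consume(log, table, batch_size=2), table.write_calls, len(table.rows))  # prints: 3 3 4
```

The first batch is written twice, yet the table still holds four rows: redelivery produces duplicate writes, not duplicate rows, as long as rows are keyed by a stable primary key.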
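For the data mapping feature, converters are selected with standard Kafka Connect properties. The sketch below is an illustration, not a complete connector configuration. It renders the built-in JsonConverter settings as `.properties` lines and builds the `{"schema", "payload"}` envelope that JsonConverter expects when `schemas.enable` is `true`. The helper names and sample fields are hypothetical, and only string and int64 fields are mapped.

```python
import json
from typing import Dict, Union

# Standard Kafka Connect converter properties (worker or connector level).
CONVERTER_CONFIG: Dict[str, str] = {
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    # When true, JsonConverter expects each message to carry a schema envelope.
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true",
}


def to_properties(config: Dict[str, str]) -> str:
    """Render a config dict as key=value lines for a .properties file."""
    return "\n".join(f"{k}={v}" for k, v in config.items())


def schema_envelope(payload: Dict[str, Union[int, str]]) -> str:
    """Wrap a flat record in the envelope used by JsonConverter with schemas enabled.

    This sketch maps int values to int64 and str values to string only.
    """
    fields = []
    for name, value in payload.items():
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"unsupported type for field {name!r}")
        ftype = "int64" if isinstance(value, int) else "string"
        fields.append({"field": name, "type": ftype, "optional": False})
    schema = {"type": "struct", "fields": fields, "optional": False}
    return json.dumps({"schema": schema, "payload": payload})


if __name__ == "__main__":
    print(to_properties(CONVERTER_CONFIG))
    print(schema_envelope({"id": 1, "name": "alice"}))
```

If your messages are plain JSON objects without an envelope, setting `value.converter.schemas.enable` to `false` is the usual choice.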
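The rule for choosing attribute columns when the connector creates a destination table can be sketched as follows. `plan_table_columns` is a hypothetical helper, not the connector's code. It also assumes that a field already used as a primary key column is not repeated as an attribute column, which the description above does not state explicitly.

```python
from typing import Dict, List, Optional


def plan_table_columns(
    primary_keys: List[str],
    record_value: Dict[str, object],
    whitelist: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Return the primary key and attribute columns for a new table.

    With a whitelist, only the whitelisted fields become attribute columns;
    without one, every field in the record value does.
    """
    candidates = whitelist if whitelist is not None else list(record_value)
    # Assumption: a field used as a primary key is not also an attribute column.
    attributes = [name for name in candidates if name not in primary_keys]
    return {"primary_keys": list(primary_keys), "attributes": attributes}


if __name__ == "__main__":
    value = {"user_id": 7, "name": "alice", "age": 30}
    print(plan_table_columns(["user_id"], value))
    print(plan_table_columns(["user_id"], value, whitelist=["age"]))
```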
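The three error handling choices can be modeled as a policy that decides what happens to a record that fails. In the sketch below, `ErrorPolicy`, `handle_error`, and `process` are hypothetical, `int(raw)` stands in for parsing and writing a record, and the in-memory `error_log` list stands in for an error topic in Kafka or an error table in Tablestore. The connector's actual configuration property names are not shown.

```python
from enum import Enum
from typing import Dict, List


class ErrorPolicy(Enum):
    TERMINATE = "terminate"  # stop the task on the first error
    IGNORE = "ignore"        # skip the failing record and continue
    LOG = "log"              # record the failing message and error, then continue


class TaskTerminated(Exception):
    """Raised when the TERMINATE policy stops the task."""


def handle_error(policy: ErrorPolicy, record: str, error: Exception, error_log: List[Dict[str, str]]) -> None:
    if policy is ErrorPolicy.TERMINATE:
        raise TaskTerminated(f"record {record!r} failed: {error}") from error
    if policy is ErrorPolicy.LOG:
        error_log.append({"record": record, "error": str(error)})
    # ErrorPolicy.IGNORE: drop the record silently.


def process(records: List[str], policy: ErrorPolicy, error_log: List[Dict[str, str]]) -> List[int]:
    """Convert each record, applying the error policy to records that fail."""
    written: List[int] = []
    for raw in records:
        try:
            written.append(int(raw))  # stand-in for parse + write to Tablestore
        except ValueError as exc:
            handle_error(policy, raw, exc, error_log)
    return written


if __name__ == "__main__":
    errors: List[Dict[str, str]] = []
    print(process(["1", "x", "3"], ErrorPolicy.LOG, errors))  # prints: [1, 3]
    print(errors[0]["record"])  # prints: x
```

Logging failed records keeps the pipeline moving while preserving enough detail to replay or inspect them later, whereas terminating makes every error visible immediately.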
Working mode
Tablestore Sink Connector supports standalone and distributed modes.
- In standalone mode, all tasks run in a single process. This mode is easy to configure and suitable for getting started with Tablestore Sink Connector.
- In distributed mode, tasks run in parallel across multiple worker processes. Tasks are balanced across the processes based on workload, and the mode provides fault tolerance, so it is more stable than standalone mode. We recommend that you use distributed mode.
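In distributed mode, connectors are typically submitted to a running Kafka Connect cluster through its REST API (`POST /connectors`), whereas standalone mode reads connector properties files passed on the command line. The sketch below only builds, and does not send, such a request with Python's standard library. `localhost:8083` is the default Kafka Connect REST address; the connector name, topic, and connector class are placeholders, because the Tablestore connector's class name and Tablestore-specific properties are not given in this section.

```python
import json
import urllib.request
from typing import Dict


def build_create_connector_request(worker_url: str, name: str, config: Dict[str, str]) -> urllib.request.Request:
    """Build (without sending) a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{worker_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_connector_request(
        "http://localhost:8083",
        "tablestore-sink",  # hypothetical connector name
        {
            "connector.class": "<Tablestore Sink Connector class>",  # placeholder
            "tasks.max": "3",  # upper bound on tasks the cluster may spread across workers
            "topics": "test",  # hypothetical topic
        },
    )
    print(req.get_method(), req.full_url)  # prints: POST http://localhost:8083/connectors
    # To submit it to a running cluster: urllib.request.urlopen(req)
```

Because any worker in the cluster accepts the request and the cluster then assigns the tasks, the same call works no matter how many worker processes are running.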