Tablestore - Realtime Compute for Apache Flink (Flink) - Alibaba Cloud Help Center

Tablestore

The Tablestore connector lets you use Tablestore tables as source tables, dimension tables, and sink tables in Flink SQL jobs running in streaming mode.

Connector capabilities

Item	Description
Running mode	Streaming mode
API type	SQL API
Table type	Source table, dimension table, and sink table
Data format	N/A
Sink table metrics	`numBytesOut`, `numBytesOutPerSecond`, `numRecordsOut`, `numRecordsOutPerSecond`, `currentSendTime`
Data update or deletion in a sink table	Supported

For details on sink metrics, see Monitoring metrics.

Prerequisites

Before you begin, ensure that you have:

A Tablestore instance and a Tablestore table. See Use Tablestore.

Usage limits

Cross-account access to Tablestore instances is supported: set `accessId` and `accessKey` to the AccessKey pair of the account that owns the Tablestore instance. If you connect through a VPC endpoint, the Tablestore instance must be in the same region as Flink.

Syntax

All three table types use 'connector'='ots' in the WITH clause, with type-specific options.

Sink table

CREATE TABLE ots_sink (
  name VARCHAR,
  age BIGINT,
  birthday BIGINT,
  PRIMARY KEY (name, age) NOT ENFORCED
) WITH (
  'connector'='ots',
  'instanceName'='<yourInstanceName>',
  'tableName'='<yourTableName>',
  'accessId'='${ak_id}',
  'accessKey'='${ak_secret}',
  'endPoint'='<yourEndpoint>',
  'valueColumns'='birthday'
);

A Tablestore sink table must declare a primary key. Output records are written to the Tablestore table by primary key, so a record whose primary key already exists updates that row instead of adding a new one.
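A minimal sketch of writing to the `ots_sink` table above, assuming it is registered in the current session. The primary key columns `name` and `age` address the Tablestore row, and `birthday`, the only column listed in `valueColumns`, is written as an attribute column. The values are illustrative.

```sql
-- name and age form the primary key; birthday is the only value column.
-- If a row with primary key ('alice', 30) already exists in Tablestore,
-- this write updates it instead of adding a second row.
INSERT INTO ots_sink VALUES ('alice', 30, 19940101);
```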

Dimension table

CREATE TABLE ots_dim (
  id INT,
  len INT,
  content STRING
) WITH (
  'connector'='ots',
  'endPoint'='<yourEndpoint>',
  'instanceName'='<yourInstanceName>',
  'tableName'='<yourTableName>',
  'accessId'='${ak_id}',
  'accessKey'='${ak_secret}'
);
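A dimension table is typically queried through a Flink lookup join, which uses `FOR SYSTEM_TIME AS OF` on a processing-time attribute of the probe-side stream. The sketch below joins a hypothetical `orders` stream, generated by Flink's built-in `datagen` connector, against the `ots_dim` table defined above and writes the result with the built-in `print` connector. It assumes that `id` is the primary key column of the underlying Tablestore table; the `orders` and `print_sink` tables and their columns are illustrative.

```sql
-- Hypothetical probe-side stream; datagen produces random test rows.
CREATE TEMPORARY TABLE orders (
  order_id BIGINT,
  item_id INT,
  proc_time AS PROCTIME()   -- processing-time attribute for the lookup join
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',
  'fields.item_id.min' = '1',
  'fields.item_id.max' = '100'
);

-- Hypothetical sink that prints joined rows to TaskManager stdout.
CREATE TEMPORARY TABLE print_sink (
  order_id BIGINT,
  item_id INT,
  content STRING
) WITH (
  'connector' = 'print'
);

-- Look up each order's item_id in the Tablestore dimension table.
INSERT INTO print_sink
SELECT o.order_id, o.item_id, d.content
FROM orders AS o
JOIN ots_dim FOR SYSTEM_TIME AS OF o.proc_time AS d
  ON o.item_id = d.id;
```

An inner lookup join drops orders that have no matching row in Tablestore; use `LEFT JOIN` instead to keep them with a NULL `content`.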

Source table

CREATE TABLE tablestore_stream (
  `order` VARCHAR,
  orderid VARCHAR,
  customerid VARCHAR,
  customername VARCHAR
) WITH (
  'connector'='ots',
  'endPoint'='<yourEndpoint>',
  'instanceName'='flink-source',
  'tableName'='flink_source_table',
  'tunnelName'='flinksourcestream',
  'accessId'='${ak_id}',
  'accessKey'='${ak_secret}',
  'ignoreDelete'='false'
);

Available metadata

The Tablestore source table exposes two metadata fields via the METADATA keyword. Use these fields to track the operation type and timing of each change event.

Metadata key	Flink data type	Description
`type`	STRING	The data operation type (maps to `OtsRecordType`).
`timestamp`	BIGINT	The data operation time in microseconds (maps to `OtsRecordTimestamp`). The value is `0` for records read during a full data read.

To read metadata fields, declare them with the METADATA FROM syntax:

CREATE TABLE tablestore_stream (
  `order` VARCHAR,
  orderid VARCHAR,
  customerid VARCHAR,
  customername VARCHAR,
  record_type STRING METADATA FROM 'type',
  record_timestamp BIGINT METADATA FROM 'timestamp'
) WITH (
  ...
);
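Because `timestamp` is in microseconds, it usually needs converting before use. A sketch, assuming the `tablestore_stream` table with the `record_type` and `record_timestamp` metadata columns above: it divides by 1000 to get epoch milliseconds and converts them with Flink's built-in `TO_TIMESTAMP_LTZ` function (available since Flink 1.13). The `op_time` alias is illustrative.

```sql
SELECT
  orderid,
  record_type,
  -- Microseconds -> milliseconds, then to TIMESTAMP_LTZ(3).
  TO_TIMESTAMP_LTZ(CAST(record_timestamp / 1000 AS BIGINT), 3) AS op_time
FROM tablestore_stream
-- Full data reads report a timestamp of 0; keep only incremental changes.
WHERE record_timestamp > 0;
```

Drop the `WHERE` clause to include rows from the full data read; their `op_time` then maps to the Unix epoch.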

Connector options

General options

All table types share the following options.

Option	Type	Required	Default	Description
`connector`	String	Yes	—	Set to `ots`.
`instanceName`	String	Yes	—	Name of the Tablestore instance.
`endPoint`	String	Yes	—	Endpoint of the Tablestore instance. See Endpoints.
`tableName`	String	Yes	—	Name of the table.
`accessId`	String	Yes	—	AccessKey ID of your Alibaba Cloud account or a Resource Access Management (RAM) user. See How do I view the AccessKey ID and AccessKey secret?
`accessKey`	String	Yes	—	AccessKey secret of your Alibaba Cloud account or a RAM user.
`connectTimeout`	Integer	No	30000	Connection timeout in milliseconds.
`socketTimeout`	Integer	No	30000	Socket timeout in milliseconds.
`ioThreadCount`	Integer	No	4	Number of I/O threads.
`callbackThreadPoolSize`	Integer	No	4	Size of the callback thread pool.
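Numeric options are passed as quoted strings in the WITH clause, as with every Flink SQL connector option. The sketch below redeclares the dimension table from the Syntax section under a hypothetical name, `ots_dim_tuned`, with the client settings overridden; the values are illustrative, not recommendations.

```sql
CREATE TEMPORARY TABLE ots_dim_tuned (
  id INT,
  len INT,
  content STRING
) WITH (
  'connector' = 'ots',
  'endPoint' = '<yourEndpoint>',
  'instanceName' = '<yourInstanceName>',
  'tableName' = '<yourTableName>',
  'accessId' = '${ak_id}',
  'accessKey' = '${ak_secret}',
  'connectTimeout' = '10000',       -- 10 s instead of the default 30000 ms
  'socketTimeout' = '60000',        -- 60 s to tolerate slow responses
  'ioThreadCount' = '8',            -- default is 4
  'callbackThreadPoolSize' = '8'    -- default is 4
);
```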

Important

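To keep your AccessKey pair out of your SQL code, reference it through variables, such as `${ak_id}` and `${ak_secret}` in the examples in this topic, instead of hardcoding it.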

Source table options

Option	Type	Required	Default	Description
`tunnelName`	String	Yes	—	Name of the Tablestore tunnel. Create the tunnel in the Tablestore console before using this option. Supported tunnel types: Incremental, Full, and Differential. See the "Create a tunnel" section in Quick start.
`ignoreDelete`	Boolean	No	false	Whether to skip delete operations. `true`: skip; `false`: process delete operations.
`skipInvalidData`	Boolean	No	false	Whether to skip dirty data. `true`: skip dirty data; `false`: report an error. Requires Ververica Runtime (VVR) 8.0.4 or later.
`retryStrategy`	Enum	No	TIME	Retry policy. `TIME`: retry until `retryTimeoutMs` elapses; `COUNT`: retry until `retryCount` is reached.
`retryCount`	Integer	No	3	Maximum number of retries. Applies when `retryStrategy` is `COUNT`.
`retryTimeoutMs`	Integer	No	180000	Retry timeout in milliseconds. Applies when `retryStrategy` is `TIME`.
`streamOriginColumnMapping`	String	No	—	Mapping from original column names to actual column names. Format: `origin_col1:col1,origin_col2:col2`.
`outputSpecificRowType`	Boolean	No	false	Whether to pass through the specific row type. `false`: all rows are treated as INSERT; `true`: rows can be INSERT, DELETE, or UPDATE_AFTER.
`dataFetchTimeoutMs`	Integer	No	10000	Maximum time in milliseconds to fetch data from a single partition. Reduce this value to lower overall synchronization latency when syncing many partitions. Requires VVR 8.0.10 or later.
`enableRequestCompression`	Boolean	No	false	Whether to enable request compression. Reduces bandwidth usage at the cost of higher CPU load. Requires VVR 8.0.10 or later.
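The sketch below combines several source options: a count-based retry policy, dirty-data skipping, and row-kind passthrough. It reuses the instance, table, and tunnel names from the source table example; the table name `tablestore_changelog` and the option values are illustrative.

```sql
CREATE TEMPORARY TABLE tablestore_changelog (
  `order` VARCHAR,
  orderid VARCHAR,
  customerid VARCHAR,
  customername VARCHAR
) WITH (
  'connector' = 'ots',
  'endPoint' = '<yourEndpoint>',
  'instanceName' = 'flink-source',
  'tableName' = 'flink_source_table',
  'tunnelName' = 'flinksourcestream',
  'accessId' = '${ak_id}',
  'accessKey' = '${ak_secret}',
  'retryStrategy' = 'COUNT',         -- stop after a fixed number of retries
  'retryCount' = '5',                -- applies only because retryStrategy is COUNT
  'skipInvalidData' = 'true',        -- requires VVR 8.0.4 or later
  'outputSpecificRowType' = 'true'   -- emit INSERT, DELETE, or UPDATE_AFTER rows
);
```

With `outputSpecificRowType` set to `true`, the table can emit DELETE and UPDATE_AFTER rows, so downstream operators and sinks must be able to consume updates and deletes.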

Sink table options

Option	Type	Required	Default	Description
`valueColumns`	String	Yes	—	Names of the columns to write. Separate multiple column names with commas (,).
`retryIntervalMs`	Integer	No	1000	Retry interval in milliseconds.
`maxRetryTimes`	Integer	No	10	Maximum number of retries.
`bufferSize`	Integer	No	5000	Maximum number of records buffered before a write is triggered.
`batchWriteTimeoutMs`	Integer	No	5000	Write timeout in milliseconds. If buffered records don't reach `bufferSize` within this period, all buffered records are written.
`batchSize`	Integer	No	100	Number of records written per batch. Maximum: 200.
`ignoreDelete`	Boolean	No	false	Whether to skip delete operations.
`autoIncrementKey`	String	No	—	Name of the auto-increment primary key column. Configure only if the sink table has an auto-increment primary key column. Requires VVR 8.0.4 or later.
`overwriteMode`	Enum	No	PUT	Write mode. `PUT`: writes each row in PUT mode, which replaces the entire existing row; `UPDATE`: writes each row in UPDATE mode, which modifies only the written columns. Dynamic column mode requires `UPDATE`.
`defaultTimestampInMillisecond`	Long	No	-1	Default timestamp for writes. If not set, the current system time is used.
`dynamicColumnSink`	Boolean	No	false	Whether to enable dynamic column mode. In this mode, no columns are pre-defined; columns are inserted based on runtime values. The first N columns define the primary key. The second-to-last column holds the column name and the last column holds its value — both must be STRING. If enabled, `overwriteMode` must be `UPDATE` and auto-increment primary keys are not supported.
`checkSinkTableMeta`	Boolean	No	true	Whether to verify that the Tablestore table's primary key matches the primary key declared in the CREATE TABLE statement.
`enableRequestCompression`	Boolean	No	false	Whether to enable request compression during writes.
`maxColumnsCount`	Integer	No	128	Maximum number of columns written to the sink table. If set above 128, the error `The count of attribute columns exceeds the maximum` occurs. Requires VVR 8.0.10 or later.
`storageType`	String	No	`WIDE_COLUMN`	Sink table type. `WIDE_COLUMN`: wide-column table; `TIMESERIES`: time series table.
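Dynamic column mode is the least obvious sink option, so here is a sketch under stated assumptions. The table has one primary key column (`pk`) followed by the column-name and column-value columns, both STRING, as `dynamicColumnSink` requires. Setting `valueColumns` to those two trailing columns is an assumption: the option is listed as required, but its role in this mode is not described above. Table and column names are hypothetical.

```sql
CREATE TEMPORARY TABLE ots_dynamic_sink (
  pk STRING,
  col_name STRING,    -- second-to-last column: name of the target column
  col_value STRING,   -- last column: value to write into that column
  PRIMARY KEY (pk) NOT ENFORCED
) WITH (
  'connector' = 'ots',
  'instanceName' = '<yourInstanceName>',
  'tableName' = '<yourTableName>',
  'accessId' = '${ak_id}',
  'accessKey' = '${ak_secret}',
  'endPoint' = '<yourEndpoint>',
  'valueColumns' = 'col_name,col_value',  -- assumption, not confirmed by the option table
  'dynamicColumnSink' = 'true',
  'overwriteMode' = 'UPDATE'              -- required in dynamic column mode
);

-- Writes attribute column city = 'Hangzhou' on the row whose key is 'user_1'.
INSERT INTO ots_dynamic_sink VALUES ('user_1', 'city', 'Hangzhou');
```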

Dimension table options

How the cache works

The dimension table cache reduces repeated lookups against Tablestore. Choose a cache policy based on your table size and query patterns:

None: No caching. Every lookup hits Tablestore directly. Use when data changes frequently and freshness is critical.
LRU: Caches a fixed number of recently accessed records. When a lookup misses the cache, the connector queries Tablestore and updates the cache with the result. Set cacheSize and cacheTTLMs when using this policy.
ALL (default): Loads the entire dimension table into the cache before the job starts. All subsequent lookups are served from the cache. When the cache expires (cacheTTLMs), the connector reloads all data. Use ALL when the remote table is small and you expect many lookups for keys that do not exist. The cache needs about twice as much memory as the remote table's data, so increase the memory of the join node when using ALL.
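For the LRU policy, a sketch that redeclares the dimension table from the Syntax section under the hypothetical name `ots_dim_lru` and sets the two options the LRU description calls for. The cache size and TTL values are illustrative.

```sql
CREATE TEMPORARY TABLE ots_dim_lru (
  id INT,
  len INT,
  content STRING
) WITH (
  'connector' = 'ots',
  'endPoint' = '<yourEndpoint>',
  'instanceName' = '<yourInstanceName>',
  'tableName' = '<yourTableName>',
  'accessId' = '${ak_id}',
  'accessKey' = '${ak_secret}',
  'cache' = 'LRU',
  'cacheSize' = '10000',    -- keep at most 10,000 recently accessed records
  'cacheTTLMs' = '600000'   -- each cached entry times out after 10 minutes
);
```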

Option	Type	Required	Default	Description
`retryIntervalMs`	Integer	No	1000	Retry interval in milliseconds.
`maxRetryTimes`	Integer	No	10	Maximum number of retries.
`cache`	String	No	ALL	Cache policy: `None`, `LRU`, or `ALL`.
`cacheSize`	Integer	No	—	Maximum number of cached records. Applies when `cache` is `LRU`.
`cacheTTLMs`	Integer	No	—	Cache TTL in milliseconds. For LRU: timeout per entry. For ALL: full-cache refresh interval. Leave unset to disable expiration.
`cacheEmpty`	Boolean	No	—	Whether to cache empty (no-match) results. `true`: cache; `false`: do not cache.
`cacheReloadTimeBlackList`	String	No	—	Time windows during which the ALL cache is not refreshed. Format: `2017-10-24 14:00 -> 2017-10-24 15:00, 2017-11-10 23:30 -> 2017-11-11 08:00`. Separate multiple windows with commas; use `->` between start and end times.
`async`	Boolean	No	false	Whether to enable asynchronous lookup. `true`: async lookups (results are not ordered); `false`: synchronous lookups.
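For the ALL policy, a sketch that refreshes the full cache hourly but suppresses reloads during a busy window. The table name `ots_dim_all`, the TTL, and the blacklist window are illustrative; the window follows the `start -> end` format described for `cacheReloadTimeBlackList`.

```sql
CREATE TEMPORARY TABLE ots_dim_all (
  id INT,
  len INT,
  content STRING
) WITH (
  'connector' = 'ots',
  'endPoint' = '<yourEndpoint>',
  'instanceName' = '<yourInstanceName>',
  'tableName' = '<yourTableName>',
  'accessId' = '${ak_id}',
  'accessKey' = '${ak_secret}',
  'cache' = 'ALL',
  'cacheTTLMs' = '3600000',  -- reload the whole table every hour
  -- Skip reloads during an illustrative overnight peak window.
  'cacheReloadTimeBlackList' = '2025-11-10 23:30 -> 2025-11-11 08:00'
);
```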

Data type mappings