introduction to TiDB data sources-DataWorks(DataWorks)-阿里云帮助中心

Supported TiDB versions

Batch data read: TiDB 7.x and 8.x are supported.
Batch data write: Not supported.
Real-time data read and write: Not supported.

Note

TiDB is highly compatible with the MySQL protocol and the common features and syntax of MySQL 5.7 and MySQL 8.0. DataWorks uses the MySQL protocol for batch data synchronization from TiDB. For more information about the compatibility between TiDB and MySQL, see MySQL Compatibility.

Supported data types

For a complete list of TiDB data types, see Data Types. The following table lists the support for major data types.

Type	Batch data read (TiDB Reader)
TINYINT	Supported
SMALLINT	Supported
MEDIUMINT	Supported
INTEGER	Supported
BIGINT	Supported
FLOAT	Supported
DOUBLE	Supported
DECIMAL	Supported
CHAR	Supported
VARCHAR	Supported
JSON	Supported
TEXT	Supported
TINYTEXT	Supported
MEDIUMTEXT	Supported
LONGTEXT	Supported
VARBINARY	Supported
BINARY	Supported
BLOB	Supported
TINYBLOB	Supported
MEDIUMBLOB	Supported
LONGBLOB	Supported
ENUM	Supported
SET	Supported
BOOLEAN	Supported
BIT	Supported
DATE	Supported
DATETIME	Supported
TIMESTAMP	Supported
TIME	Supported
YEAR	Supported

Prepare the TiDB environment

Before synchronizing data in DataWorks, prepare your TiDB environment as described in this section to ensure that your data synchronization tasks run smoothly.

Confirm the TiDB version

Data Integration requires TiDB 7.x or 8.x. Confirm that your TiDB version meets this requirement. You can run the following statement in your TiDB database to check the current version.

SELECT TIDB_VERSION()\G

*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)

Configure account permissions

Create a dedicated TiDB account for DataWorks to access the data source by following these steps:

(Optional) Create an account. For more information, see TiDB User Account Management.
Configure permissions.

For batch data reads from TiDB, the account must have read (SELECT) permission on the source tables.

You can run the following commands to grant permissions to the account, or directly grant the SUPER permission. When you run the statements, replace 'sync_account' with your account name.
```
-- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; // Creates a synchronization account and sets a password. The account can log on from any host. '%' indicates any host.
GRANT SELECT ON *.* TO 'sync_account'@'%'; // Grants the SELECT permission on all databases to the synchronization account.
```
The *.* syntax grants the specified permissions on all tables in all databases. You can also grant permissions on specific tables in a target database. For example, to grant the SELECT permission on the user table in the test database, run the following statement: GRANT SELECT ON test.user TO 'sync_account'@'%';.

Note
The REPLICATION SLAVE statement grants a global permission and cannot be granted on a specific table in a database.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Appendix: Script example and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script example

The following code provides a configuration example for the reader:

Note

The comments in the following JSON example are for explanation purposes only. Remove them before running the task.

{
  "type": "job",
  "version": "2.0",
  "steps":
  [
    {
      "stepType": "tidb",
      "parameter":
      {
        "column":
        [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "id",
        "connection":
        [
          {
            "selectedDatabase": "test_database",
            "datasource": "test_datasource",
            "table":
            [
              "test_table"
            ]
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter":
      {
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting":
  {
    "errorLimit":
    {
      "record": "0"
    },
    "speed":
    {
      "throttle": false,
      "concurrent": 3
    }
  },
  "order":
  {
    "hops":
    [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader script parameters

Parameter	Description	Required	Default
datasource	This must match the data source name you configured in DataWorks.	Yes	None
table	The name of the source table. The following examples show advanced usage for configuring a range for this parameter: You can configure a range to read data from sharded databases and tables. For example, `'table_[0-99]'` specifies tables from `'table_0'` to `'table_99'`. If your table names have fixed-length numeric suffixes, such as from `'table_000'` to `'table_999'`, you can set the parameter to `"table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"]`.	Yes	None
column	The source table columns to synchronize, specified as a JSON array. To read all columns, use `["*"]`. Column pruning: You can select a subset of columns to read. Column reordering: You can export columns in an order that is different from the table schema. Constants: You must follow the MySQL SQL syntax. Example: ["id","`table`","1","'test_constant'","null","'null'","expr(a+1)","2.3","true"] . `id` is a regular column name. `table` is a column name that is also a reserved keyword. `1` is an integer constant. `'test_constant'` is a string constant. It must be enclosed in single quotation marks. Handling null values: `""` indicates an empty string. `null` indicates a NULL value. `'null'` indicates the string "null". `expr(a+1)` is an example of an expression. `2.3` is a floating-point number. `true` is a boolean value. You must explicitly specify the columns to synchronize in the `column` parameter. The parameter cannot be empty.	Yes	None
splitPk	Specifies a column to use for data sharding. This enables parallel tasks to run during data extraction, which improves synchronization efficiency. Use the table's primary key for the `splitPk` parameter. Primary key values are typically distributed evenly, which helps prevent data hot spots. Currently, `splitPk` supports data sharding for integer types only. It does not support string, floating-point, date, or other data types. If you specify an unsupported data type, this feature is ignored and synchronization runs on a single thread. If this parameter is not configured, data is synchronized on a single thread.	No	None
where	A filter condition for the source data. This is often used for incremental synchronization. For example, to synchronize only the current day's data, set the condition to `gmt_create>$bizdate`. The `WHERE` clause allows for effective incremental data synchronization. If you do not specify a `WHERE` clause, or if you provide one without a key or value, the operation is treated as a full data synchronization. The WHERE clause does not support the LIMIT syntax.	No	None