You can use the TiDB data source in DataWorks for batch data reads. This topic describes the supported data synchronization capabilities for TiDB.
Supported TiDB versions
-
Batch data read: TiDB 7.x and 8.x are supported.
-
Batch data write: Not supported.
-
Real-time data read and write: Not supported.
TiDB is highly compatible with the MySQL protocol and the common features and syntax of MySQL 5.7 and MySQL 8.0. DataWorks uses the MySQL protocol for batch data synchronization from TiDB. For more information about the compatibility between TiDB and MySQL, see MySQL Compatibility.
Supported data types
For a complete list of TiDB data types, see Data Types. The following table lists the support for major data types.
|
Type |
Batch data read (TiDB Reader) |
|
TINYINT |
Supported |
|
SMALLINT |
Supported |
|
MEDIUMINT |
Supported |
|
INTEGER |
Supported |
|
BIGINT |
Supported |
|
FLOAT |
Supported |
|
DOUBLE |
Supported |
|
DECIMAL |
Supported |
|
CHAR |
Supported |
|
VARCHAR |
Supported |
|
JSON |
Supported |
|
TEXT |
Supported |
|
TINYTEXT |
Supported |
|
MEDIUMTEXT |
Supported |
|
LONGTEXT |
Supported |
|
VARBINARY |
Supported |
|
BINARY |
Supported |
|
BLOB |
Supported |
|
TINYBLOB |
Supported |
|
MEDIUMBLOB |
Supported |
|
LONGBLOB |
Supported |
|
ENUM |
Supported |
|
SET |
Supported |
|
BOOLEAN |
Supported |
|
BIT |
Supported |
|
DATE |
Supported |
|
DATETIME |
Supported |
|
TIMESTAMP |
Supported |
|
TIME |
Supported |
|
YEAR |
Supported |
Prepare the TiDB environment
Before synchronizing data in DataWorks, prepare your TiDB environment as described in this section to ensure that your data synchronization tasks run smoothly.
Confirm the TiDB version
Data Integration requires TiDB 7.x or 8.x. Confirm that your TiDB version meets this requirement. You can run the following statement in your TiDB database to check the current version.
SELECT TIDB_VERSION()\G
*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)
Configure account permissions
Create a dedicated TiDB account for DataWorks to access the data source by following these steps:
-
(Optional) Create an account. For more information, see TiDB User Account Management.
-
Configure permissions.
For batch data reads from TiDB, the account must have read (
SELECT) permission on the source tables.You can run the following commands to grant permissions to the account, or directly grant the
SUPERpermission. When you run the statements, replace'sync_account'with your account name.-- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; // Creates a synchronization account and sets a password. The account can log on from any host. '%' indicates any host. GRANT SELECT ON *.* TO 'sync_account'@'%'; // Grants the SELECT permission on all databases to the synchronization account.The
*.*syntax grants the specified permissions on all tables in all databases. You can also grant permissions on specific tables in a target database. For example, to grant the SELECT permission on the user table in the test database, run the following statement:GRANT SELECT ON test.user TO 'sync_account'@'%';.NoteThe
REPLICATION SLAVEstatement grants a global permission and cannot be granted on a specific table in a database.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Appendix: Script example and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script example
The following code provides a configuration example for the reader:
The comments in the following JSON example are for explanation purposes only. Remove them before running the task.
{
"type": "job",
"version": "2.0",
"steps":
[
{
"stepType": "tidb",
"parameter":
{
"column":
[
"id",
"name"
],
"where": "",
"splitPk": "id",
"connection":
[
{
"selectedDatabase": "test_database",
"datasource": "test_datasource",
"table":
[
"test_table"
]
}
]
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "odps",
"parameter":
{
},
"name": "Writer",
"category": "writer"
}
],
"setting":
{
"errorLimit":
{
"record": "0"
},
"speed":
{
"throttle": false,
"concurrent": 3
}
},
"order":
{
"hops":
[
{
"from": "Reader",
"to": "Writer"
}
]
}
}
Reader script parameters
|
Parameter |
Description |
Required |
Default |
|
datasource |
This must match the data source name you configured in DataWorks. |
Yes |
None |
|
table |
The name of the source table. The following examples show advanced usage for configuring a range for this parameter:
|
Yes |
None |
|
column |
The source table columns to synchronize, specified as a JSON array. To read all columns, use
|
Yes |
None |
|
splitPk |
Specifies a column to use for data sharding. This enables parallel tasks to run during data extraction, which improves synchronization efficiency.
|
No |
None |
|
where |
A filter condition for the source data. This is often used for incremental synchronization. For example, to synchronize only the current day's data, set the condition to
|
No |
None |