TiDB data source

更新时间:
复制 MD 格式

You can use the TiDB data source in DataWorks for batch data reads. This topic describes the supported data synchronization capabilities for TiDB.

Supported TiDB versions

  • Batch data read: TiDB 7.x and 8.x are supported.

  • Batch data write: Not supported.

  • Real-time data read and write: Not supported.

Note

TiDB is highly compatible with the MySQL protocol and the common features and syntax of MySQL 5.7 and MySQL 8.0. DataWorks uses the MySQL protocol for batch data synchronization from TiDB. For more information about the compatibility between TiDB and MySQL, see MySQL Compatibility.

Supported data types

For a complete list of TiDB data types, see Data Types. The following table lists the support for major data types.

Type

Batch data read (TiDB Reader)

TINYINT

Supported

SMALLINT

Supported

MEDIUMINT

Supported

INTEGER

Supported

BIGINT

Supported

FLOAT

Supported

DOUBLE

Supported

DECIMAL

Supported

CHAR

Supported

VARCHAR

Supported

JSON

Supported

TEXT

Supported

TINYTEXT

Supported

MEDIUMTEXT

Supported

LONGTEXT

Supported

VARBINARY

Supported

BINARY

Supported

BLOB

Supported

TINYBLOB

Supported

MEDIUMBLOB

Supported

LONGBLOB

Supported

ENUM

Supported

SET

Supported

BOOLEAN

Supported

BIT

Supported

DATE

Supported

DATETIME

Supported

TIMESTAMP

Supported

TIME

Supported

YEAR

Supported

Prepare the TiDB environment

Before synchronizing data in DataWorks, prepare your TiDB environment as described in this section to ensure that your data synchronization tasks run smoothly.

Confirm the TiDB version

Data Integration requires TiDB 7.x or 8.x. Confirm that your TiDB version meets this requirement. You can run the following statement in your TiDB database to check the current version.

SELECT TIDB_VERSION()\G

*************************** 1. row ***************************
TIDB_VERSION(): Release Version: v8.1.1
Edition: Community
Git Commit Hash: 821e491a20fbab36604b36b647b5bae26a2c1418
Git Branch: HEAD
UTC Build Time: 2024-08-27 19:16:25
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.00 sec)

Configure account permissions

Create a dedicated TiDB account for DataWorks to access the data source by following these steps:

  1. (Optional) Create an account. For more information, see TiDB User Account Management.

  2. Configure permissions.

    For batch data reads from TiDB, the account must have read (SELECT) permission on the source tables.

    You can run the following commands to grant permissions to the account, or directly grant the SUPER permission. When you run the statements, replace 'sync_account' with your account name.

    -- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password'; // Creates a synchronization account and sets a password. The account can log on from any host. '%' indicates any host.
    GRANT SELECT ON *.* TO 'sync_account'@'%'; // Grants the SELECT permission on all databases to the synchronization account.

    The *.* syntax grants the specified permissions on all tables in all databases. You can also grant permissions on specific tables in a target database. For example, to grant the SELECT permission on the user table in the test database, run the following statement: GRANT SELECT ON test.user TO 'sync_account'@'%';.

    Note

    The REPLICATION SLAVE statement grants a global permission and cannot be granted on a specific table in a database.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Appendix: Script example and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script example

The following code provides a configuration example for the reader:

Note

The comments in the following JSON example are for explanation purposes only. Remove them before running the task.

{
  "type": "job",
  "version": "2.0",
  "steps":
  [
    {
      "stepType": "tidb",
      "parameter":
      {
        "column":
        [
          "id",
          "name"
        ],
        "where": "",
        "splitPk": "id",
        "connection":
        [
          {
            "selectedDatabase": "test_database",
            "datasource": "test_datasource",
            "table":
            [
              "test_table"
            ]
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "odps",
      "parameter":
      {
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting":
  {
    "errorLimit":
    {
      "record": "0"
    },
    "speed":
    {
      "throttle": false,
      "concurrent": 3
    }
  },
  "order":
  {
    "hops":
    [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader script parameters

Parameter

Description

Required

Default

datasource

This must match the data source name you configured in DataWorks.

Yes

None

table

The name of the source table.

The following examples show advanced usage for configuring a range for this parameter:

  • You can configure a range to read data from sharded databases and tables. For example, 'table_[0-99]' specifies tables from 'table_0' to 'table_99'.

  • If your table names have fixed-length numeric suffixes, such as from 'table_000' to 'table_999', you can set the parameter to "table": ["table_00[0-9]", "table_0[10-99]", "table_[100-999]"].

Yes

None

column

The source table columns to synchronize, specified as a JSON array. To read all columns, use ["*"].

  • Column pruning: You can select a subset of columns to read.

  • Column reordering: You can export columns in an order that is different from the table schema.

  • Constants: You must follow the MySQL SQL syntax. Example: ["id","`table`","1","'test_constant'","null","'null'","expr(a+1)","2.3","true"] .

    • id is a regular column name.

    • `table` is a column name that is also a reserved keyword.

    • 1 is an integer constant.

    • 'test_constant' is a string constant. It must be enclosed in single quotation marks.

    • Handling null values:

      • "" indicates an empty string.

      • null indicates a NULL value.

      • 'null' indicates the string "null".

    • expr(a+1) is an example of an expression.

    • 2.3 is a floating-point number.

    • true is a boolean value.

  • You must explicitly specify the columns to synchronize in the column parameter. The parameter cannot be empty.

Yes

None

splitPk

Specifies a column to use for data sharding. This enables parallel tasks to run during data extraction, which improves synchronization efficiency.

  • Use the table's primary key for the splitPk parameter. Primary key values are typically distributed evenly, which helps prevent data hot spots.

  • Currently, splitPk supports data sharding for integer types only. It does not support string, floating-point, date, or other data types. If you specify an unsupported data type, this feature is ignored and synchronization runs on a single thread.

  • If this parameter is not configured, data is synchronized on a single thread.

No

None

where

A filter condition for the source data. This is often used for incremental synchronization. For example, to synchronize only the current day's data, set the condition to gmt_create>$bizdate.

  • The WHERE clause allows for effective incremental data synchronization. If you do not specify a WHERE clause, or if you provide one without a key or value, the operation is treated as a full data synchronization.

  • The WHERE clause does not support the LIMIT syntax.

No

None