PolarDB-X 2.0

DataWorks Data Integration supports PolarDB-X 2.0 as both a source and destination for offline (batch) synchronization tasks. This page covers supported capabilities, prerequisites, and script parameter reference for PolarDB-X 2.0 Reader and Writer.

Setup overview

To synchronize data between PolarDB-X 2.0 and other systems, complete these steps:

  1. Confirm you are using PolarDB-X 2.0 (not PolarDB-X 1.0).

  2. Grant the required permissions to the database account DataWorks will use.

  3. Add the PolarDB-X 2.0 data source in DataWorks.

  4. Configure and run an offline synchronization task.

Supported versions

Offline read and offline write are supported for PolarDB-X 2.0. Offline synchronization can also read data from views.

Limits

PolarDB-X 2.0 data sources support serverless resource groups (recommended) and exclusive resource groups for Data Integration.

Supported field types

For a complete list of PolarDB-X 2.0 field types, see Data types. The table below lists the major field types and their support status.

| Field type | Offline read (PolarDB-X 2.0 Reader) | Offline write (PolarDB-X 2.0 Writer) |
| --- | --- | --- |
| TINYINT | Supported | Supported |
| SMALLINT | Supported | Supported |
| INTEGER | Supported | Supported |
| BIGINT | Supported | Supported |
| FLOAT | Supported | Supported |
| DOUBLE | Supported | Supported |
| DECIMAL/NUMERIC | Supported | Supported |
| REAL | Not supported | Not supported |
| VARCHAR | Supported | Supported |
| JSON | Supported | Supported |
| TEXT | Supported | Supported |
| MEDIUMTEXT | Supported | Supported |
| LONGTEXT | Supported | Supported |
| VARBINARY | Supported | Supported |
| BINARY | Supported | Supported |
| TINYBLOB | Supported | Supported |
| MEDIUMBLOB | Supported | Supported |
| LONGBLOB | Supported | Supported |
| ENUM | Supported | Supported |
| SET | Supported | Supported |
| BOOLEAN | Supported | Supported |
| BIT | Supported | Supported |
| DATE | Supported | Supported |
| DATETIME | Supported | Supported |
| TIMESTAMP | Supported | Supported |
| TIME | Supported | Supported |
| YEAR | Supported | Supported |
| LINESTRING | Not supported | Not supported |
| POLYGON | Not supported | Not supported |
| MULTIPOINT | Not supported | Not supported |
| MULTILINESTRING | Not supported | Not supported |
| MULTIPOLYGON | Not supported | Not supported |
| GEOMETRYCOLLECTION | Not supported | Not supported |

Prerequisites

Before you begin, ensure that you have:

  • Confirmed you are running PolarDB-X 2.0. For PolarDB-X 1.0, use the DRDS data source instead.

  • A PolarDB-X 2.0 account with the permissions described below.

Grant account permissions

Create a dedicated PolarDB-X 2.0 account for DataWorks access, then grant the appropriate permissions based on your synchronization scenario.

Offline read (SELECT permission on source table)

The account must have the SELECT permission on the source table.

Offline write (write permissions on destination table)

The account must have INSERT, DELETE, and UPDATE permissions on the destination table.

Real-time synchronization — full database (binary logging access)

  • Privileged account: Can read binary log (binlog) data by default.

  • Standard account: Grant SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions using a privileged account:

-- Optional: create a sync account that can log in from any host (% matches any host).
-- Uncomment the statement below and replace 'password' with a strong password.
-- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'password';

-- Grant permissions for real-time (CDC) synchronization
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync_account'@'%';

Add a data source

Add the PolarDB-X 2.0 data source to DataWorks before configuring any synchronization task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.

Configure an offline synchronization task

For the entry point and configuration procedure, see Configure an offline sync task in the code editor.

For the script format and all available parameters, see Appendix: Script demo and parameter descriptions below.

Appendix: Script demo and parameter descriptions

Use the code editor to configure batch synchronization tasks in JSON format. For the unified script format requirements, see Configure a task in the code editor.

All examples use "type": "job" and "version": "2.0" at the top level.

Reader script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "polardbx20",
            "parameter": {
                "connection": [
                    {
                        "datasource": "",
                        "table": [
                            "t1"
                        ]
                    }
                ],
                "column": [
                    "c1",
                    "c2",
                    "'const'"
                ],
                "where": "",
                "splitPk": "",
                "checkSlave": "true",
                "slaveDelayLimit": "300"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader script parameters

| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| datasource | The data source name. Must match the name configured on the Data Source Management page. | Yes | None |
| table | The table to synchronize. Only a single table is supported per connection block. | Yes | None |
| column | The columns to synchronize, as a JSON array. Use ["*"] to include all columns. Cannot be blank. Supports column pruning (select specific columns), column reordering (order need not match the table schema), and constants following PolarDB-X 2.0 SQL syntax. Example: ["id", "table", "1", "'mingya.wmy'", "'null'", "to_char(a+1)", "2.3", "true"]. | Yes | None |
| splitPk | The column to use for data partitioning, enabling concurrent reads. Set to the primary key for balanced shards. Supports integer-type columns only. String, floating-point, and date columns are ignored, and data falls back to a single channel. If blank or omitted, data is read through a single channel. | No | None |
| where | A SQL WHERE filter condition, typically used for incremental synchronization. For example, gmt_create>$bizdate synchronizes only the current day's data. The value must be a filter condition; clauses such as LIMIT 10 are not supported. If omitted, all data is synchronized. | No | None |
| checkSlave | When the data source is a read-only instance, checks replication lag before the task starts to prevent data loss. | No | true |
| slaveDelayLimit | The maximum allowed replication lag in seconds. If the actual lag exceeds this value, the task fails. | No | 30 |
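As an illustration of how the reader parameters combine, the following parameter block is a minimal sketch of an incremental read split across channels. The data source name my_polardbx_source, the table orders, and the columns id, amount, and gmt_create are hypothetical; id is assumed to be an integer primary key, and the slaveDelayLimit value is an arbitrary example. The where value reuses the gmt_create>$bizdate condition described above.

```json
{
    "parameter": {
        "connection": [
            {
                "datasource": "my_polardbx_source",
                "table": [
                    "orders"
                ]
            }
        ],
        "column": [
            "id",
            "amount",
            "gmt_create",
            "'const'"
        ],
        "where": "gmt_create>$bizdate",
        "splitPk": "id",
        "checkSlave": "true",
        "slaveDelayLimit": "60"
    }
}
```

Because splitPk accepts integer-type columns only, pointing it at a string, floating-point, or date column in a block like this would leave the read on a single channel.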

Writer script demo

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "polardbx20",
            "parameter": {
                "postSql": [],
                "datasource": "",
                "column": [
                    "id",
                    "value"
                ],
                "writeMode": "insert",
                "batchSize": 1024,
                "table": "",
                "preSql": [
                    "delete from XXX;"
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Writer script parameters

| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| datasource | The data source name. Must match the name configured on the Data Source Management page. | Yes | None |
| table | The destination table name. | Yes | None |
| column | The destination columns to write to, as a JSON array. Example: ["id", "name", "age"]. Use ["*"] to write to all columns in schema order. | Yes | None |
| writeMode | The write conflict mode. Set to insert (insert into) or replace (replace into). See Write modes below. | No | insert |
| preSql | SQL statement(s) to run before the task starts (for example, truncate table tablename;). In the codeless UI, only one statement is allowed. In the code editor, multiple statements are supported. Transactions are not supported for multiple statements. | No | None |
| postSql | SQL statement(s) to run after the task completes (for example, a statement that adds a timestamp column). In the codeless UI, only one statement is allowed. In the code editor, multiple statements are supported. Transactions are not supported for multiple statements. | No | None |
| batchSize | The number of records submitted per batch. A larger value reduces network round trips and improves throughput, but may cause memory overflow if set too high. | No | 256 |
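To make the preSql and postSql behavior concrete, the following writer parameter block is a minimal sketch of a full-refresh load that clears the destination before writing. The data source name my_polardbx_target, the table orders_copy, and both SQL statements are hypothetical placeholders, and the batchSize value is an arbitrary example rather than a recommendation.

```json
{
    "parameter": {
        "datasource": "my_polardbx_target",
        "table": "orders_copy",
        "column": [
            "id",
            "value"
        ],
        "writeMode": "insert",
        "batchSize": 512,
        "preSql": [
            "truncate table orders_copy;"
        ],
        "postSql": [
            "analyze table orders_copy;"
        ]
    }
}
```

In the code editor, each array can hold more than one statement. Since the statements are not wrapped in a transaction, it is safest to assume that a failure in a later statement does not undo the earlier ones.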

Write modes

| Mode | Script value | Behavior on conflict |
| --- | --- | --- |
| insert into | insert | If a primary key or unique index conflict occurs, the conflicting row is skipped and recorded as dirty data. |
| replace into | replace | If no conflict occurs, behaves the same as insert into. If a conflict occurs, the existing row is deleted and the new row is inserted, replacing all fields. |
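As a sketch of how the replace mode might be selected for a task that is expected to be rerun, the following writer parameter block sets writeMode to replace. The data source name my_polardbx_target and the table orders_copy are hypothetical, and id is assumed to be the primary key of that table.

```json
{
    "parameter": {
        "datasource": "my_polardbx_target",
        "table": "orders_copy",
        "column": [
            "id",
            "value"
        ],
        "writeMode": "replace"
    }
}
```

With this setting, a rerun overwrites rows that share a primary key or unique index value instead of recording them as dirty data. Because the existing row is deleted before the new one is inserted, values in columns outside the column list are presumably not carried over from the old row.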

Job-level settings

| Parameter | Description | Default |
| --- | --- | --- |
| errorLimit.record | The number of error records allowed before the task fails. | "0" |
| speed.throttle | Whether to apply a rate limit. Set to true to enable; false disables the rate limit and the mbps parameter has no effect. | true |
| speed.concurrent | The number of concurrent channels. | 1 |
| speed.mbps | The maximum synchronization rate. In DataWorks scripts, 1 mbps corresponds to 1 MB/s. Controls read/write pressure on the source and destination. Takes effect only when throttle is true. | "12" |