MySQL data source

更新时间:
复制 MD 格式

A MySQL data source enables bidirectional synchronization with MySQL. This topic describes how DataWorks supports this synchronization through Data Integration.

Supported MySQL versions

Limitations

Real-time synchronization

  • You cannot synchronize data from MySQL read-only instances running a version earlier than 5.6.x.

  • Synchronizing tables that contain functional indexes is not supported.

  • XA ROLLBACK is not supported.

    For transactions in the XA PREPARE state, real-time synchronization writes the data to the destination. If an XA ROLLBACK command is issued, real-time synchronization does not roll back the prepared data. To handle this scenario, you must manually remove the table from the real-time synchronization task and then add it back to resynchronize.

  • The binary log format of the MySQL server must be set to ROW.

  • Real-time synchronization does not synchronize cascading deletes in associated tables.

  • For Amazon Aurora MySQL databases, you must connect to your primary/writer instance. You cannot enable the Binlog feature on Aurora MySQL read replicas, and real-time synchronization tasks require Binlog for incremental updates.

  • For online DDL changes to MySQL tables, real-time synchronization only supports adding columns (Add Column) via Data Management Service (DMS).

  • Reading stored procedures from MySQL is not supported.

Offline synchronization

  • When you use the MySQL Reader plugin to synchronize multiple tables from a sharded database, you must set the concurrency to a value greater than the number of tables to enable table splitting. Otherwise, the system creates one task per table.

  • Reading stored procedures from MySQL is not supported.

Supported data types

For a complete list of MySQL data types in each version, see the official MySQL documentation. This table lists the support status for major data types, using MySQL 8.0.x as an example.

Type

Offline read (MySQL Reader)

Offline write (MySQL Writer)

Real-time read

Real-time write

TINYINT

Supported

Supported

Supported

Supported

SMALLINT

Supported

Supported

Supported

Supported

INTEGER

Supported

Supported

Supported

Supported

BIGINT

Supported

Supported

Supported

Supported

FLOAT

Supported

Supported

Supported

Supported

DOUBLE

Supported

Supported

Supported

Supported

DECIMAL/NUMERIC

Supported

Supported

Supported

Supported

REAL

Not supported

Not supported

Not supported

Not supported

VARCHAR

Supported

Supported

Supported

Supported

JSON

Supported

Supported

Supported

Supported

TEXT

Supported

Supported

Supported

Supported

MEDIUMTEXT

Supported

Supported

Supported

Supported

LONGTEXT

Supported

Supported

Supported

Supported

VARBINARY

Supported

Supported

Supported

Supported

BINARY

Supported

Supported

Supported

Supported

TINYBLOB

Supported

Supported

Supported

Supported

MEDIUMBLOB

Supported

Supported

Supported

Supported

LONGBLOB

Supported

Supported

Supported

Supported

ENUM

Supported

Supported

Supported

Supported

SET

Supported

Supported

Supported

Supported

BOOLEAN

Supported

Supported

Supported

Supported

BIT

Supported

Supported

Supported

Supported

DATE

Supported

Supported

Supported

Supported

DATETIME

Supported

Supported

Supported

Supported

TIMESTAMP

Supported

Supported

Supported

Supported

TIME

Supported

Supported

Supported

Supported

YEAR

Supported

Supported

Supported

Supported

LINESTRING

Not supported

Not supported

Not supported

Not supported

POLYGON

Not supported

Not supported

Not supported

Not supported

MULTIPOINT

Not supported

Not supported

Not supported

Not supported

MULTILINESTRING

Not supported

Not supported

Not supported

Not supported

MULTIPOLYGON

Not supported

Not supported

Not supported

Not supported

GEOMETRYCOLLECTION

Not supported

Not supported

Not supported

Not supported

Prerequisites

To ensure that data synchronization tasks run correctly, prepare your MySQL environment before you add it as a data source in DataWorks.

This topic describes the required prerequisites.

Verify the MySQL version

Data Integration supports only specific versions of MySQL. For more information, see the Supported MySQL versions section. You can check the version of your MySQL database by running the following command:

SELECT version();

Configure account permissions

We recommend creating a dedicated MySQL account for DataWorks to access the data source.

  1. Optional: Create an account.

    For detailed instructions, see Create a MySQL account.

  2. Grant permissions.

    • Batch

      In batch synchronization scenarios:

      • To read data from MySQL, the account must have the read (SELECT) permission on the tables that you want to synchronize.

      • To write data to MySQL, the account must have the write (INSERT, DELETE, and UPDATE) permissions on the tables that you want to synchronize.

    • Real-time

      In real-time synchronization scenarios, the account must have the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions on the database.

    You can run the following commands to grant the required permissions. Alternatively, you can grant the account the SUPER permission. In the following command, replace 'sync_account' with the name of the account you created.

    -- Optional: Create a synchronization account that can connect from any host ('%').
    -- CREATE USER 'sync_account'@'%' IDENTIFIED BY 'your_password'; 
    -- Grant the required permissions to the synchronization account.
    GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sync_account'@'%';

    The *.* syntax grants the permissions on all tables in all databases. You can also grant the permissions on a specific table in a specific database. For example, to grant permissions on the user table in the test database, run the following command: GRANT SELECT, REPLICATION CLIENT ON test.user TO 'sync_account'@'%';.

    Note

    The REPLICATION SLAVE permission is a global permission and cannot be granted for a specific database or table.

Enable MySQL binlog (for real-time synchronization only)

Data Integration performs real-time incremental data synchronization by subscribing to MySQL binlogs. Before you configure a synchronization task in DataWorks, you must enable MySQL binlog.

Important
  • If a binlog is being consumed, the database cannot delete it. High latency in a real-time synchronization task can delay the deletion of the source binlog, which may exhaust disk space. We recommend that you configure latency alerts for your task and monitor the database disk space.

  • Retain binlogs for at least 72 hours. If a task fails and the binlogs are no longer available, you cannot reset the checkpoint to a position before the failure, which can cause data loss. In this case, you must use batch synchronization to backfill the data.

  1. Check whether binlog is enabled.

    • Run the following statement to check whether binlog is enabled.

      SHOW variables LIKE "log_bin";

      If the returned value is ON, binlog is enabled.

    • To check the binlog status on a replica database, run the following statement.

      SHOW variables LIKE "log_slave_updates";

      If the returned value is ON, binlog is enabled on the replica database.

    If the returned value is not as expected:

  2. Check the binlog format.

    Run the following statement to check the binlog format.

    SHOW variables LIKE "binlog_format";

    Possible returned values:

    • A return value of ROW indicates that the enabled binlog format is ROW.

    • STATEMENT: The binlog format is STATEMENT.

    • MIXED: The binlog format is MIXED.

    Important

    DataWorks real-time synchronization supports only the ROW format. If the returned value is not ROW, you must change the binlog_format setting.

  3. Check whether full row images are logged.

    Run the following statement to check the row image setting.

    SHOW variables LIKE "binlog_row_image";

    Possible returned values:

    • FULL: Full row images are logged.

    • MINIMAL: Minimal row images are logged instead of full row images.

    Important

    DataWorks real-time synchronization supports only MySQL servers that have full row images enabled for binlog. If the returned value is not FULL, you must modify the binlog_row_image setting.

Configure OSS binlog read permissions

When you add a MySQL data source, if you set Configuration Mode to ApsaraDB for RDS and your RDS for MySQL instance is in the same region as your DataWorks workspace, you can enable Read binlogs from OSS. With this feature enabled, DataWorks fetches binlogs from OSS if direct access to the RDS for MySQL binlogs fails, preventing interruptions to real-time synchronization tasks.

If you use a Alibaba Cloud RAM User or a Alibaba Cloud RAM Role for OSS binlog access identity, you must also grant the required permissions.

  • RAM user

    1. Log on to the Resource Access Management (RAM) console, go to the Users page, and find the target RAM user.

    2. In the Operations column, click Add Permissions.

    3. Configure the following parameters and click OK.

      • Scope: Alibaba Cloud account

      • Permission Policy: system policy

      • Policy Name: AliyunDataWorksAccessingRdsOSSBinlogPolicy

      image

  • RAM role

    1. Log on to the Resource Access Management (RAM) console and create a RAM role. For more information, see Create a RAM role for a trusted Alibaba Cloud account.

      Key parameters:

      • Select trusted entity: Alibaba Cloud account

      • Select account: Other Alibaba Cloud account. Enter the ID of the Alibaba Cloud account that owns the DataWorks workspace.

      • Role name: Enter a custom name.

    2. Grant permissions to the RAM role. For more information, see Grant permissions to a RAM role.

      Key parameters:

      • Permission Policy: system policy

      • Policy Name: AliyunDataWorksAccessingRdsOSSBinlogPolicy

    3. Modify the trust policy of the RAM role. For more information, see Edit the trust policy of a RAM role.

      {
          "Statement": [
              {
                  "Action": "sts:AssumeRole",
                  "Effect": "Allow",
                  "Principal": {
                      "Service": [
                          "<your_alibaba_cloud_account_id>@di.dataworks.aliyuncs.com",
                          "<your_alibaba_cloud_account_id>@dataworks.aliyuncs.com"
                      ]
                  }
              }
          ],
          "Version": "1"
      }

Add a data source

Alibaba Cloud instance mode

If your MySQL database runs on an Alibaba Cloud RDS instance, we recommend that you create the data source in Alibaba Cloud instance mode. Configure the parameters as follows:

Parameter

Description

Data Source Name

The data source name must be unique within the workspace. We recommend using a name that clearly identifies the business and environment, for example, rds_mysql_order_dev.

Configuration Mode

Select Alibaba Cloud instance mode. For more information about configuration modes, see Scenario 1: Instance Mode (Current Cloud Account) and Scenario 2: Instance Mode (Other Cloud Account).

Alibaba Cloud Account

Select the cloud account to which the instance belongs. If you select Another Alibaba Cloud Account, you must configure cross-account permissions. For more information, see Cross-Account Authorization (RDS, Hive, or Kafka).

If you select other cloud account, provide the following information:

  • Main account ID: The ID of the main account that owns the instance.

  • RAM role name: The RAM role provided by the other cloud account. This role must have the required permissions to access the target instance.

Region

The instance's region.

Instance

Select the name of the instance to connect to.

Standby library settings

If your RDS instance has a read-only instance (standby instance), you can configure tasks to read data from it. This prevents read operations from interfering with the performance of the primary instance.

Instance Address

After selecting the correct instance, click Get Latest Address to view information such as its public/private address, VPC, and VSwitch.

Database

The name of the database to which the data source connects. Ensure that the specified user has the required permissions to access this database.

Access identity

Select the credential source that DataWorks uses to access MySQL. The following options are supported:

  • Username and password: Enter the username and password for the MySQL database. DataWorks stores them with symmetric encryption. If you use an RDS instance, you can create and manage them in the account management section of the instance.

  • Key Management Service: Use a generic secret hosted in Alibaba Cloud Key Management Service (KMS). You must first create a generic secret in KMS. For more information, see Manage and use generic secrets. Then, in the data source, select the Kms Region where the KMS generic secret resides, and select the target generic secret from the KMS List. The content of a KMS generic secret supports the following two JSON formats:

    { "username": "biz_rw", "password": "S3cr3t!" }
    { "AccessKeyId": "LTAI...", "AccessKeySecret": "..." }
    Note

    After the content of a KMS generic secret is changed, DataWorks caches the secret for up to 5 minutes. The new secret takes effect within 5 minutes at the latest.

Support OSS binlog reading

If enabled, DataWorks attempts to fetch binlogs from OSS when RDS binlogs are inaccessible. This prevents interruptions to real-time synchronization tasks. For more information, see Configure authorization for reading OSS binlogs. Based on your authorization configuration, set the OSS binlog access identity.

Authentication Method

Select no authentication or SSL authentication. If you select SSL authentication, ensure it is also enabled on the instance. Prepare the certificate file and upload it to Authentication File Management.

Version

Log on to the MySQL server and run the SELECT VERSION() query to check the version number.

Connection string mode

You can also use the connection string mode to create a data source, which offers more flexibility. Configure the parameters as follows:

Parameter

Description

Data Source Name

The data source name must be unique within the workspace. We recommend using a name that clearly identifies the business and environment, for example, rds_mysql_order_dev.

Configuration Mode

Select User-created Data Store with Public IP Addresses. In this mode, a JDBC URL is used to connect to the database.

Connection string preview

After entering the connection address and database name, DataWorks automatically generates a preview of the JDBC URL.

Connection Address

Host address: Enter the IP address or domain name of the database. If the database is an Alibaba Cloud RDS instance, you can find the address on the Database Connection page in the instance details.

Port number: The database port. The default value is 3306.

Database Name

The name of the database to which the data source connects. Ensure that the specified user has the required permissions to access this database.

Access identity

Select the credential source that DataWorks uses to access MySQL. The following options are supported:

  • Username and password: Enter the username and password for the MySQL database. DataWorks stores them with symmetric encryption. If you use an RDS instance, you can create and manage them in the account management section of the instance.

  • Key Management Service: Use a generic secret hosted in Alibaba Cloud Key Management Service (KMS). You must first create a generic secret in KMS. For more information, see Manage and use generic secrets. Then, in the data source, select the Kms Region where the KMS generic secret resides, and select the target generic secret from the KMS List. The content of a KMS generic secret supports the following two JSON formats:

    { "username": "biz_rw", "password": "S3cr3t!" }
    { "AccessKeyId": "LTAI...", "AccessKeySecret": "..." }
    Note

    After the content of a KMS generic secret is changed, DataWorks caches the secret for up to 5 minutes. The new secret takes effect within 5 minutes at the latest.

Version

Log on to the MySQL server and run the SELECT VERSION() query to check the version number.

Authentication Method

Select no authentication or SSL authentication. If you select SSL authentication, ensure it is also enabled on the instance. Prepare the certificate file and upload it to Authentication File Management.

Advanced Parameters

Parameter: Select a supported parameter from the drop-down list, for example, connectTimeout.

Value: Enter an appropriate value for the selected parameter, for example, 3000.

The URL is then automatically updated to: jdbc:mysql://192.168.90.28:3306/test?connectTimeout=3000.

Important

Ensure the DataWorks resource group can connect to the database. Otherwise, subsequent tasks will fail. Network configuration depends on the database environment and data source connection mode. For more information, see Test connectivity.

MySQL synchronization workflow

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Configure single-table offline synchronization

Configure single-table real-time synchronization

See Real-time synchronization task configuration (Legacy) for the procedure.

Configure full-database synchronization

See Full database real-time synchronization task configuration for the procedure.

FAQ

For other common Data Integration issues, see FAQ about Data Integration.

Appendix: MySQL script examples and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Script mode configuration. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script examples

This topic provides configuration examples for a single table in a single database and for sharded tables.

Note

The comments in the following JSON examples are for demonstration purposes only. Remove the comments before you run the script.

  • Single table in a single database

    {
      "type": "job",
      "version": "2.0",// The version number.
      "steps": [
        {
          "stepType": "mysql",// The connector name.
          "parameter": {
            "column": [// The columns to read.
              "id"
            ],
            "connection": [
              {
                "querySql": [
                  "select a,b from join1 c join join2 d on c.id = d.id;"
                ],
                "datasource": ""// The data source name.
              }
            ],
            "where": "",// The filter condition.
            "splitPk": "",// The shard key.
            "encoding": "UTF-8"// The encoding format.
          },
          "name": "Reader",
          "category": "reader"
        },
        {
          "stepType": "stream",
          "parameter": {},
          "name": "Writer",
          "category": "writer"
        }
      ],
      "setting": {
        "errorLimit": {
          "record": "0"// The maximum number of allowed dirty data records.
        },
        "speed": {
          "throttle": true,// Set to true to enable rate limiting. If false, rate limiting is disabled and the 'mbps' parameter is ignored.
          "concurrent": 1,// The number of concurrent threads.
          "mbps": "12"// The rate limit. 1 mbps = 1 MB/s.
        }
      },
      "order": {
        "hops": [
          {
            "from": "Reader",
            "to": "Writer"
          }
        ]
      }
    }
  • Sharded tables

    Note

    The MySQL Reader can read data from multiple MySQL tables that have the same schema. In this context, "sharded tables" refers to a scenario where data from multiple source tables is written to a single destination table. To synchronize sharded tables at the database level, create a task in Data Integration and select the full database sharding feature.

    {
      "type": "job",
      "version": "2.0",
      "steps": [
        {
          "stepType": "mysql",
          "parameter": {
            "indexes": [
              {
                "type": "unique",
                "column": [
                  "id"
                ]
              }
            ],
            "envType": 0,
            "useSpecialSecret": false,
            "column": [
              "id",
              "buyer_name",
              "seller_name",
              "item_id",
              "city",
              "zone"
            ],
            "tableComment": "Test order table",
            "connection": [
              {
                "datasource": "rds_dataservice",
                "table": [
                  "rds_table"
                ]
              },
              {
                "datasource": "rds_workshop_log",
                "table": [
                  "rds_table"
                ]
              }
            ],
            "where": "",
            "splitPk": "id",
            "encoding": "UTF-8"
          },
          "name": "Reader",
          "category": "reader"
        },
        {
          "stepType": "odps",
          "parameter": {},
          "name": "Writer",
          "category": "writer"
        },
        {
          "name": "Processor",
          "stepType": null,
          "category": "processor",
          "copies": 1,
          "parameter": {
            "nodes": [],
            "edges": [],
            "groups": [],
            "version": "2.0"
          }
        }
      ],
      "setting": {
        "executeMode": null,
        "errorLimit": {
          "record": ""
        },
        "speed": {
          "concurrent": 2,
          "throttle": false
        }
      },
      "order": {
        "hops": [
          {
            "from": "Reader",
            "to": "Writer"
          }
        ]
      }
    }

Reader parameters

Parameter

Description

Required

Default

datasource

The name of the data source to read from. This name must match a data source configured in the code editor.

Yes

None

table

The name of the source table to synchronize. A Data Integration task can read data from only one logical table.

The following examples show how to use advanced patterns for the table parameter:

  • You can read from sharded tables by specifying a range. For example, 'table_[0-99]' reads from 'table_0', 'table_1', 'table_2', through 'table_99'.

  • If the numeric suffixes in your table names have a consistent length, for example, 'table_000', 'table_001', and 'table_002' up to 'table_999', you can configure it as '"table":["table_00[0-9]","table_0[10-99]","table_[100-999]"]'.

Note

The task reads the columns specified in the column parameter from all tables that match the pattern. The task fails if a matched table or a specified column does not exist.

Yes

None

column

A JSON array that specifies the columns to read from the source table. To select all columns, use ["*"].

  • Column pruning: You can export a subset of columns.

  • Column reordering: You can export columns in an order that is different from the table schema.

  • Constant values: You can configure constant values. The values must follow MySQL SQL syntax. Example: ["id","`table`","1","'mingya.wmy'","'null'","to_char(a+1)","2.3",true].

    • id is a regular column name.

    • `table` is a column name that is a reserved keyword.

    • 1 is an integer constant.

    • 'mingya.wmy' is a string constant. It must be enclosed in single quotation marks.

    • Handling null values:

      • " " represents a string containing a single space.

      • null represents a NULL value.

      • 'null' represents the literal string "null".

    • to_char(a+1) is a string conversion function.

    • 2.3 is a floating-point number.

    • true is a boolean value.

  • You must explicitly specify the columns in the column parameter. It cannot be empty.

Yes

None

splitPk

When MySQL Reader extracts data, if you specify the splitPk parameter, it indicates that you want to use the field represented by splitPk for data sharding. Data Integration therefore launches concurrent tasks to perform data synchronization, which improves synchronization efficiency.

  • For best performance, use the primary key of the table as the splitPk.

  • Currently, splitPk supports data splitting for integer types only. It does not support other types such as string, floating-point, or date. If you specify an unsupported data type, the splitPk feature is ignored, and a single channel is used for synchronization.

  • If you do not specify splitPk, that is, you do not provide splitPk or the splitPk value is empty, the data synchronization is treated as a single-channel synchronization for the table.

No

None

splitFactor

The sharding factor. This parameter controls how many shards are created. If concurrency is enabled, the data is split into concurrency × splitFactor shards. For example, if the concurrency is 5 and splitFactor is 5, the data is split into 25 shards that are processed by 5 concurrent threads.

Note

Recommended range: 1 to 100. A value that is too large may cause an out-of-memory (OOM) error.

No

5

where

A filter condition. In a typical scenario, to synchronize only the current day's data, you can set the where condition to gmt_create > $bizdate.

  • The where clause allows you to effectively perform incremental synchronization. If you do not specify the where statement, including not providing a key or value for where, the data synchronization is treated as a full data synchronization.

  • Do not use clauses like LIMIT 10, as they are not valid in a MySQL WHERE clause.

No

None

querySql (Available only in the code editor; not supported in the codeless UI.)

In some use cases, the where parameter is insufficient for describing the required filtering conditions. You can use this parameter to define a custom SQL query for filtering. After this parameter is configured, the data synchronization system ignores the tables, columns, and splitPk parameters and directly uses the content of this parameter to filter data. For example, if you need to synchronize data after a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id. When you configure querySql, MySQL Reader directly ignores the configurations of the table, column, where, and splitPk conditions. The querySql parameter has a higher priority than the table, column, where, and splitPk options. The datasource is used to parse information such as the username and password.

Note

The querySql parameter is case-sensitive. For example, querysql (lowercase) will not take effect.

No

None

useSpecialSecret

Specifies whether to use the password of each individual data source when you have multiple source data sources. Valid values:

  • true

  • false

If you configure multiple source data sources that have different usernames and passwords, set this parameter to true to use the credentials of each respective data source.

No

false

Writer script example

{
  "type": "job",
  "version": "2.0",// The version number.
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "mysql",// The connector name.
      "parameter": {
        "postSql": [],// The SQL statements to run after the task.
        "datasource": "",// The data source name.
        "column": [// The columns to write to.
          "id",
          "value"
        ],
        "writeMode": "insert",// The write mode. Valid values: insert, replace, and update.
        "batchSize": 1024,// The number of records per batch.
        "table": "",// The destination table name.
        "nullMode": "skipNull",// The policy for handling NULL values.
        "skipNullColumn": [// The columns for which to skip NULL values.
          "id",
          "value"
        ],
        "preSql": [
          "delete from XXX;"// The SQL statements to run before the task.
        ]
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {// The maximum number of allowed dirty data records.
      "record": "0"
    },
    "speed": {
      "throttle": true,// Set to true to enable rate limiting. If false, rate limiting is disabled and the 'mbps' parameter is ignored.
      "concurrent": 1,// The number of concurrent threads.
      "mbps": "12"// The rate limit in MB/s to prevent excessive I/O pressure on the databases. 1 mbps = 1 MB/s.
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Writer parameters

Parameter

Description

Required

Default

datasource

The name of the destination data source. This name must match a data source configured in the code editor.

Yes

None

table

The name of the destination table.

Yes

None

writeMode

The write mode. The supported options correspond to the MySQL INSERT INTO, ON DUPLICATE KEY UPDATE, and REPLACE INTO statements:

  • insert: If a primary key or unique index conflict occurs, the task does not write the conflicting rows and records them as dirty data.

    If you configure the task in script mode, set writeMode to insert.

  • on duplicate key update: If no conflict with a primary key or unique index occurs, the statement behaves the same as insert into. If a conflict occurs, the statement updates the values of the specified fields with the values from the new row and writes the data to MySQL.

    If you configure the task in script mode, set writeMode to update.

  • replace into: When no conflict with a primary key or unique index occurs, it behaves the same as insert into. If a conflict occurs, it first deletes the original row and then inserts the new row. This means that the new row replaces all fields of the original row.

    If you configure the task in script mode, set writeMode to replace.

No

insert

nullMode

Specifies the policy for handling NULL values from the source. Valid values:

  • writeNull: If a source field is NULL, a NULL value is written to the destination field.

  • skipNull: If a source field is NULL, the destination field is omitted from the write operation, allowing the database's default value to be applied. If no default value is defined, the result is typically NULL. You must specify the applicable columns in the skipNullColumn parameter.

Important

When configured as skipNull, the task dynamically constructs SQL statements for writing data to support default values at the destination. This increases the number of FLUSH operations and reduces synchronization speed. In the worst-case scenario, the task will FLUSH once for each data record.

No

writeNull

skipNullColumn

When nullMode is set to skipNull, the columns specified by this parameter are not forced to NULL. Instead, their respective default values are used if available.

Format: ["c1","c2",...]. The specified columns such as c1 and c2 must be a subset of the columns defined in the column parameter.

No

All columns configured for the task.

column

The destination columns to write to, specified as a JSON array of strings. Example: "column":["id","name","age"]. To write to all columns in their schema order, use an asterisk (*). Example: "column":["*"].

Yes

None

preSql

One or more SQL statements to execute before the synchronization task starts. The codeless UI supports only one statement, while the code editor supports multiple. For example, you can clear existing data from a table: TRUNCATE TABLE tablename.

Note

Transactions are not supported for multiple SQL statements.

No

None

postSql

One or more SQL statements to execute after the synchronization task completes. The codeless UI supports only one statement, while the code editor supports multiple. For example, you can add a timestamp column by using the following statement: ALTER TABLE tablename ADD colname TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP.

Note

Transactions are not supported for multiple SQL statements.

No

None

batchSize

The number of records per write batch. Larger values reduce network round-trips and can improve throughput, but setting the value too high may cause an out-of-memory (OOM) error.

No

256

updateColumn

When writeMode is set to update, this parameter specifies which columns to update when a primary key or unique index conflict occurs. Specify the columns in a JSON array. Example: "updateColumn":["name","age"].

No

None