The batch database synchronization feature in DataWorks lets you periodically synchronize the schema and data of some or all tables from a source database to a destination. It supports full, incremental, or combined synchronization modes. This topic uses the migration of a MySQL database to MaxCompute as an example to describe how to configure these tasks.
Prerequisites
-
Data sources
-
Create the source and destination data sources. For more information, see Data Source Management.
-
Ensure your data sources support batch database synchronization. For more information, see Supported data sources.
-
-
Resource group: Purchase and configure a serverless resource group.
-
Network connectivity: Establish network connectivity between the resource group and the data sources.
Applicability
You can configure batch database synchronization tasks in both DataStudio and Data Integration. The features are identical in both modules.
-
Consistent configuration: The configuration interface, parameter settings, and underlying functions are identical whether you create the task in DataStudio or Data Integration.
-
Bidirectional synchronization: Data Integration automatically synchronizes tasks to and displays them in the
data_integration_jobsdirectory in DataStudio. These tasks are categorized by theSourceType-DestinationTypechannel for unified management.
Configure a synchronization task
Step 1: Create a synchronization task
-
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Integration.
-
In the left-side navigation pane, click Synchronization Task, and then click Create Synchronization Task at the top of the page. Configure the task information:
-
Data Source Type:
MySQL. -
Data Source Type:
MaxCompute. -
Specific Type:
batch database synchronization. -
Synchronization Mode: The Synchronization steps setting works with the subsequent full and incremental control setting. You can combine them to create different synchronization solutions. For details, see Full and incremental control.
-
Schema Migration: Automatically creates database objects like tables, fields, and data types in the destination to match the source schema, but does not migrate data.
-
Full Synchronization (optional): Copies all historical data from specified source objects, such as tables, to the destination in a single operation. This is typically used for initial data migration or data initialization.
-
Incremental Sync (optional): After a full synchronization is complete, this mode continuously captures new data from the source based on the incremental condition and synchronizes it to the destination.
-
-
Step 2: Configure data sources and runtime resources
-
In the Source Data Source section, select the
MySQLdata source in your workspace. In the Destination section, select the addedMaxComputedata source. -
In the Running Resources section, select the Resource Group for the synchronization task and allocate Resource Group CUs. If your synchronization task fails due to an out-of-memory (OOM) error, increase the allocated Resource Group CUs.
-
Ensure that both the source and destination data sources pass the Connectivity Check.
Step 3: Configure the synchronization solution
1. Data source
In this step, select the tables to synchronize from the source data source in the Source tables section. Click the
icon to move them to the Selected tables list on the right.

If there are many databases or tables, you can use Database Filtering or Table filtering and configure regular expressions to select the tables you want to synchronize.
2. Destination
Click the Configuration button next to Partition Initialization Configuration to uniformly initialize partitions for new destination tables. Changes made here apply to all new destination tables and do not affect existing tables.
3. Full and incremental control
-
Configure the task execution frequency.
-
If you select "Full synchronization" or "Incremental synchronization", you can choose to run the task as a One-time task or a Recurring task.
-
If you select both "Full synchronization" and "Incremental synchronization", the system uses a built-in mode: "One-time full synchronization for the first run, followed by recurring incremental synchronization". This option cannot be changed.
Synchronization steps
Full and incremental control
Behavior
Use case
Full synchronization
One-time
After the task starts, it synchronizes all data from the source table once to the destination table or a specified partition.
Data initialization, system migration
Recurring
Based on the configured scheduling cycle, it periodically synchronizes all data from the source table to the destination table or a specified partition.
Data reconciliation, T+1 full snapshots
Incremental synchronization
One-time
After the task starts, it synchronizes incremental data once to a specified partition based on the incremental condition you define.
Manual repair of a specific data batch
Recurring
After the task starts, it periodically synchronizes incremental data to a specified partition based on the configured scheduling cycle and incremental condition.
Daily ETL, building slowly changing dimension tables
Full synchronization & Incremental synchronization
(Built-in mode, not selectable)
First run: Automatically performs a one-time schema migration and full synchronization of historical data.
Subsequent runs: Periodically synchronizes incremental data to a specified partition based on the configured scheduling cycle and incremental condition.
One-click data ingestion into a data warehouse or data lake
Note-
For batch database synchronization, the system generates instances for recurring tasks in the same way as for tasks set to Start immediately after deployment. For more information, see Instance generation method: Start immediately after deployment.
-
You can define the partition generation method in the subsequent Value assignment step. You can use constants or dynamically generate partitions using system-predefined variables and scheduling parameters.
-
The configurations for the scheduling cycle, incremental condition, and partition generation method are interdependent. For more information, see 6. Incremental condition.
-
-
-
Configure scheduling parameters.
If your task involves recurring synchronization, click Scheduling Parameters for Periodical Scheduling to configure them. These parameters can be used later when configuring the incremental condition and field value assignment in the destination table mapping.
4. Destination table mapping
In this step, define the mapping rules between source and destination tables, and specify the data writing method by defining the recurring configuration and incremental condition.

|
Actions |
Description |
||||||||||||
|
Refresh mapping |
The system automatically lists the source tables you selected, but you must refresh the mapping to apply the destination table properties.
|
||||||||||||
|
Edit field type mapping (optional) |
The system has a default mapping between source and destination field types. You can click Edit Mapping of Field Data Types in the upper-right corner to customize the mapping. After configuration, click Apply and Refresh Mapping. When editing field type mappings, ensure that the conversion rules are correct. Otherwise, type conversion may fail, resulting in dirty data and affecting task execution. |
||||||||||||
|
Customize destination table name mapping (optional) |
The system has a default table name generation rule:
This feature supports the following scenarios:
|
||||||||||||
|
Customize destination database name mapping (optional) |
Some destination data sources, such as Hologres, support defining mapping rules for destination databases. The configuration method is the same as for Customize destination table name mapping. |
||||||||||||
|
Customize destination schema name mapping (optional) |
Some destination data sources, such as Hologres, support defining mapping rules for destination schemas. The configuration method is the same as for Customize destination table name mapping. |
||||||||||||
|
Edit destination table schema (optional) |
The system automatically generates the destination table schema based on the source table schema. You do not usually need to modify it. For special requirements, you can customize it as follows:
|
||||||||||||
|
Value assignment for destination table fields |
You must manually assign values to partition fields and any newly added fields. Standard fields are automatically mapped based on matching names in the source and destination tables. Follow these steps:
You can assign constants or variables. In the Value Type for Table fields and Partition fields, you can switch between the following types:
Both variables and scheduling parameters are automatically replaced with date-specific values when the task is scheduled to run. |
||||||||||||
|
Set source sharding column |
In the source sharding column, you can select a field from the source table or choose Disable. When the synchronization task runs, DataWorks splits the task into multiple subtasks based on this field to enable concurrent, batched data reading. It is recommended to use the table's primary key as the source sharding column. String, floating-point, and date types are not supported. Currently, the source sharding column is supported only for MySQL sources. |
||||||||||||
|
Customize advanced parameters |
Modify these parameters only if you fully understand them. Incorrect settings can cause task delays, resource contention, or data loss. You can set individual writer-side configurations and runtime configurations for subtasks. |
||||||||||||
|
Table type |
Table Type: MaxCompute supports standard tables, PK Delta Tables, and Append Delta Tables. If the destination table status is "To be created", you can select the table type when editing the destination table schema. The type of an existing table cannot be changed. For more information about Delta Tables, see Delta Table. |
5. Recurring configuration
If incremental synchronization is configured as Recurring, you need to complete the recurring configuration for the destination table. This includes settings like Scheduling Frequency, Data Timestamp, and Resource Group for Scheduling. The scheduling configuration for this synchronization task is consistent with the node scheduling configuration in DataStudio. For parameter details, see Configure node scheduling.
If a one-time synchronization involves a large number of tables, we recommend staggering their execution times in the schedule to prevent task buildup and resource contention.
6. Incremental condition
If the task needs to synchronize incremental data, you must configure an incremental condition. This condition determines which data each scheduled instance synchronizes.
-
Function and syntax
-
Function: The incremental condition is essentially a
WHEREclause that filters the source data. -
Syntax: When configuring, you only need to enter the conditional expression that follows the
WHEREkeyword. Do not include theWHEREkeyword itself.
-
-
Use scheduling parameters for incremental synchronization
To achieve periodic incremental synchronization, you can use scheduling parameters in the incremental condition. For example, configuring the condition as
synchronizes data generated on the previous day.STR_TO_DATE('${bizdate}', '%Y%m%d') <= columnName AND columnName < DATE_ADD(STR_TO_DATE('${bizdate}', '%Y%m%d'), INTERVAL 1 DAY) -
Write to a specific partition
By combining the incremental condition with the destination table's partition field, you can ensure that each batch of incremental data is written to the correct partition.
For example, using the incremental condition from the previous step, you can set the partition field to
ds=${bizdate}and configure the destination table to be partitioned by day. This way, each daily instance synchronizes only the data from the corresponding date in the source and writes it to the matching partition in the destination table.
By properly combining the time range specified by the incremental condition, the time interval for partition generation, and the scheduling cycle from the recurring configuration, you can create an automated T+n incremental ETL pipeline where business rules and physical partitions are strictly aligned.
Step 4: Configure advanced settings
Advanced parameters
If you need to fine-tune the task to meet custom synchronization requirements, you can modify the advanced parameters.
-
Click Advanced configuration in the upper-right corner to go to the advanced parameter configuration page.
-
Modify the parameter values based on the tooltips. The meaning of each parameter is explained next to its name.
-
You can also use AI-assisted configuration by entering natural language instructions, such as adjusting task concurrency. The AI model generates recommended parameter values, which you can then review and accept.

Modify these parameters only if you fully understand them. Incorrect settings can cause task delays, resource contention, or data loss.
Engine parameters
You do not normally need to configure engine parameters. If changes are required, use them only with guidance from technical support.
Step 5: Deploy and run the task
-
After completing all configurations, click Save at the bottom of the page to finish configuring the task.
-
You cannot debug batch synchronization tasks directly. You must Deploy them to Operation Center to run them. Therefore, any new or edited tasks must be deployed for the changes to take effect.
-
During deployment, if you select Start immediately after deployment, the task starts automatically. Otherwise, after deployment is complete, go to the page and manually start the task from the Actions column.
-
Click the Name/ID of the task in the Tasks to view the detailed execution process.
Step 6: Configure alert rules
You must configure alert rules for batch database synchronization tasks on their corresponding subtasks in Operation Center.
-
On the page, get the Task ID of the target task.
-
In the column, find the corresponding full-database offline subtask based on the ID of the integration task (for example, if the integration task ID is
34862, the incremental synchronization auto triggered subtask is namedoffline_odps_cyc_sync_mysql_test_timetest_to_mysql_test_timetest_34862), and click to go to the Rule Management page. -
Click Create Custom Rule and set the Rule Object, Trigger Method, and Alert Details. For more information, see Rule Management.
You can search for the subtask ID you obtained in the Rule Object to find the target task and set an alert for it.
Manage synchronization tasks
Edit a task
-
On the page, find the created synchronization task. In the Operation column, click More, then click Edit to modify the task information. The steps are the same as for task configuration.
-
For a task that is not running, you can directly modify the configuration, save it, and deploy it for the changes to take effect.
-
For a task that is running, if you edit and deploy it without selecting Start immediately after deployment, the original action button changes to Apply Updates. You must click this button for the changes to take effect in the production environment.
-
After you click Apply Updates, the system performs three steps on the changes: Stop, Deploy, and Restart.
-
If you add a table:
After you click Apply Updates, a synchronization subtask for the new table is added. The task's schema migration and one-time full synchronization start immediately, followed by incremental synchronization according to the schedule.
-
If you switch the destination table (which is equivalent to deleting the old table and adding a new one):
After you click Apply Updates, the old table's subtask is deleted, a new subtask is created, and the new task's schema migration and one-time full synchronization start immediately. The new task then performs incremental synchronization according to the schedule.
-
If other information is modified:
The table's schema migration and one-time full synchronization remain unchanged. Newly generated instances for incremental synchronization will use the updated configuration, while already generated instances are not affected.
Unmodified tables are not affected and will not be re-run.
-
View tasks
After creating a synchronization task, you can view the list of created tasks and their basic information on the synchronization task page.

-
In the Actions column, you can Start or Stop the synchronization task. In the More menu, you can perform operations such as Edit and View.
-
For a running task, you can see the basic status in the Execution Overview. You can also click the corresponding overview area to view execution details.

Quotas and limitations
-
Debugging limitation: You cannot debug batch synchronization tasks directly in Data Integration or DataStudio. You must deploy them to Operation Center to run them.
-
Sharding column limitation: The Set source sharding column feature currently supports only MySQL sources. The sharding column field must be of a numeric type; string, floating-point, or date types are not supported.
-
Schema change limitation: When you edit the destination table schema, renaming columns is not supported. For an existing destination table, you cannot change its Table type.
Next steps
After the task starts, you can click the task name to view its run details and perform operations, maintenance (O&M), and tuning.
FAQ
For frequently asked questions about batch database synchronization tasks, see FAQ about full and incremental synchronization tasks.
button and combining Manually enter and Built-in Variable. Supported variables include the source data source name, source database name, and source table name.





button in the Target Table column to add a field.
tooltip on the interface.
