The real-time full-database synchronization feature combines a one-time full migration with continuous incremental capture to synchronize an entire source database, such as MySQL or Oracle, to a destination system with low latency. This task supports full synchronization of historical data from the source database and automatically initializes the schema and data for the destination table. The task then automatically switches to a real-time incremental mode, using technologies like change data capture (CDC) to continuously capture and synchronize subsequent changes. This feature is suitable for scenarios such as building real-time data warehouses and data lakes. This topic uses an example of synchronizing data from a MySQL database to MaxCompute in real time to describe how to configure a synchronization task.
Prerequisites
Data source preparation
Create a source data source and a destination data source. For more information, see Data source management.
Ensure your data sources support real-time full-database synchronization. For more information, see Supported data sources and synchronization solutions.
You must enable logging for certain data sources, such as MySQL, Hologres, and Oracle. The method varies by data source. For more information, see Data source list.
MaxCompute: The Decimal type is supported only in MaxCompute 2.0. Before you perform synchronization, you must enable MaxCompute 2.0 data types. For more information, see MaxCompute V2.0 data type edition.
Resource group: You must have a configured serverless resource group.
Network connectivity: Configure network connectivity between the resource group and data sources.
Usage notes
DataWorks supports two types of full-database synchronization: real-time full-database synchronization and full & incremental (near real-time). Both types can perform full synchronization for historical data in a source database and then automatically switch to a real-time incremental mode. However, the two types differ in latency and destination table requirements:
Timeliness: The real-time full-database synchronization feature provides a latency of seconds to minutes. The full & incremental (near real-time) feature provides a T+1 timeliness.
Destination table (MaxCompute):
PK Delta Table: All features of real-time full-database synchronization are supported.
Regular table and Append Delta Table: Only the Append mode is supported when you select incremental synchronization mode in a real-time full-database synchronization task.
Full & incremental (near real-time): All the preceding table types are supported.
Real-time full-database synchronization tasks can be configured in DataStudio (Data Studio) and Data Integration. The two modules are functionally interoperable.
Consistent configuration: Whether you create a task in Data Studio or the Data Integration module, the configuration interface, parameter settings, and underlying features are identical.
Bidirectional synchronization: Tasks created in the Data Integration module are automatically synchronized and displayed in the
data_integration_jobsdirectory in the Data Studio module. These tasks are categorized by channels in the format ofsource type-destination typefor unified management.
Configure a task
Step 1: Create a synchronization task
Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Integration.
In the left-side navigation pane, click Synchronization Task, click Create Synchronization Task at the top of the page, and configure the task information:
Source Type:
MySQL.Destination Type:
MaxCompute.Specific Type:
Real-time Full-database.Synchronization Mode:
Schema Migration: Automatically creates database objects (such as tables, columns, and data types) that match the source on the destination, without including data.
Full Synchronization (optional): Performs a one-time copy of all historical data from the specified objects (such as tables) on the source to the destination. This is typically used for initial data migration or data initialization.
Incremental Sync (optional): After full synchronization is complete, continuously captures change data (inserts, updates, and deletes) from the source and synchronizes it to the destination.
Step 2: Configure data sources and running resources
In the Source Data Source section, select the
MySQLdata source that has been added to the workspace. In the Destination section, select theMaxComputedata source that has been added.In the Running Resources section, select the Resource Group for the synchronization task, and allocate Resource Group CU to the task.
NoteIf the task logs display a message such as
Please confirm whether there are enough resources..., it indicates that the available compute units (CUs) in the current resource group are insufficient for starting or running the task. You can increase the number of CUs allocated to the task in the Configure Resource Group panel to allocate more compute resources.For recommended resource size values, see Recommended CUs for Data Integration. Adjust the values based on your actual requirements.
Ensure that both the source data source and the destination data source pass the Connectivity Check.
Step 3: Configure the synchronization solution
1. Configure the data source
In this step, you can select the tables to synchronize from the source data source in the Source Tables section, and click the
icon to move them to the Selected Tables section on the right. If there are many tables, you can use Database Filtering or Table filtering to select the tables to synchronize by configuring regular expressions.
To write data from multiple sharded tables (with the same schema) to a single destination table, you can Select Tables by Regex.

Enter a regular expression in the source table configuration. DataWorks automatically identifies and collects all matching source tables and writes their data to the destination table mapped by the expression.NoteThis method is applicable to sharded table merge synchronization scenarios (similar to sharding-based synchronization), improving configuration efficiency and avoiding the need to repeatedly add many-to-one synchronization rules.
2. Configure the data destination
If only Incremental Sync is selected for the real-time full-database synchronization task, you can configure the incremental synchronization mode for writing to the destination table.
Replay: Only PK Delta Table is supported. Similar to normal synchronization, only data columns are synchronized.
Append: Regular tables and Append Delta Tables are supported. Source table real-time data is written to the destination table with appended metadata such as insert, update, and delete operations. For the format of the append log table, see Append log table format.
3. Destination table mapping
In this step, you must define the mapping rules between source tables and destination tables, and specify the primary key, dynamic partition, DDL/DML configuration, and other rules to determine how data is written.
Operation | Description | ||||||||||||
Refresh | The system automatically lists the source tables you selected, but the specific properties of the destination tables take effect only after you refresh and confirm them.
| ||||||||||||
Customize Mapping Rules for Destination Table Names (optional) | The system has a default table name generation rule:
The following scenarios are supported:
| ||||||||||||
Edit column type mapping (optional) | The system has default column type mappings between source and destination. You can click Edit Mapping of Field Data Types in the upper-right corner of the table to customize the column type mapping between source and destination tables, and then click Apply and Refresh Mapping. When editing column type mappings, make sure the type conversion rules are correct. Incorrect rules may cause type conversion failures that produce dirty data and affect task execution. | ||||||||||||
Edit destination table schema (optional) | The system automatically creates destination tables that do not exist based on the custom table name mapping rules, or reuses existing tables with the same name. DataWorks automatically generates the destination table schema based on the source table schema. Manual intervention is not required in most cases. You can also modify the table schema in the following ways:
For existing tables, you can only add columns. For new tables, you can add columns and partition columns, and set the table type or table properties. For details, see the editable areas on the interface. | ||||||||||||
Value assignment | Native columns are automatically mapped based on columns with the same name in the source and destination tables. The newly added columns and partition columns from the preceding steps must be manually assigned values. Perform the following operations:
When assigning values, you can assign constants and variables. Switch the type in Value Type. The following methods are supported:
Note Too many partitions can affect synchronization efficiency. If more than 1,000 new partitions are created per day, partition creation fails and the task is terminated. Therefore, when defining partition column value assignment methods, estimate the number of partitions that may be generated. Use second-level and millisecond-level partition creation methods with caution. | ||||||||||||
Source Split Key | You can select a column from the source table or select Not Split in the source split key drop-down list. During task execution, data is split into multiple tasks based on this column to enable concurrent, batch data reads. We recommend that you use the table primary key as the source split key. String, floating-point, date, and other types are not supported. Currently, the source split key is supported only when the source is MySQL. | ||||||||||||
Skip Full Synchronization | If you have configured full synchronization in Step 3, you can individually skip full data synchronization for specific tables. This is applicable to scenarios where full data has already been synchronized to the destination through other methods. | ||||||||||||
Full condition | Apply conditional filtering to the source during the full synchronization phase. You only need to write the WHERE clause here, without the WHERE keyword. | ||||||||||||
Configure DML Rule | DML message processing is used to perform fine-grained filtering and control on change data captured from the source ( | ||||||||||||
Others | Table Type: MaxCompute supports regular tables,
For more information about Delta Table, see Delta Table overview. |
Step 4: Advanced configuration
Advanced parameter configuration
To fine-tune the task configuration and achieve custom synchronization requirements, go to the Advanced Parameters tab and modify the advanced parameters.
Click Advanced Configuration in the upper-right corner of the page to go to the advanced parameter configuration page.
Modify the parameter values based on the parameter descriptions. The meaning of each parameter is described after the parameter name.
AI-assisted configuration is also supported. Enter natural language instructions, such as adjusting the task concurrency. The large language model generates recommended parameter values. You can decide whether to accept the AI-generated parameters based on your actual requirements.

Modify parameters only when you fully understand their meanings. Incorrect modifications may cause unexpected issues such as task latency, excessive resource consumption that blocks other tasks, and data loss.
DDL capability configuration
Some real-time synchronization channels can detect metadata changes in the source table schema and notify the destination to synchronize the updates, or take other actions such as alerting, ignoring, or terminating the task.
You can click Configure DDL Capability in the upper-right corner to set the processing policy for each type of change. The supported processing policies vary by channel.
Normal processing: The destination processes the DDL change information from the source.
Ignore: The change message is ignored, and the destination is not modified.
Error: The real-time full-database synchronization task is terminated, and the status is set to Error.
Alert: An alert is sent to you when this type of change occurs on the source. You must configure a DDL notification rule in Configure Alert Rule.
After a new column is added on the source and created on the destination through DDL synchronization, the system does not backfill data for the existing data in the destination table.
Step 5: Deploy and run the task
After you complete all configurations, click Save at the bottom of the page to save the task configuration.
Full-database synchronization tasks do not support direct debugging. You must deploy them to Operation Center for execution. Therefore, you must perform the Deploy operation to make any new or edited task take effect.
During deployment, if you select Start immediately after deployment, the task starts simultaneously with the deployment. Otherwise, after deployment, you must go to and manually start the task in the Operation column of the target task.
Click the Name/ID of the corresponding task in Tasks to view the detailed execution process of the task.
Step 6: Alert configuration
1. Create an alert
In the list, find the real-time full-database task, and click in the Operation column to configure alert policies for the task.

(1) Click Create Rule to configure an alert rule.
You can set Alert Reason to monitor task metrics such as Business delay, Failover, Task status, DDL Notification, and Task Resource Utilization, and set CRITICAL or WARNING alert levels based on specified thresholds.
By configuring Configure Advanced Parameters, you can control the time interval between alert messages to prevent sending too many messages at once, which can cause waste and message accumulation.
If the alert reason is set to Business delay, Task status, or Task Resource Utilization, you can also enable recovery notifications to notify recipients when the task returns to normal.
(2) Manage alert rules.
For existing alert rules, you can use the alert switch to enable or disable alert rules. You can also send alerts to different people based on the alert level.
2. View alerts
Click in the task list to expand the panel and go to the alert events page, where you can view alerts that have occurred.
Manage tasks
Edit a task
On the page, find the synchronization task you created, click More in the Operation column, and click Edit to modify the task information. The procedure is the same as for task configuration.
For tasks that are not running, you can directly modify the configuration, save it, and then deploy the task to the production environment for it to take effect.
For Running tasks, when you edit and deploy the task without selecting Start immediately after deployment, the original operation button changes to Apply Updates. You must click this button for the changes to take effect in the production environment.
After you click Apply Updates, the system performs three steps: Stop, Deploy, and Restart.
If the change involves adding new tables or switching existing tables:
Selecting a checkpoint is not supported when applying updates. After you confirm, the system performs schema migration and full initialization for the new tables. After full initialization is complete, the new tables begin incremental operations together with the original tables.
If other information is modified:
Selecting a checkpoint is supported when applying updates. After you confirm, the task continues running from the specified checkpoint. If no checkpoint is specified, the task resumes from the checkpoint at which it was last stopped.
Unmodified tables are not affected. After the update and restart, they continue running from the point at which the task was last stopped.
View a task
After you create a synchronization task, you can view the list of created synchronization tasks and the basic information of each task on the synchronization task page.

You can click Start or Stop in the Operation column to start or stop a synchronization task. In the More menu, you can edit or View the synchronization task.
For started tasks, you can view the basic execution status in Execution Overview, and click the corresponding summary area to view the execution details.

Checkpoint-based resumption
Use cases
Manually resetting the checkpoint when starting or restarting a task is mainly applicable to the following scenarios:
Task recovery and data resumption: After a task is interrupted, you may need to manually specify a checkpoint at the interruption time to ensure data resumes from the exact breakpoint.
Data troubleshooting and replay: If you find that synchronized data is missing or abnormal, you can roll back the checkpoint to a time before the issue occurred to replay and fix the problematic data.
Major task configuration changes: After making significant changes to the task configuration (such as destination table schema or column mapping), we recommend that you reset the checkpoint to start synchronization from a specific time to ensure data accuracy under the new configuration.
Instructions
Click Start, and in the dialog, select Whether to reset the site:

Do not select reset checkpoint and run directly: The task continues running from the checkpoint at which it was last stopped (the last checkpoint).
Reset checkpoint and select a time: The task starts running from the specified checkpoint. Make sure the selected time does not exceed the earliest time supported by the source Binlog.
If you receive a checkpoint error or a message indicating that the checkpoint does not exist when running a synchronization task, try the following solutions:
Reset the checkpoint: When starting the real-time synchronization task, reset the checkpoint and select the earliest available checkpoint for the source database.
Adjust the log retention period: If the database checkpoint has expired, consider adjusting the log retention period in the database, for example, to 7 days.
Data synchronization: If data has been lost, consider performing a full synchronization again, or configuring a batch synchronization task to manually synchronize the missing data.
Task operations and tuning
After the task starts, if you encounter issues such as data consumption latency, stalling, or poor performance, see Troubleshoot and tune real-time synchronization tasks for solutions.
FAQ
For frequently asked questions about real-time full-database synchronization, see FAQ about real-time full-database synchronization.
Appendix: Append log table format
Source columns flattened
Column name | Description |
sequence_id | The record ID of the incremental event. The value is unique and incrementing. |
operation_type | The operation type (I/D/U). |
execute_time | The timestamp of the data. |
before_image | Whether this is the pre-change image (Y/N). |
after_image | Whether this is the post-change image (Y/N). |
src_datasource | The data source from which the data originates. |
src_database | The database from which the data originates. |
src_table | The table from which the data originates. |
Column 1 | Actual data column 1. |
Column 2 | Actual data column 2. |
Column 3 | Actual data column 3. |
Source columns merged into JSON
Column name | Description |
sequence_id | The record ID of the incremental event. The value is unique and incrementing. |
operation_type | The operation type (I/D/U). |
execute_time | The timestamp of the data. |
before_image | Whether this is the pre-change image (Y/N). |
after_image | Whether this is the post-change image (Y/N). |
src_datasource | The data source from which the data originates. |
src_database | The database from which the data originates. |
src_table | The table from which the data originates. |
ddl_sql | When the operation is a DDL type, the DDL statement is written to this column. |
data_columns | Actual data columns merged into JSON. |
button to select Manually enter and Built-in Variable to concatenate and generate the destination table name. The supported variables include source data source name, source database name, and source table name.





button in the Target Table column to add columns.
tooltip on the interface.
