This topic answers common questions about real-time synchronization.
Overview
|
Category |
Related topics |
|
Configuration notes for real-time synchronization tasks |
|
|
FAQ for synchronizing MySQL data in real time |
|
|
FAQ for synchronizing Oracle, PolarDB, and MySQL data in real time |
A real-time synchronization task for Oracle, PolarDB, or MySQL fails repeatedly |
|
Error messages and solutions |
|
Configuration notes for real-time synchronization
Supported data sources
For a list of supported data sources for real-time synchronization, see Supported data sources for real-time synchronization.
High latency in real-time synchronization
If you query a destination table and find that new data is missing, the real-time synchronization task may have high latency. You can go to the real-time synchronization tasks page in Operation Center to check if the latency of the task is too high. For more information, see Solutions for high latency in real-time synchronization tasks.
High latency can be caused by the following reasons:
|
Symptom |
Direct cause |
Solution |
|
High read-side latency |
Excessive data changes at the source. A sudden increase in latency indicates a surge in data volume at the source at a specific time. |
If frequent updates and large data volumes at the source cause high synchronization latency, take the following actions:
|
|
The starting offset is set too early. |
If the starting offset is too early, the real-time synchronization task needs some time to catch up with the data. |
|
|
High write-side latency |
Performance or load issues with the destination database. |
If the database load is high, simply adjusting the synchronization task concurrency does not solve the problem. You need to contact your database administrator for help. |
|
High read-side and write-side latency |
Synchronization over the public internet causes network issues that lead to task latency. |
Synchronization over the public internet cannot guarantee real-time timeliness. Establish a network connection and synchronize data over the internal network. Note
Synchronization over the public internet has the following risks: network instability and frequent packet loss affect synchronization performance and security is low. |
|
A large performance difference between the source and destination databases or high database load can also cause high latency. |
If the database load is high, simply adjusting the synchronization task concurrency does not solve the problem. You need to contact your database administrator for help. |
Using the public network
Using the public internet for real-time synchronization tasks poses the following risks:
-
The network may be unstable, and frequent packet loss can affect synchronization performance.
-
Security is low.
Column format
When Data Integration synchronizes data from MySQL, Oracle, LogHub, and PolarDB to DataHub or Apache Kafka in real time, it adds five extra columns to the destination to support metadata management, record sorting, and deduplication. For more information, see Column format in real-time synchronization.
Handling TRUNCATE operations
Real-time synchronization supports TRUNCATE operations, which take effect during incremental and full merges. If you choose to ignore TRUNCATE, extraneous data may appear in the destination table.
Performance improvement
If the write speed is slow, you can increase the write-side concurrency and tune the JVM parameters. JVM parameter settings depend on the frequency of data changes, not the number of synchronized databases. If your resource group permits, allocating more memory can reduce the frequency of full GC and improve the performance of real-time synchronization. Set Concurrency to 3. In the Basic configuration panel of the task, set JVM Parameters to -Xms8192m -Xmx8192m.
Running tasks in the console
No, you cannot run real-time synchronization tasks directly from the DataWorks console. After you configure a real-time synchronization task, you must submit and deploy the real-time synchronization node, and then run the node in the production environment. For more information, see Operations and tuning for single-table real-time synchronization.
Slowdown with MySQL sources
One reason is an increase in the number of binlogs to be synchronized. Because binlogs are generated at the instance level, even if a synchronization task is configured to synchronize only Table A, changes to other tables in the instance, such as Table B and Table C, also generate binlogs. These additional binlogs slow down the synchronization process.
Memory usage for single vs. multiple databases
When you select two or more databases, the task enters full-instance real-time synchronization mode. This mode consumes more resources than two separate real-time tasks that each synchronize a single database.
Supported DDL policies
Handling methods:
|
Processing |
Ignore |
Alert |
Error |
|
The DDL message is forwarded to the destination data source for processing. The processing policy may vary depending on the destination data source. |
The DDL message is discarded, and the destination data source takes no action. |
The DDL message is discarded, and an alert is sent. Note
If no alert rule is configured for the real-time task, you will not receive the corresponding alert. |
The task terminates with an error. Note
If an alert rule corresponding to the task status is configured for the real-time task, you can receive the corresponding alert. |
DDL types:
|
Create table |
|
|
Drop table |
|
|
Add column |
|
|
Drop column |
Processing is not supported. You can only choose Ignore, Alert, or Error. |
|
Rename table |
Processing is not supported. You can only choose Ignore, Alert, or Error. |
|
Rename column |
Processing is not supported. You can only choose Ignore, Alert, or Error. |
|
Alter column type |
|
|
Truncate table |
|
Source DDL and DML considerations
-
After a new column is added at the source and processed at the destination, the following limitations apply:
-
When a column with a DEFAULT VALUE is added, the new column in the destination table remains NULL. Subsequently, when new data is added to this column at the source, the task synchronizes it to the destination.
-
When a VIRTUAL column is added, the new column in the destination table remains NULL. Subsequently, when new data is added to this column at the source, the task synchronizes it to the destination.
-
-
For real-time synchronization from MySQL and PolarDB for MySQL sources, append new columns to the end of the table instead of inserting them in the middle. If inserting a column in the middle is unavoidable at the source, the following constraints apply:
-
In a full and incremental solution, do not add a column in the middle during the full synchronization phase. This can cause data anomalies during the real-time synchronization phase.
-
During the real-time synchronization phase, the synchronization offset reset time must be set after the DDL event that adds the column in the middle. Otherwise, data anomalies will occur during subsequent synchronization.
-
-
If data anomalies occur, you can restore the data by re-initializing the affected table. You only need to re-initialize the single table where the column was added, not all tables in the task.
If the source table has default values, are the default values and NOT NULL constraints retained in the destination table created by Data Integration?
High latency after PostgreSQL failover
This is a behavior of the PostgreSQL database itself. If the latency is unacceptable, you can stop the task and restart it to perform a full and incremental data synchronization.
Running a full synchronization
In Data Integration, find your task in the synchronization task list. Then, in the Actions column, click .
MySQL real-time synchronization FAQ
Task stops reading from MySQL
-
Run the following command in the database to identify the current binlog file.
show master status -
Compare this with the binlog file read by the task. Search the logs for
journalName=mysql-bin.000001,position=50to confirm if data is being written to the database. -
If data is being written but the binlog is not advancing, contact your DBA for assistance.
Oracle, PolarDB, and MySQL FAQ
Recurring task failures for Oracle, PolarDB, and MySQL
-
Symptom: The real-time synchronization task fails repeatedly.
When the source of a real-time synchronization is an Oracle, PolarDB, or MySQL data source, processing DDL messages from the source is not supported by default. When DDL changes other than creating a new table occur, the real-time task fails and exits. With breakpoint resume, the synchronization may still fail even if no DDL messages are generated at the source.
NoteTo avoid data loss or corruption, do not use the Rename command to swap the names of two columns. For example, do not swap the names of Column A and Column B.
-
Cause: Real-time synchronization supports breakpoint resume. To ensure no data is lost, the task may backtrack to an earlier offset when it restarts. This can cause the task to read previous DDL messages again, triggering another failure.
-
Solution:
-
When a DDL change occurs at the source, manually perform the corresponding DDL change on the destination database.
-
Start the real-time synchronization task and change the DDL policy from Error to Ignore.
NoteBecause breakpoint resume may still subscribe to this DDL event, temporarily change the policy to Ignore to prevent the task from failing again.
-
Stop the real-time synchronization task, change the DDL policy from Ignore back to Error, and then restart the task.
-
Error messages and solutions
Kafka error: Startup mode for the consumer set to timestampOffset, but no begin timestamp was specified.
Reset the starting offset. In the Start dialog box for the task, find the Reset offset option and select the Reset offset checkbox to enable this feature.
The Reset offset function in Data Integration lets you change the starting point for data synchronization. Use it to restart a task from a specific time or data position, for example, to recover from an error or resynchronize specific data. This ensures data consistency and integrity.
MySQL error: Cannot replicate because the master purged required binary logs.
If a MySQL real-time synchronization task fails with the message Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup., it may be because MySQL does not have the binlog record at the consumed offset. Check the binlog retention period in your MySQL settings and ensure that the offset is configured within this time range when the synchronization task starts.
If you cannot subscribe to the binlog, try resetting the offset to the current time.
MySQL error: MySQLBinlogReaderException
If a MySQL real-time synchronization task fails with the message MySQLBinlogReaderException: The database you are currently syncing is the standby database, but the current value of log_slave_updates is OFF, you need to enable the binlog log update of the standby database first. , it may be because binlog is not enabled on the standby database. If you want to synchronize a standby database, you must enable binlog for cascading standby databases. You can ask your DBA for help.
For more information about how to enable binlog, see Step 3: Enable binlog for MySQL.
MySQL error: show master status' has an error!
If a MySQL real-time synchronization task fails with the message show master status' has an error! and the error details are Caused by: java.io.IOException: message=Access denied; you need (at least one of) the SUPER, REPLICATION CLIENT privilege(s) for this operation, with command: show master status, it may be because the data source account lacks the required database permissions.
The account configured for the data source must have the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT permissions for the database. For more information about how to grant permissions to the data source account, see Step 2: Create an account and grant permissions.
MySQL error: parse.exception.PositionNotFoundException: can't find start position forxxx
The synchronization process could not find the offset. Please reset the offset.
MySQL error: The database offset has expired. Please select a new offset. The earliest available offset in the source database is xxx.
-
Reset the offset: When starting the task, enable the reset offset option and select the earliest available offset from the source database.
-
Adjust the binlog retention period: If the database offset has expired, consider adjusting the binlog retention period in the MySQL database, for example, to 7 days.
-
Data synchronization: If you have lost data, consider performing a full resynchronization or configuring an offline synchronization task to backfill the missing data.
PolarDB error: The mysql server does not enable the binlog write function. Please enable the mysql binlog write function first
-
Possible cause: The loose_polar_log_bin parameter is not enabled for the source PolarDB data source.
-
Solution: You need to enable the loose_polar_log_bin parameter. For more information, see Configure a PolarDB data source.
Hologres error: permission denied for database xxx
When you synchronize data to Hologres in real time, you need to grant the current user admin permissions for the Hologres instance. The user must have permission to create schemas. For more information, see Hologres permission model.
MaxCompute error: ODPS-0410051:invalid credentials-accessKeyid not found
When you synchronize data to a MaxCompute data source in real time by using a temporary AccessKey, the temporary AccessKey automatically expires after 7 days, which causes the task to fail. The platform automatically restarts the task when it detects a failure caused by an expired temporary AccessKey. If you have configured monitoring alerts for this type of failure, you will receive an alert.
Oracle error: logminer doesn't init, send HeartbeatRecord
When an Oracle real-time synchronization task initializes, it loads a previous archive log to find a suitable synchronization offset. If the archive log is large, it may take 3 to 5 minutes to complete the initialization phase.