Synchronize data from an OceanBase database to DataHub
This topic describes how to synchronize data from an OceanBase database (MySQL and Oracle tenants) to DataHub.
Prerequisites
Data Transmission Service (DTS) requires the necessary cloud resource access permissions. For more information, see Grant permissions to a DTS role.
A database user for the data synchronization task has been created in the source OceanBase database and granted the required permissions. For more information, see Create a database user.
Limits
DTS supports full synchronization only for tables that have unique keys.
DDL synchronization is supported only for topics of the BLOB type.
During data synchronization, DTS lets you run
DROP TABLEfollowed byCREATE TABLEon a synchronized table. However, DTS does not support renaming tables, such as with theRENAME TABLE a TO a_tmpcommand.DTS supports data synchronization for the UTF-8 and GBK character sets.
The names of the tables and columns to be synchronized cannot contain Chinese characters.
DTS only supports objects whose database, table, and column names consist of ASCII characters. These names cannot contain special characters, such as line breaks, spaces, or any of the following: .|"'`()=;/&\.
DTS does not support using a secondary OceanBase database as the source.
DataHub has the following limits:
The maximum size of a single message in DataHub is typically 1 MB.
Because messages are sent in batches, DataHub limits the size of each batch to 4 MB.
For more information about the limits and naming conventions of DataHub, see Limits.
Notes
If you perform a
RENAMEoperation on a source table in a data synchronization task with DDL synchronization enabled, we recommend restarting the task to prevent data loss during incremental synchronization.If you select incremental synchronization for an OceanBase database version from V4.0.0 to V4.3.x (excluding V4.2.5 BP1), you must configure the STORED property for generated columns. Otherwise, information about generated columns will not be saved in incremental logs, which can cause data exceptions during incremental synchronization.
If an updated row contains LOB columns:
If a LOB column is updated, its value before the
UPDATEorDELETEoperation is unreliable.Data types that use LOB columns for storage include JSON, GIS, XML, User Defined Type (UDT), and various TEXT types, such as LONGTEXT and MEDIUMTEXT.
If a LOB column is not updated, its value is NULL both before and after the
UPDATEorDELETEoperation.
If you configure only Incremental Synchronization, the incremental logs in the source database must be retained for more than 48 hours.
If you configure Full Synchronization and Incremental Synchronization, the incremental logs in the source database must be retained for at least 7 days. Otherwise, DTS may be unable to retrieve the incremental logs, which can cause the sync task to fail and result in data inconsistency between the source and destination.
When you synchronize incremental data from an OceanBase database to DataHub, the initial table schema is synchronized to the DataHub schema. The following table describes the supported DataHub data types. This applies only to topics of the Tuple type.
Type
Description
Range
BIGINT
8-byte signed integer
-9223372036854775807 to 9223372036854775807
DOUBLE
8-byte double-precision floating-point number
-1.0_10^308 to 1.0_10^308
BOOLEAN
Boolean type
True/False
true/false
0/1
TIMESTAMP
Timestamp type
A timestamp that is accurate to the microsecond
STRING
String that supports only UTF-8 encoding
The maximum size of a single STRING column is 2 MB
INTEGER
4-byte integer
-2147483648 to 2147483647
FLOAT
4-byte single-precision floating-point number
-3.40292347_10^38 to 3.40292347_10^38
DECIMAL
Numeric type
-10^38 +1 to 10^38 - 1
Supported source and destination instance types
In the following table, an OceanBase database with a MySQL tenant is referred to as OB_MySQL, and an OceanBase database with an Oracle tenant is referred to as OB_Oracle.
Source | Destination |
OB_MySQL (OceanBase cluster instance) | DataHub (Alibaba Cloud DataHub instance) |
OB_MySQL (OceanBase cluster instance) | DataHub (Public DataHub instance) |
OB_MySQL (Serverless instance) | DataHub (Alibaba Cloud DataHub instance) |
OB_MySQL (Serverless instance) | DataHub (Public DataHub instance) |
OB_Oracle (OceanBase cluster instance) | DataHub (Alibaba Cloud DataHub instance) |
OB_Oracle (OceanBase cluster instance) | DataHub (Public DataHub instance) |
Supported DDL operations
If the source is an OceanBase database with a MySQL tenant and you select a Tuple topic, DDL synchronization only supports
ALTER ADD COLUMNand ignores the NOT NULL constraint.If the source is an OceanBase database with an Oracle tenant, DDL synchronization is supported only for BLOB topics.
Modify table
ALTER TABLEAdd column
ADD COLUMNModify column length
MODIFY COLUMNDrop column
DROP COLUMN
Create index
CREATE INDEXDrop index
DROP INDEXTruncate table
TRUNCATENoteIn a delayed deletion scenario, a transaction may contain two identical
TRUNCATE TABLEDDL statements. In this case, the downstream consumer must process the statements in an idempotent manner.
Data type mapping
DataHub supports only the INTEGER, BIGINT, TIMESTAMP, FLOAT, DOUBLE, DECIMAL, STRING, and BOOLEAN data types for synchronization.
If you create a new topic of an unsupported type, the data synchronization task will fail.
The default mapping rules in the tables are recommended. Errors may occur if you change the mapping.
Data type mapping between an OceanBase database with a MySQL tenant and DataHub
When you synchronize data from a MySQL tenant of an OceanBase database to DataHub, you can select BLOB or Tuple as the topic type. The following tables describe the data type mapping when the topic type is Tuple and the serialization format is Default or DTSCompatible. For more information about the data type mapping when the topic type is BLOB, see Data formats.
When you synchronize data from a MySQL tenant of an OceanBase database to DataHub (Tuple type) and set the serialization format to Default, the data types are mapped as follows.
OceanBase database with a MySQL tenant
Default DataHub type
BIT
STRING (Base64 encoded)
CHAR
STRING
BINARY
STRING (Base64 encoded)
VARBINARY
STRING (Base64 encoded)
INT
BIGINT
TINYTEXT
STRING
SMALLINT
BIGINT
MEDIUMINT
BIGINT
BIGINT
DECIMAL (Because the unsigned value exceeds the range of the LONG type in Java)
FLOAT
DECIMAL
DOUBLE
DECIMAL
DECIMAL
DECIMAL
DATE
STRING
TIME
STRING
YEAR
BIGINT
DATETIME
STRING
TIMESTAMP
TIMESTAMP (Displayed as a millisecond-level timestamp)
VARCHAR
STRING
TINYBLOB
STRING (Base64 encoded)
TINYTEXT
STRING
BLOB
STRING (Base64 encoded)
TEXT
STRING
MEDIUMBLOB
STRING (Base64 encoded)
MEDIUMTEXT
STRING
LONGBLOB
STRING (Base64 encoded)
LONGTEXT
STRING
When you synchronize data from a MySQL tenant in an OceanBase database to DataHub (Tuple type) and set the serialization format to DTSCompatible, the data types are mapped as follows.
OceanBase database with a MySQL tenant
Default DataHub type
CHAR
STRING
VARCHAR
STRING
BINARY
STRING
VARBINARY
STRING
BIT(1)
The value of BIT can be 0 or 1.
BOOLEAN
If the value of the BIT column is 0, the BOOLEAN value in DataHub is
false.If the value of the BIT column is 1, the BOOLEAN value in DataHub is
true.
BIT(n)
STRING (HEX encoded)
The number of bits displayed for the `BIT` data type may not match the number of bits defined in the database. For example, a `bit(10)` value of `0` may be displayed as `000` by DTS, but as `00` in the database.
TINYINT
BIGINT
SMALLINT
BIGINT
MEDIUMINT
BIGINT
INT
BIGINT
BIGINT
BIGINT
The BIGINT type in DataHub is signed and has a range of -9223372036854775807 to 9223372036854775807. If the source data is an unsigned BIGINT and its value exceeds 9223372036854775807, a data overflow occurs in DataHub, which leads to data inconsistency.
FLOAT
DOUBLE
The precision is 7 digits. If the precision exceeds 7 digits, inconsistencies may occur.
DOUBLE
DOUBLE
The precision is 16 digits. If the precision exceeds 16 digits, inconsistencies may occur.
DECIMAL
DECIMAL
DATE
TIMESTAMP (converted using the +08:00 time zone, accurate to the microsecond)
TIME
STRING
YEAR
STRING
DATETIME
TIMESTAMP (converted using the +08:00 time zone, accurate to the microsecond)
TIMESTAMP
TIMESTAMP (accurate to the microsecond)
TINYTEXT
STRING
MEDIUMTEXT
STRING
TEXT
STRING
LONGTEXT
STRING
TINYBLOB
STRING (HEX encoded)
MEDIUMBLOB
STRING (HEX encoded)
BLOB
STRING (HEX encoded)
LONGBLOB
STRING (HEX encoded)
Data type mapping between an OceanBase database with an Oracle tenant and DataHub
OceanBase database with an Oracle tenant | Default DataHub type |
CHAR | STRING |
NCHAR | STRING |
VARCHAR2 | STRING |
NVARCHAR2 | STRING |
CLOB | STRING |
BLOB | STRING (Base64 encoded) |
NUMBER | DECIMAL |
BINARY_FLOAT | DECIMAL |
BINARY_DOUBLE | DECIMAL |
DATE | STRING |
TIMESTAMP | STRING |
TIMESTAMP WITH TIME ZONE | STRING |
TIMESTAMP WITH LOCAL TIME ZONE | STRING |
INTERVAL YEAR TO MONTH | STRING |
INTERVAL DAY TO SECOND | STRING |
RAW | STRING (Base64 encoded) |
Additional properties
If you create a topic manually, you must add the following properties to the DataHub schema before you start the sync task. If you allow DTS to automatically create the topic and synchronize the schema, DTS adds these properties for you.
This section applies to Tuple-type topics when the Serialization Method is set to Default on the Sync Options > Advanced Options page.
Name | Type | Description |
oms_timestamp | STRING | The time when the change occurred. |
oms_table_name | STRING | The name of the changed table if the source is a table. |
oms_database_name | STRING | If the source is a database, change its name. |
oms_sequence | STRING | The timestamp when the data enters the memory of the sync process. It consists of a time value and a 5-digit incremental number. If a clock backoff occurs, data inconsistency may result. |
oms_record_type | STRING | The type of change, which can be |
oms_is_before | STRING | If the change type is |
oms_is_after | STRING | If the change type is |
If the source is an OceanBase database with a MySQL tenant, setting the topic type to Tuple on the Choose Synchronization Objects page and the Serialization Format to DTSCompatible on the Advanced Options tab of the Sync Options page adds the dts_ suffix to all column names. For example, if the columnName in the source OceanBase database with a MySQL tenant is c1, it becomes dts_c1 in the destination DataHub instance. The field conversions are described as follows.
Field name | Data type | Description |
dts_record_id | STRING | The record ID in the incremental log. This ID is the unique identifier of the log. The record ID usually increments. However, during disaster recovery and migration, the clocks on different machines may not be synchronized, which can cause inconsistencies. |
dts_operation_flag | STRING | The operation type. Valid values:
|
dts_db_name | STRING | The name of the database. For an OceanBase database, the format is |
dts_table_name | STRING | The name of the table. |
dts_utc_timestamp | STRING | The timestamp of the operation, which is the timestamp of the binary log (UTC time). |
dts_before_flag | STRING | Indicates whether the values of all columns are the values before the update. Valid values are Y and N. |
dts_after_flag | STRING | Indicates whether the values of all columns are the values after the update. Valid values are Y and N. |
Procedure
Log on to the OceanBase Management Console and purchase a data sync task.
For more information, see Purchase a data sync task.
In the Data Transmission Service console, navigate to the Data Synchronization page, find the new sync task, and click Configure.

If you want to reuse the configurations of an existing task, click Reuse Configuration. For more information, see Reuse and clear the configurations of a data sync task.
On the Choose Source and Destination page, you can configure the parameters.
Parameter
Description
Sync Task Name
We recommend that you use a combination of Chinese characters, digits, and letters. The name cannot contain spaces and must be no more than 64 characters in length.
Source
If you have created an OceanBase data source, select it from the drop-down list. If not, click Create Data Source in the drop-down list to create one in the dialog box that appears. For more information about the parameters, see Create an OceanBase data source.
Destination
If you have created a DataHub data source, select it from the drop-down list. If not, click Create Data Source in the drop-down list to create one in the dialog box that appears. For more information about the parameters, see Create a DataHub data source.
Tags (Optional)
Click the text box and select a tag from the drop-down list. You can also click Manage Tags to create, modify, and delete tags. For more information, see Manage data sync tasks using tags.
Click Next. On the Choose Synchronization Types page, select the required synchronization types.

The synchronization types are Schema Synchronization, Full Synchronization, and Incremental Synchronization. Schema Synchronization creates a topic, and Incremental Synchronization includes DML Synchronization (
INSERT,DELETE, andUPDATE) and DDL Synchronization. You can configure these types as needed. For more information, see Customize DDL and DML synchronization.NoteIf you select DDL Synchronization when the source is an OceanBase database with an Oracle tenant, you can only select BLOB as the topic type on the Choose Synchronization Objects page.

Click Next. On the Choose Synchronization Objects page, select the topic type and the objects that you want to synchronize.
The topic types include Tuple and BLOB. If the source is an OceanBase database with an Oracle tenant, topics of the Tuple type do not support DDL synchronization but do support data similar to database records, where each record contains multiple columns. Topics of the BLOB type support writing a single block of binary data as a record, which is transmitted using Base64 encoding. For more information, see the official DataHub documentation.
After you select a topic type, you can select synchronization objects using Specify Objects or Matching Rules. This topic describes how to use Specify Objects. For more information about how to configure matching rules, see Configure and modify matching rules.
NoteIf you select the DDL Synchronization check box in the Choose Synchronization Types step, we recommend that you select synchronization objects using matching rules to ensure that all new objects that meet the rules are synchronized. If you select specific synchronization objects, new or renamed objects are not synchronized.
In the Choose Synchronization Objects section, select Specify Objects.
In the selection area on the left, select the objects to synchronize.
Click >.
Select a mapping method as needed.
To synchronize a single Tuple table or a BLOB table, select a mapping method in the Map Object to Topic dialog box and click OK.

When the Schema Synchronization check box is cleared, you can only select Existing Topic. When the Schema Synchronization check box is selected, you can only use one mapping method to create or select a topic.
For example, if you select schema synchronization and then use both the Create Topic and Select Existing Topic mapping methods, or if you rename the topic, a precheck error will occur because of conflicting options.
Parameter
Description
Create Topic
Enter a name for the new topic in the text box. The name can contain up to 128 letters, digits, or underscores (_), and must start with a letter.
Select Topic
DTS lets you query DataHub topics. You can click Select Topic, and then search for and select the topic to synchronize from the Existing Topic drop-down list.
Batch Create Topics
The rule for batch creating topics is
Topic_${Database Name}_${Table Name}.If you select Create Topic or Batch Create Topics, you can query the new topic in DataHub after the schema synchronization is complete. By default, the new topic has 2 shards and a time-to-live (TTL) of 7 days. These values cannot be changed.
To synchronize multiple tables of the Tuple type, click OK in the dialog box.

If you select multiple tables and a topic of the Tuple type without selecting the Schema Synchronization check box, you must select an existing topic in the Map Object to Topic dialog box and click OK.

In this case, multiple tables are displayed for one topic on the right, but only one table can be synchronized. If you click Next, a message is displayed indicating that Tuple-type topics support only a one-to-one mapping with tables.

DTS lets you import objects from a text file and supports operations such as renaming destination objects, setting row filters, and removing single or all objects. The structure of a destination object is Topic > Database > Table.
NoteWhen you select synchronization objects using Matching Rules, the renaming function is part of the matching rule syntax. You can only set filter conditions and select the shard column and the columns to synchronize. For more information, see Configure and modify matching rules.

Operation
Description
Import Objects
In the list on the right of the selection area, click Import Objects in the upper-right corner.
In the dialog box, click OK.
ImportantThe import operation overwrites previous selections. Proceed with caution.
In the Import Synchronization Objects dialog box, import the objects to synchronize. You can import a CSV file to rename database tables, set row filtering conditions, and perform other operations. For more information, see Download and import synchronization object configurations.
Click Validate.
After the validation is successful, click OK.
Change Topic
When you select BLOB as the topic type, you can change the topic for the destination object. For more information, see Change a topic.
Settings
DTS supports row filtering using a
WHEREcondition, along with selecting a shard column and the columns to synchronize.In the list on the right of the selection area, move the mouse pointer over the destination table object.
Click the Settings icon that appears.
In the Settings dialog box, you can perform the following operations.
In the text box in the Row Filtering Condition section, enter a standard
WHEREclause of an SQL statement to configure row filtering. For more information, see Filter data using SQL conditions.From the Shard Column drop-down list, select the destination shard column. You can select multiple fields as shard columns. This parameter is optional.
When you select a shard column, you can select the primary key by default. If a load imbalance exists on the primary key, select a field that has a unique identifier and a relatively balanced load as the shard column. This avoids potential performance issues. The main functions of a shard column are as follows:
Load balancing: If concurrent writes are allowed at the destination, the shard column is used to determine the specific thread for sending messages.
Ordering: To address the disorder that may be caused by concurrent writes, DTS ensures that messages are received in order when the values of the shard column are the same. Ordering here refers to the order of changes (the execution order of DML operations on a column).
In the Select Columns section, select the columns to synchronize. For more information, see Filter columns.
Click OK.
Remove/Remove All
During data mapping, DTS lets you remove one or more objects that are temporarily selected for the destination.
Remove a single synchronization object
In the list on the right of the selection area, move the mouse pointer over the destination object and click the Remove icon that appears to remove the synchronization object.
Remove all synchronization objects
In the list on the right of the selection area, click Remove All in the upper-right corner. In the dialog box that appears, click OK to remove all synchronization objects.
Click Next and configure the parameters on the Sync Options page.
Full synchronization
The following parameters are displayed only if you select Full Synchronization on the Choose Synchronization Types page.

Parameter
Description
Read Concurrency
This parameter specifies the number of concurrent threads for reading data from the source during full synchronization. The maximum value is 512. A high concurrency may put excessive pressure on the source and affect your business.
Write Concurrency
This parameter specifies the number of concurrent threads for writing data to the destination during full synchronization. The maximum value is 512. A high concurrency may put excessive pressure on the destination and affect your business.
Full Synchronization Rate Limit
You can decide whether to enable the full synchronization rate limit as needed. If you enable it, set RPS (the maximum number of data rows that can be synchronized to the destination per second during full synchronization) and BPS (the maximum data volume that can be synchronized to the destination per second during full synchronization).
NoteThe RPS and BPS values set here are only for rate limiting. The actual performance of full synchronization is affected by factors such as the source, destination, and instance specifications.
Incremental synchronization
The following parameters are displayed only if you select Incremental Synchronization on the Choose Synchronization Types page.

Parameter
Description
Write Concurrency
This parameter specifies the number of concurrent threads for writing data to the destination during incremental synchronization. The maximum value is 512. A high concurrency may put excessive pressure on the destination and affect your business.
Incremental Synchronization Rate Limit
You can decide whether to enable the incremental synchronization rate limit as needed. If you enable it, set RPS (the maximum number of data rows that can be synchronized to the destination per second during incremental synchronization) and BPS (the maximum data volume that can be synchronized to the destination per second during incremental synchronization).
NoteThe RPS and BPS values set here are only for rate limiting. The actual performance of incremental synchronization is affected by factors such as the source, destination, and instance specifications.
Incremental Synchronization Start Offset
If you select Full Synchronization when you select the synchronization type, you cannot change this parameter.
If you do not select Full Synchronization but select Incremental Synchronization when you select the synchronization type, specify a point in time after which you want to synchronize data. The default value is the current system time. For more information, see Set the start offset for incremental synchronization.
Advanced options

Parameter
Description
Serialization Format
Controls the message format for data synchronization to DataHub. The supported formats are Default, Canal, DataWorks (V2.0 is supported), SharePlex, DefaultExtendColumnType, Debezium, DebeziumFlatten, DebeziumSmt, and DTSCompatible. For more information, see Serialization format descriptions.
ImportantThis parameter is displayed only when you select BLOB as the topic type on the Choose Synchronization Objects page, or when the source is an OceanBase database with a MySQL tenant and you select Tuple as the topic type.
Currently, only OceanBase databases with a MySQL tenant support Debezium, DebeziumFlatten, DebeziumSmt, and DTSCompatible.
When the source is an OceanBase database with a MySQL tenant and you select Tuple as the topic type on the Choose Synchronization Objects page, the supported serialization formats are Default and DTSCompatible.
Partitioning Rule
The rule for synchronizing source data to a DataHub topic. The supported rules are Hash and Table. We recommend that you select Table to ensure that DDL and DML consumption is consistent for the downstream consumer.
Hash indicates that DTS uses a hash algorithm to select a DataHub shard based on the primary key value or shard column value.
Table indicates that DTS ships all data from one table to the same partition, using the table name as the hash key.
NoteIf you select DDL Synchronization on the Choose Synchronization Types page, only the Table partitioning rule is supported.
Business System ID (Optional)
This parameter is displayed only when you set Serialization Format to DataWorks. It is used to identify the source business system of the data for custom processing. The business system ID can be 1 to 20 characters in length.
Click Precheck.
In the Precheck step, DTS checks only the column names, column types, and whether the columns are empty. It does not check the length or default values. If a precheck error occurs:
You can troubleshoot the issue and run the precheck again until all items pass.
You can also click Skip in the Actions column for a failed precheck item. A dialog box appears that explains the impact of skipping this operation. If you want to proceed, click OK in the dialog box.
After the precheck passes, click Start Task.
If you do not need to start the task right away, click Save. You can then start the task manually on the Sync Task List page or include it in a batch operation. For more information about batch operations, see Perform batch operations on data sync tasks.
DTS lets you modify the synchronization objects while a data sync task is running. For more information, see View and modify synchronization objects and their filter conditions. After the data sync task starts, it runs based on the selected synchronization types. For more information, see View synchronization details.
If a data sync task fails, typically due to network connection issues or a slow process startup, you can click Resume on the task list or the product page.





