The ApsaraDB for OceanBase data source lets you read data from and write data to ApsaraDB for OceanBase. Use this data source to configure data synchronization tasks in DataWorks. This topic describes the features for synchronizing data with ApsaraDB for OceanBase.
Supported versions
ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer support the following OceanBase versions for batch read and write operations:
OceanBase 2.x
OceanBase 3.x
OceanBase 4.x
Limitations
Batch read
ApsaraDB for OceanBase supports Oracle and MySQL tenant modes. When you configure the where clause for data filtering or function columns in the column parameter, ensure that the syntax complies with the SQL constraints of the corresponding tenant mode. Otherwise, the SQL statement may fail.
You can read data from a view.
During a batch read, do not modify the data being synchronized. This prevents data quality issues, such as data duplication or loss.
If you configure the data source for Read by Partition, the account used to access the data source requires system permissions.
Batch write
The synchronization task requires at least insert into... permissions. Other permissions may be required depending on the statements you specify in the preSql and postSql parameters.
We recommend using the batch method to write data. This method sends a write request only when the number of accumulated rows reaches a predefined threshold.
ApsaraDB for OceanBase supports Oracle and MySQL tenant modes. When you configure the preSql and postSql parameters, ensure that the syntax complies with the SQL constraints of the corresponding tenant mode. Otherwise, the SQL statement may fail.
Real-time read
This feature supports only the OceanBase MySQL tenant mode.
To synchronize real-time data, you must enable the binlog feature. For more information, see Binlog-related operations (Alibaba Cloud instances), Binlog-related operations (OB Cloud instances), or .
Real-time full database synchronization tasks do not support data sources in connection string mode.
For real-time full database synchronization tasks, the database version must be V3.0 or later.
ApsaraDB for OceanBase is a distributed relational database that can integrate data from multiple physically distributed databases into a single logical database. However, when you synchronize data in real time from ApsaraDB for OceanBase to AnalyticDB for MySQL, you can only synchronize data from a single physical database. Data synchronization from a logical database is not supported.
Prerequisites
Before you synchronize data by using DataWorks, prepare your ApsaraDB for OceanBase environment. This allows you to correctly configure and run data synchronization tasks.
Configure a whitelist
Add the CIDR block of the VPC where your serverless resource group or exclusive resource group for Data Integration resides to the whitelist of your OceanBase instance. For more information, see Configure an IP address whitelist.
Create an account and grant permissions
Create a database account with the required operation permissions on the OceanBase database. For more information, see Create an account.
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Data synchronization task development
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Single-table batch synchronization
Supported data sources: All data source types supported by Data Integration.
Configuration guide: Configure a batch synchronization task
Single-table real-time synchronization
Supported data source: Kafka
Configuration guide: Configure a real-time full database synchronization task
Full-database real-time synchronization
Supported data source: MySQL
Configuration guide: Configure a real-time full database synchronization task
Appendix: Scripts and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script example
{
"type": "job",
"steps": [
{
"stepType": "apsaradb_for_OceanBase", // The plug-in name.
"parameter": {
"datasource": "", // The name of the data source.
"where": "",
"column": [ // The columns.
"id",
"name"
],
"splitPk": ""
},
"name": "Reader",
"category": "reader"
},
{
"stepType": "stream",
"parameter": {
"print": false,
"fieldDelimiter": ","
},
"name": "Writer",
"category": "writer"
}
],
"version": "2.0",
"order": {
"hops": [
{
"from": "Reader",
"to": "Writer"
}
]
},
"setting": {
"errorLimit": {
"record": "0" // The number of dirty data records allowed.
},
"speed": {
"throttle": true, // If set to false, the mbps parameter does not take effect and no throttling is applied. If set to true, throttling is enabled.
"concurrent": 1, // Concurrency.
"mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
}
}
}Reader script parameters
Parameter | Description | Required | Default |
datasource | The name of the ApsaraDB for OceanBase data source in DataWorks. You can configure the connection by using either the jdbcUrl or username parameter. | Yes | None |
jdbcUrl | The JDBC connection string for the destination database. You can specify multiple connection addresses for a database in a JSON array. If multiple addresses are configured, ApsaraDB for OceanBase Reader probes the IP addresses in sequence until it finds a valid one. If all connections fail, ApsaraDB for OceanBase Reader reports an error. Note The jdbcUrl parameter must be included in the connection configuration block. The jdbcUrl format must comply with the official ApsaraDB for OceanBase specifications and can include connection control information. Example: | No | None |
username | The username for the data source. | No | None |
password | The password for the specified username. | No | None |
table | The tables from which you want to synchronize data. You can specify multiple tables in a JSON array. If you configure multiple tables, ensure that their schemas are identical. ApsaraDB for OceanBase Reader does not verify schema consistency across tables. Note The table parameter must be included in the connection configuration block. | Yes | None |
column | The set of columns to synchronize. Specify the columns in a JSON array. By default, all columns are used. Example:
| Yes | None |
splitPk | When ApsaraDB for OceanBase Reader extracts data, if you specify the splitPk parameter, the system uses the column represented by splitPk to perform data sharding. This process initiates concurrent tasks for data synchronization and improves its efficiency.
| No | Empty |
where | ApsaraDB for OceanBase Reader constructs an SQL query based on the specified column, table, and where conditions to extract data. For example, during testing, you can set the where condition to limit 10. In a typical business scenario, you might synchronize data for the current day by setting the where condition to
| No | None |
querySql | In some business scenarios, the where parameter may not be sufficient to describe the filtering conditions. You can use this parameter to define a custom filtering SQL statement. If this parameter is configured, the data synchronization system ignores the table, column, and splitPk parameters and uses the content of this parameter to filter data. When you configure querySql, ApsaraDB for OceanBase Reader ignores the table, column, where, and splitPk parameters. | No | None |
fetchSize | The number of data records to fetch in each batch from the database server. This value determines the number of network interactions between Data Integration and the server and can significantly improve data extraction performance. Note A large fetchSize value (>2048) may cause an Out of Memory (OOM) error in the data synchronization process. | No | 1,024 |
Writer script example
{
"type":"job",
"version":"2.0",// Version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"apsaradb_for_OceanBase",// The plug-in name.
"parameter":{
"datasource": "The data source name",
"column": [// The columns.
"id",
"name"
],
"table": "apsaradb_for_OceanBase_table",// The table name.
"preSql": [ // The SQL statement to be executed before the data synchronization task runs.
"delete from @table where db_id = -1"
],
"postSql": [// The SQL statement to be executed after the data synchronization task runs.
"update @table set db_modify_time = now() where db_id = 1"
],
"obWriteMode": "insert",
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The number of dirty data records allowed.
},
"speed":{
"throttle":true,// If set to false, the mbps parameter does not take effect and no throttling is applied. If set to true, throttling is enabled.
"concurrent":1, // Concurrency.
"mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}Writer script parameters
Parameter | Description | Required | Default |
datasource | The name of the ApsaraDB for OceanBase data source in DataWorks. You can configure the connection by using either the jdbcUrl or username parameter. | No | None |
jdbcUrl | The JDBC connection string for the destination database. The jdbcUrl parameter must be included in the connection configuration block.
| Yes | None |
username | The username for the data source. | Yes | None |
password | The password for the specified username. | Yes | None |
table | The name of the destination table. Specify the name in a JSON array. Note The table parameter must be included in the connection configuration block. | Yes | None |
column | The columns in the destination table where you want to write data. Separate column names with commas. Example: Note The column parameter is required and cannot be empty. | Yes | None |
obWriteMode | The write mode for the destination table. This parameter is optional.
| No | insert |
onClauseColumns | Note This parameter is used in the Oracle tenant mode and is required when Configure this parameter with primary key columns or unique constraint columns. Separate multiple columns with commas. Example: | No | None |
obUpdateColumns | Note This parameter takes effect when The columns to update when a data write conflict occurs. Separate multiple columns with commas. Example: | No | All columns |
preSql | The SQL statement to execute before the write operation. If the SQL statement needs to operate on a table, use | No | None |
postSql | The SQL statement to execute after the write operation is complete. | No | None |
batchSize | The number of records to commit in each batch. This value can greatly reduce network interactions between the data synchronization system and the server, improving overall throughput. Note A batchSize value greater than 2048 may cause an OOM error in the data synchronization process. | No | 1,024 |