ApsaraDB for OceanBase

更新时间:
复制 MD 格式

The ApsaraDB for OceanBase data source lets you read data from and write data to ApsaraDB for OceanBase. Use this data source to configure data synchronization tasks in DataWorks. This topic describes the features for synchronizing data with ApsaraDB for OceanBase.

Supported versions

ApsaraDB for OceanBase Reader and ApsaraDB for OceanBase Writer support the following OceanBase versions for batch read and write operations:

  • OceanBase 2.x

  • OceanBase 3.x

  • OceanBase 4.x

Limitations

Batch read

  • ApsaraDB for OceanBase supports Oracle and MySQL tenant modes. When you configure the where clause for data filtering or function columns in the column parameter, ensure that the syntax complies with the SQL constraints of the corresponding tenant mode. Otherwise, the SQL statement may fail.

  • You can read data from a view.

  • During a batch read, do not modify the data being synchronized. This prevents data quality issues, such as data duplication or loss.

  • If you configure the data source for Read by Partition, the account used to access the data source requires system permissions.

Batch write

Note

The synchronization task requires at least insert into... permissions. Other permissions may be required depending on the statements you specify in the preSql and postSql parameters.

  • We recommend using the batch method to write data. This method sends a write request only when the number of accumulated rows reaches a predefined threshold.

  • ApsaraDB for OceanBase supports Oracle and MySQL tenant modes. When you configure the preSql and postSql parameters, ensure that the syntax complies with the SQL constraints of the corresponding tenant mode. Otherwise, the SQL statement may fail.

Real-time read

  • This feature supports only the OceanBase MySQL tenant mode.

  • To synchronize real-time data, you must enable the binlog feature. For more information, see Binlog-related operations (Alibaba Cloud instances), Binlog-related operations (OB Cloud instances), or .

  • Real-time full database synchronization tasks do not support data sources in connection string mode.

  • For real-time full database synchronization tasks, the database version must be V3.0 or later.

  • ApsaraDB for OceanBase is a distributed relational database that can integrate data from multiple physically distributed databases into a single logical database. However, when you synchronize data in real time from ApsaraDB for OceanBase to AnalyticDB for MySQL, you can only synchronize data from a single physical database. Data synchronization from a logical database is not supported.

Prerequisites

Before you synchronize data by using DataWorks, prepare your ApsaraDB for OceanBase environment. This allows you to correctly configure and run data synchronization tasks.

Configure a whitelist

Add the CIDR block of the VPC where your serverless resource group or exclusive resource group for Data Integration resides to the whitelist of your OceanBase instance. For more information, see Configure an IP address whitelist.

Create an account and grant permissions

Create a database account with the required operation permissions on the OceanBase database. For more information, see Create an account.

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.

Data synchronization task development

For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.

Single-table batch synchronization

Single-table real-time synchronization

Full-database real-time synchronization

Appendix: Scripts and parameters

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Use the Code Editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script example

{
    "type": "job",
    "steps": [
        {
            "stepType": "apsaradb_for_OceanBase", // The plug-in name.
            "parameter": {
                "datasource": "", // The name of the data source.
                "where": "",
                "column": [ // The columns.
                    "id",
                    "name"
                ],
                "splitPk": ""
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {
                "print": false,
                "fieldDelimiter": ","
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "version": "2.0",
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    },
    "setting": {
        "errorLimit": {
            "record": "0" // The number of dirty data records allowed.
        },
        "speed": {
            "throttle": true, // If set to false, the mbps parameter does not take effect and no throttling is applied. If set to true, throttling is enabled.
            "concurrent": 1, // Concurrency.
            "mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
        }
    }
}

Reader script parameters

Parameter

Description

Required

Default

datasource

The name of the ApsaraDB for OceanBase data source in DataWorks.

You can configure the connection by using either the jdbcUrl or username parameter.

Yes

None

jdbcUrl

The JDBC connection string for the destination database. You can specify multiple connection addresses for a database in a JSON array.

If multiple addresses are configured, ApsaraDB for OceanBase Reader probes the IP addresses in sequence until it finds a valid one.

If all connections fail, ApsaraDB for OceanBase Reader reports an error.

Note

The jdbcUrl parameter must be included in the connection configuration block.

The jdbcUrl format must comply with the official ApsaraDB for OceanBase specifications and can include connection control information. Example: jdbc:oceanbase://127.0.0.1:3306/database. You must specify either this parameter or the username parameter.

No

None

username

The username for the data source.

No

None

password

The password for the specified username.

No

None

table

The tables from which you want to synchronize data. You can specify multiple tables in a JSON array.

If you configure multiple tables, ensure that their schemas are identical. ApsaraDB for OceanBase Reader does not verify schema consistency across tables.

Note

The table parameter must be included in the connection configuration block.

Yes

None

column

The set of columns to synchronize. Specify the columns in a JSON array. By default, all columns are used. Example: ["*"].

  • Column projection: You can export a subset of columns.

  • Column reordering: You can export columns in an order different from the table schema.

  • Constant configuration: You can use constants. Example: '123'.

  • Function columns: You can use function columns. Example: date('now').

  • The column parameter must explicitly specify the set of columns to synchronize and cannot be empty.

Yes

None

splitPk

When ApsaraDB for OceanBase Reader extracts data, if you specify the splitPk parameter, the system uses the column represented by splitPk to perform data sharding. This process initiates concurrent tasks for data synchronization and improves its efficiency.

  • We recommend that you use the primary key of the table for splitPk because a primary key is usually evenly distributed, which helps prevent data hotspots in the shards.

  • Currently, splitPk only supports sharding for integer data types. Other types such as strings, floats, and dates are not supported. If you specify an unsupported type, ApsaraDB for OceanBase Reader reports an error.

  • If you leave splitPk empty, the system assumes that sharding is not allowed for the table and uses a single channel for data extraction.

No

Empty

where

ApsaraDB for OceanBase Reader constructs an SQL query based on the specified column, table, and where conditions to extract data.

For example, during testing, you can set the where condition to limit 10. In a typical business scenario, you might synchronize data for the current day by setting the where condition to gmt_create>$bizdate.

  • You can use the where condition for effective incremental synchronization.

  • If the where condition is not configured or is left empty, the system performs a full table synchronization.

No

None

querySql

In some business scenarios, the where parameter may not be sufficient to describe the filtering conditions. You can use this parameter to define a custom filtering SQL statement. If this parameter is configured, the data synchronization system ignores the table, column, and splitPk parameters and uses the content of this parameter to filter data.

When you configure querySql, ApsaraDB for OceanBase Reader ignores the table, column, where, and splitPk parameters.

No

None

fetchSize

The number of data records to fetch in each batch from the database server. This value determines the number of network interactions between Data Integration and the server and can significantly improve data extraction performance.

Note

A large fetchSize value (>2048) may cause an Out of Memory (OOM) error in the data synchronization process.

No

1,024

Writer script example

{
    "type":"job",
    "version":"2.0",// Version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"apsaradb_for_OceanBase",// The plug-in name.
            "parameter":{
                "datasource": "The data source name",
                "column": [// The columns.
                    "id",
                    "name"
                ],
                "table": "apsaradb_for_OceanBase_table",// The table name.
                "preSql": [ // The SQL statement to be executed before the data synchronization task runs.
                    "delete from @table where db_id = -1"
                ],
                "postSql": [// The SQL statement to be executed after the data synchronization task runs.
                    "update @table set db_modify_time = now() where db_id = 1"
                ],
                "obWriteMode": "insert",
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The number of dirty data records allowed.
        },
        "speed":{
            "throttle":true,// If set to false, the mbps parameter does not take effect and no throttling is applied. If set to true, throttling is enabled.
            "concurrent":1, // Concurrency.
            "mbps":"12"// Throttling rate. 1 mbps = 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

Parameter

Description

Required

Default

datasource

The name of the ApsaraDB for OceanBase data source in DataWorks.

You can configure the connection by using either the jdbcUrl or username parameter.

No

None

jdbcUrl

The JDBC connection string for the destination database. The jdbcUrl parameter must be included in the connection configuration block.

  • You can configure only one value for a database. Writing data to a database with multiple primary nodes (dual-primary import scenario) is not supported.

  • The format of jdbcUrl must comply with the official ApsaraDB for OceanBase specifications and can include additional connection parameters. Example: jdbc:oceanbase://127.0.0.1:3306/database.

Yes

None

username

The username for the data source.

Yes

None

password

The password for the specified username.

Yes

None

table

The name of the destination table. Specify the name in a JSON array.

Note

The table parameter must be included in the connection configuration block.

Yes

None

column

The columns in the destination table where you want to write data. Separate column names with commas. Example: "column": ["id", "name", "age"].

Note

The column parameter is required and cannot be empty.

Yes

None

obWriteMode

The write mode for the destination table. This parameter is optional.

  • insert: Uses the INSERT INTO ... statement. If a primary key or unique index conflict occurs, the conflicting row cannot be written.

  • update: Uses the ... ON DUPLICATE KEY UPDATE ... statement. This mode is used for the MySQL tenant mode. When a conflict occurs, the conflicting row is updated.

  • merge: Uses the MERGE INTO ... WHEN MATCHED THEN UPDATE ... statement. This mode is used for the Oracle tenant mode. When a conflict occurs, the conflicting row is updated.

No

insert

onClauseColumns

Note

This parameter is used in the Oracle tenant mode and is required when obWriteMode is set to merge. If it is not configured, data is written in insert mode.

Configure this parameter with primary key columns or unique constraint columns. Separate multiple columns with commas. Example: ID,C1.

No

None

obUpdateColumns

Note

This parameter takes effect when obWriteMode is set to merge or update.

The columns to update when a data write conflict occurs. Separate multiple columns with commas. Example: c2,c3.

No

All columns

preSql

The SQL statement to execute before the write operation. If the SQL statement needs to operate on a table, use @table as a placeholder, which will be replaced with the actual table name during execution.

No

None

postSql

The SQL statement to execute after the write operation is complete.

No

None

batchSize

The number of records to commit in each batch. This value can greatly reduce network interactions between the data synchronization system and the server, improving overall throughput.

Note

A batchSize value greater than 2048 may cause an OOM error in the data synchronization process.

No

1,024