Sync Oracle data

更新时间: 2026-01-20 21:41:20

You can use the Tapdata Cloud visualization interface to sync data from an Oracle database to Tablestore in real time.

Prerequisites

Before you sync data from an Oracle database to Tablestore using Tapdata Cloud, you must complete the following preparations.

Background information

Tapdata Cloud is a real-time data service from Tapdata that integrates data replication and data development. It provides real-time data synchronization with millisecond-level latency and data fusion services for cross-cloud, cross-region, and disparate data source scenarios. For more information, see What is Tapdata.

Notes

  • Data synchronization is supported only for Oracle 9i, 10g, 11g, 12c, and 19c.

  • If the `connect_time` parameter is set for the Oracle database, real-time synchronization may fail.

    Run the following command to check the `connect_time` parameter settings. Replace username with the database account name.

    select resource_name, limit from dba_profiles where profile=( select profile from dba_users where username = 'username');
  • You must reserve enough storage space for archived logs. This prevents the storage from becoming full and affecting Oracle database operations.

Procedure

To sync data from an Oracle database to Tablestore using Tapdata Cloud, perform the following steps:

  1. Configure the Oracle database and an authorized account. For more information, see Step 1: Configure the Oracle database and an authorized account.

  2. Deploy Tapdata Agent on the machine where the Oracle database resides or on a machine in the same network. For more information, see Step 2: Deploy Tapdata Agent.

    Tapdata Agent (Agent) is a key program for data synchronization, data transformation for disparate data sources, and data development scenarios. It uses stream processing to retrieve data from the source, process and transform the data, and send it to the destination.

    Important

    The Agent uses stream processing to retrieve data from the source, process and transform the data, and send it to the destination. Your data does not pass through Tapdata Cloud and is not uploaded or stored.

  3. Use Tapdata Cloud to connect to the source and destination databases. For more information, see Step 3: Connect to the source and destination databases.

  4. Create a data replication task to replicate data from the source database to the destination database. For more information, see Step 4: Configure a data replication task.

Step 1: Configure the Oracle database and an authorized account

Before you start data synchronization, you must enable ARCHIVELOG mode, supplemental logging, identification key logging, and full supplemental logging for the Oracle database. Then, you must create an account for the data sync task.

  1. Log on to the Oracle database as a user with DBA permissions.

  2. Enable ARCHIVELOG mode.

    Run the following command to check if ARCHIVELOG mode is enabled.

    select log_mode from v$database;

    If the command returns ARCHIVELOG, the mode is already enabled and you can skip this step. Otherwise, perform the following steps to enable it.

    1. Run the following command to shut down the database.

      Important

      Perform this operation during off-peak hours to avoid affecting your business.

      shutdown immediate;
    2. Run the following command to start and mount the database.

      startup mount;
    3. Run the following command to enable archiving and open the database.

      alter database archivelog;
      alter database open;
  3. Enable supplemental logging. Run the command that corresponds to your database version.

    Database version

    Command

    Oracle 9i

    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

    Oracle 10g

    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
    ALTER system switch logfile;
    ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

    Oracle 11g

    Oracle 12c

    Run the following command to confirm whether supplemental logging is enabled.

    SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;

    If the first two columns of the result are Yes or Implicit, it means that only Identification Key Logging is enabled. You also need to enable Full Supplemental Logging.

  4. Enable identification key logging.

    You can choose to enable it for a single table or for all tables as needed.

    • To enable identification key logging for a single table:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE schema name.table name ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
    • To enable identification key logging for all tables:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
  5. Enable full supplemental logging.

    You can choose to enable it for a single table or for all tables as needed.

    • To enable full supplemental logging for a single table:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
      ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
    • To enable full supplemental logging for all tables:

      ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
  6. Run the following command to commit the changes.

    ALTER SYSTEM SWITCH LOGFILE;
  7. Create and grant permissions to an account for the data sync task. Run the command that corresponds to your database version.

    Database version

    Command

    Oracle 10g

    CREATE USER username IDENTIFIED BY password;
    GRANT create session, alter session, execute_catalog_role, select any dictionary, select any transaction, select any table, create any table, create any index, unlimited tablespace to user name;

    Oracle 11g

    Oracle 12c (standard mode)

    CREATE USER username IDENTIFIED BY password;
    GRANT create session, alter session, select any dictionary, select any transaction, logmining, execute_catalog_role, create any table, create any index, unlimited tablespace TO username;

    Run the `GRANT` command repeatedly to grant `SELECT` permissions as needed.

    Oracle 12c (multi-tenant mode)

    Important

    In Oracle 12c multi-tenant mode, create the user in the CDB. The username must follow the c##name convention.

    ALTER SESSION SET CONTAINER=cdb$root;
    CREATE USER username IDENTIFIED BY password CONTAINER=all;
    GRANT create session, alter session, set container, select any dictionary, select any transaction, logmining, execute_catalog_role, create any table, create any index, unlimited tablespace TO username CONTAINER=all;
    ALTER SESSION SET CONTAINER=pdb;

    Run the `GRANT` command repeatedly to grant `SELECT` permissions as needed.

    When the Oracle database is used as the source database, use this account for Java Database Connectivity (JDBC) identity verification.

    Important

    When you use this account for JDBC identity verification, you must use the full username, including c##, as the username for the JDBC connection.

Step 2: Deploy Tapdata Agent

Important

The machine where you deploy the Agent must have access to the Internet.

Data forwarding is often time-sensitive. We recommend that you deploy Tapdata Agent on the same machine as the Oracle database or on a machine in the same network. This greatly reduces the impact of network latency.

Note

This topic uses a 64-bit Linux operating system as an example to describe the Agent deployment process. To deploy the Agent on other platforms, such as Windows or Alibaba Cloud Compute Nest, see the Install Agent document.

  1. Log on to the Tapdata Cloud platform.

  2. In the navigation pane on the left, click Agent Management.

  3. On the Agent Management page, click Create Agent.

  4. In the Agent Download and Installation dialog box, select Linux (64 bit), and then follow the on-screen instructions to install the Agent.

    Important

    Before installation, make sure that Java 1.8 is installed and the environment variables are correctly configured in your deployment environment.

    1. Log on to the machine where you want to deploy the Agent.

    2. Run the following command to create the tapdata directory to install and deploy the Tapdata Agent.

      mkdir tapdata
    3. Click Copy on the interface to copy the command from Step 3, and then run the command in the tapdata directory.

      Note

      After you run this command, the Tapdata Agent is automatically downloaded, deployed, and started.

    Wait for the command to finish running. The Tapdata Agent is then installed and started.

    The following figure shows an example of a successful Tapdata Agent startup.

    fig_tapdataagent

Step 3: Connect to the source and destination databases

Use Tapdata Cloud to connect to the source and destination databases.

  1. Log on to the Tapdata Cloud platform.

  2. In the navigation pane on the left, click Connection Management.

  3. Connect to the Oracle database that serves as the source database.

    1. On the Connection Management page, click Create Connection.

    2. In the Create Connection dialog box, on the Certified Data Source tab, click Oracle.

    3. On the Create Connection page, configure the data source parameters as described in the following table.

      Parameter

      Description

      Connection Name

      The name of the data connection. Set it as needed. Connection names must be unique.

      Connection Type

      You can use Oracle as a source or a destination. Set this to Source and Destination or Source.

      Connection Method

      The method to connect to the Oracle database. Valid values:

      • SID: The database instance name. This is the name used for interaction between the database and the operating system.

      • Service Name: The service name. If the database has a domain name, the Service Name is the same as the Global DB Name. If the database does not have a domain name, the Service Name is the same as the database name.

      Database Endpoint

      The database endpoint.

      Port

      The service port of the database.

      SID

      The database instance name. This parameter is required when you set Connection Method to SID.

      Service Name

      The service name. This parameter is required when you set Connection Method to Service Name.

      Schema

      The schema name. One connection corresponds to one schema.

      Note

      To connect to multiple schemas, create multiple data connections.

      Other Connection String Parameters

      Additional connection parameters. The default value is empty.

      Account

      The database account.

      Password

      The database password.

      Multi-tenant Mode

      If your Oracle database is in multi-tenant mode, turn on the Multi-tenant Mode switch and enter the PDB information.

      Log Plug-in

      Keep the default value, logMiner.

      Time Zone for Time Data Types

      By default, this is the time zone used by the database. You can also specify a time zone based on your business needs.

      Included Tables

      The tables to include. Valid values:

      • All (default): Includes all tables in the database.

      • Custom: Specifies the tables to include. Separate multiple tables or expressions with commas (,). Expressions can use an asterisk (*) as a wildcard for any number of characters.

      Excluded Tables

      Excludes specified tables from the tables matched by the inclusion rule.

      Turn on the Excluded Tables switch and specify the tables to exclude. Separate multiple tables or expressions with commas (,). Expressions can use an asterisk (*) as a wildcard for any number of characters.

      Agent Settings

      The Agent to use. Valid values:

      • Platform-assigned (default): The platform assigns a node for the connection.

      • User-specified: Set this parameter to User-specified and select the Agent to use.

      Model Loading Frequency

      If the number of models in the data source exceeds 10,000, Tapdata Cloud periodically refreshes the models based on this parameter.

    4. Click Connection Test. After the test passes, click Save.

      Note

      If the connection test fails, follow the on-screen instructions to fix the issue.

  4. Connect to Tablestore as the destination database.

    1. On the Connection Management page, click Create Connection.

    2. In the Create Connection dialog box, on the Beta Data Source tab, click Tablestore.

    3. On the Create Connection page, configure the data source parameters as described in the following table.

      Parameter

      Description

      Connection Name

      The name of the data connection. Set this as needed.

      Connection Type

      The value is fixed to Target. This data connection can only be used as a destination in Tapdata Cloud, not as a source.

      Endpoint

      The endpoint of the Tablestore instance. For more information, see Endpoints.

      Instance Name

      The name of the Tablestore instance. For more information, see Instances.

      AccessKey ID

      The AccessKey ID and AccessKey secret of a Resource Access Management (RAM) user.

      Note

      When you use a temporary access credential from Security Token Service (STS) to access Tablestore, set these parameters to the AccessKey ID and AccessKey secret of the temporary access credential.

      AccessKey Secret

      AccessKey Token

      The temporary access token. This parameter is required when you use a temporary access credential from STS to access Tablestore.

      Agent Settings

      The agent to use. Valid values are:

      • Platform Automatic Allocation (default): The platform automatically allocates a node for the connection.

      • Manual Specification: If you set this parameter to Manual Specification, select the agent to use.

      Model Load Frequency

      If the number of models in the data source exceeds 10,000, Tapdata Cloud periodically refreshes the models based on this frequency.

    4. Click Connection Test. After the test is successful, click Save.

      Note

      If the connection test fails, follow the on-screen instructions to fix the issue.

Step 4: Configure a data replication task

You can configure a data replication task to achieve real-time synchronization between homogeneous or disparate data sources. This feature applies to scenarios such as data migration, data synchronization, data disaster recovery, and read performance scaling.

  1. Log on to the Tapdata Cloud platform.

  2. In the navigation pane on the left, click Data Replication.

  3. On the Data Replication page, click Create.

  4. On the left side of the page, drag the Oracle and Tablestore databases to the canvas on the right, and then connect them.

  5. Click the Oracle database and configure the parameters in the right-side panel as described in the following table.

    Parameter

    Description

    Node Name

    The default value is the connection name. Set it as needed.

    DDL Event Collection

    DDL statement synchronization to Tablestore is not supported. You do not need to configure this parameter.

    Select Tables

    Select the tables to replicate. Valid values:

    • Select by table name: Selects tables to replicate by their names.

      In the Tables to Replicate area, select the tables and click the fig_row icon to complete the setting.

      You can also click Batch Select Tables and enter multiple table names to select them in a batch. Separate multiple table names with commas (,).

    • Match by regular expression: Uses a regular expression to match tables to replicate.

      If you use this method, new tables in the source database that match the expression are automatically synced to the destination database.

    Batch Read Size

    The number of records to read in each batch during full synchronization. The default value is 100. In most cases, you can use the default value.

    Continuous Log Mining

    Specifies whether to enable continuous log mining. This is enabled by default. Configure this based on your Oracle version.

    • If your Oracle version is earlier than 19c, enable this option for lower latency.

    • If your Oracle version is 19c or later, you must disable this option.

      Important

      If your Oracle version is 19c or later, make sure to disable this option.

    Log Backlog Size

    The maximum log backlog size for continuous log mining. This parameter can be set only when the Continuous Log Mining option is enabled. Set this based on the source update frequency.

    • If the source update frequency is low, set a lower value for lower latency.

    • If the source update frequency is high, set a higher value for higher throughput.

    You can calculate this value using the formula: Source data change QPS / 10. The recommended value is 1 to 1,000.

    Mining Threads

    The number of threads for log mining. This increases the log mining speed at the cost of higher database CPU usage. This parameter can be set only when the Continuous Log Mining option is disabled.

    If the source update frequency exceeds 10,000, set this value to a maximum of 8.

    Large Transaction Event Threshold

    The threshold at which a transaction event is defined as a large transaction. The default value is 10,000. When the number of transaction events exceeds this value, the large transaction logic is triggered.

    Important
    • Setting this value too high may affect memory.

    • Large transactions are cached to the local disk. If the replication task fails, make sure to clear the disk.

    Associated Key Update

    Specifies whether to update associated keys. This is enabled by default.

    Disabling the Associated Key Update option can improve performance, but updates to associated keys are ignored.

    Enable LOB Type Sync (BLOB, CLOB, NCLOB)

    Specifies whether to sync LOB types. This is enabled by default.

    Disabling the Enable LOB Type Sync (BLOB, CLOB, NCLOB) option can improve performance, but LOB type parsing becomes unreliable.

    Uncommitted Transaction Lifetime (minutes)

    The maximum retention period for uncommitted transactions. The default value is 60. The unit is minutes.

    Long-running uncommitted transactions cause the task to start mining from that transaction every time it starts or stops, which affects performance. Therefore, transactions that are not committed within this period are cleared.

  6. Click the Tablestore database to preview the data structure and configure advanced settings.

    Parameter

    Description

    Node Name

    The default value is the connection name. Set it as needed.

    Inferred Result

    View the table schema, such as field names and field types, and set the field types as needed.

    Note
    • To adjust a field type, click the fig_down icon in the Destination Field Type column and complete the settings in the dialog box that appears.

    • A single Tablestore table can have a maximum of 32 columns. If the Oracle source table has more than 32 columns, drag a field editing node from the left of the page. Connect this node between the Oracle source and the Tablestore destination. Then, in the field editing node, mask the business-irrelevant columns to meet the requirement.

    Advanced Settings

    Duplicate Data Processing Policy

    The policy for handling existing data in the destination table. The default value is Keep Original Schema and Data in Destination. If the destination table is empty and its schema is different from the source table, you can select Purge Original Table Schema and Data in Destination.

    Data Write Mode

    The method for writing data to Tablestore. Valid values:

    • Process by event type: Sets the data write policy for insert, update, and delete events separately.

    • Append-only: Processes only insert events and discards update and delete events.

    Data Write Policy

    The policy for writing data when processing by event type. This parameter can be set only when you set Data Write Mode to Process by event type.

    The policy for insert events is Update on Existing. The policy for update and delete events is Discard on Non-existing.

    Full Multi-threaded Write

    The number of concurrent threads for full data writes. The default value is 8. You can adjust this value based on the write performance of the destination.

    Incremental Write Threads

    The number of concurrent threads for incremental data writes. This is disabled by default. After you enable the Incremental Write Threads option, you can adjust the value based on the write performance of the destination.

  7. Optional: Click the fig_settingicon icon at the top of the page and configure the task properties as described in the following table.

    Parameter

    Description

    Task Name

    The name of the sync task. Set it as needed.

    Sync Type

    The type of data synchronization. Valid values are:

    • Full + Incremental: After the full data synchronization is complete, incremental data is synchronized.

    • Full: Copies existing data from the source to the destination.

    • Incremental: Copies new data or data changes from the source to the destination in real time.

    Task Description

    A description of the task.

    Advanced Settings

    Scheduled Start Time

    The scheduled start time for the task.

    To specify a start time, turn on the Scheduled Start Time switch and select a start date and time.

    Data Validation

    After you turn on the Data Validation switch, the task automatically performs full and incremental validation on the consistency of the synchronization results. The number of inconsistent data rows is displayed on the run monitoring page.

    Important

    If the task meets any of the following conditions, validation is not performed even if the Data Validation switch is turned on.

    • An intermediate processing node is added.

    • The source connection does not support validation.

    • The destination connection does not support validation.

    Incremental Concurrent Write

    After you turn on the Incremental Concurrent Write switch, source data is written concurrently to the destination during incremental synchronization.

    Incremental Lag Judgment Time Setting

    Used to quickly determine if a replication task is lagging.

    Turn on the Incremental Lag Judgment Time Setting switch and set the judgment time. During incremental synchronization, if the lag time is greater than the set judgment time, the replication task is considered lagging.

    The default value is 1, in seconds.

    Processor Threads

    The number of processor threads.

    Incremental Data Processing Mode

    The processing mode for incremental data during incremental synchronization. The value can be Batch or One by one.

    agent settings

    The Agent information to use. Valid values are:

    • Platform automatic allocation (default): The platform allocates a node for connection access.

    • User manual specification: After setting this parameter to User manual specification, select the Agent to use.

  8. After you confirm the settings, click Start.

    After the operation is complete, you can view the execution status of the task, such as queries per second (QPS), latency, and task time statistics.

上一篇: Synchronize data from PolarDB-X 2.0 to Tablestore 下一篇: Synchronize HBase data
阿里云首页 表格存储 相关技术圈