Sync Oracle data
You can use the Tapdata Cloud visualization interface to sync data from an Oracle database to Tablestore in real time.
Prerequisites
Before you sync data from an Oracle database to Tablestore using Tapdata Cloud, you must complete the following preparations.
You have registered a Tapdata Cloud account.
You have created a Tablestore instance and a data table. For more information, see Create an instance and Create a data table.
You have created a Resource Access Management (RAM) user and granted the RAM user the AliyunOTSFullAccess permission to manage Tablestore. For more information, see Use a RAM user's AccessKey pair to access Tablestore.
Background information
Tapdata Cloud is a real-time data service from Tapdata that integrates data replication and data development. It provides real-time data synchronization with millisecond-level latency and data fusion services for cross-cloud, cross-region, and disparate data source scenarios. For more information, see What is Tapdata.
Notes
Data synchronization is supported only for Oracle 9i, 10g, 11g, 12c, and 19c.
If the `connect_time` parameter is set for the Oracle database, real-time synchronization may fail.
Run the following command to check the `connect_time` parameter settings. Replace username with the database account name.
、
select resource_name, limit from dba_profiles where profile=( select profile from dba_users where username = 'username');You must reserve enough storage space for archived logs. This prevents the storage from becoming full and affecting Oracle database operations.
Procedure
To sync data from an Oracle database to Tablestore using Tapdata Cloud, perform the following steps:
Configure the Oracle database and an authorized account. For more information, see Step 1: Configure the Oracle database and an authorized account.
Deploy Tapdata Agent on the machine where the Oracle database resides or on a machine in the same network. For more information, see Step 2: Deploy Tapdata Agent.
Tapdata Agent (Agent) is a key program for data synchronization, data transformation for disparate data sources, and data development scenarios. It uses stream processing to retrieve data from the source, process and transform the data, and send it to the destination.
ImportantThe Agent uses stream processing to retrieve data from the source, process and transform the data, and send it to the destination. Your data does not pass through Tapdata Cloud and is not uploaded or stored.
Use Tapdata Cloud to connect to the source and destination databases. For more information, see Step 3: Connect to the source and destination databases.
Create a data replication task to replicate data from the source database to the destination database. For more information, see Step 4: Configure a data replication task.
Step 1: Configure the Oracle database and an authorized account
Before you start data synchronization, you must enable ARCHIVELOG mode, supplemental logging, identification key logging, and full supplemental logging for the Oracle database. Then, you must create an account for the data sync task.
Log on to the Oracle database as a user with DBA permissions.
Enable ARCHIVELOG mode.
Run the following command to check if ARCHIVELOG mode is enabled.
select log_mode from v$database;If the command returns ARCHIVELOG, the mode is already enabled and you can skip this step. Otherwise, perform the following steps to enable it.
Run the following command to shut down the database.
ImportantPerform this operation during off-peak hours to avoid affecting your business.
shutdown immediate;Run the following command to start and mount the database.
startup mount;Run the following command to enable archiving and open the database.
alter database archivelog; alter database open;
Enable supplemental logging. Run the command that corresponds to your database version.
Database version
Command
Oracle 9i
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;Oracle 10g
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; ALTER system switch logfile; ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;Oracle 11g
Oracle 12c
Run the following command to confirm whether supplemental logging is enabled.
SELECT supplemental_log_data_min, supplemental_log_data_pk, supplemental_log_data_all FROM v$database;If the first two columns of the result are Yes or Implicit, it means that only Identification Key Logging is enabled. You also need to enable Full Supplemental Logging.
Enable identification key logging.
You can choose to enable it for a single table or for all tables as needed.
To enable identification key logging for a single table:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; ALTER TABLE schema name.table name ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;To enable identification key logging for all tables:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
Enable full supplemental logging.
You can choose to enable it for a single table or for all tables as needed.
To enable full supplemental logging for a single table:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; ALTER TABLE <schema name>.<table name> ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;To enable full supplemental logging for all tables:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
Run the following command to commit the changes.
ALTER SYSTEM SWITCH LOGFILE;Create and grant permissions to an account for the data sync task. Run the command that corresponds to your database version.
Database version
Command
Oracle 10g
CREATE USER username IDENTIFIED BY password; GRANT create session, alter session, execute_catalog_role, select any dictionary, select any transaction, select any table, create any table, create any index, unlimited tablespace to user name;Oracle 11g
Oracle 12c (standard mode)
CREATE USER username IDENTIFIED BY password; GRANT create session, alter session, select any dictionary, select any transaction, logmining, execute_catalog_role, create any table, create any index, unlimited tablespace TO username;Run the `GRANT` command repeatedly to grant `SELECT` permissions as needed.
Oracle 12c (multi-tenant mode)
ImportantIn Oracle 12c multi-tenant mode, create the user in the CDB. The username must follow the
c##nameconvention.ALTER SESSION SET CONTAINER=cdb$root; CREATE USER username IDENTIFIED BY password CONTAINER=all; GRANT create session, alter session, set container, select any dictionary, select any transaction, logmining, execute_catalog_role, create any table, create any index, unlimited tablespace TO username CONTAINER=all; ALTER SESSION SET CONTAINER=pdb;Run the `GRANT` command repeatedly to grant `SELECT` permissions as needed.
When the Oracle database is used as the source database, use this account for Java Database Connectivity (JDBC) identity verification.
ImportantWhen you use this account for JDBC identity verification, you must use the full username, including
c##, as the username for the JDBC connection.
Step 2: Deploy Tapdata Agent
The machine where you deploy the Agent must have access to the Internet.
Data forwarding is often time-sensitive. We recommend that you deploy Tapdata Agent on the same machine as the Oracle database or on a machine in the same network. This greatly reduces the impact of network latency.
This topic uses a 64-bit Linux operating system as an example to describe the Agent deployment process. To deploy the Agent on other platforms, such as Windows or Alibaba Cloud Compute Nest, see the Install Agent document.
Log on to the Tapdata Cloud platform.
In the navigation pane on the left, click Agent Management.
On the Agent Management page, click Create Agent.
In the Agent Download and Installation dialog box, select Linux (64 bit), and then follow the on-screen instructions to install the Agent.
ImportantBefore installation, make sure that Java 1.8 is installed and the environment variables are correctly configured in your deployment environment.
Log on to the machine where you want to deploy the Agent.
Run the following command to create the tapdata directory to install and deploy the Tapdata Agent.
mkdir tapdataClick Copy on the interface to copy the command from Step 3, and then run the command in the tapdata directory.
NoteAfter you run this command, the Tapdata Agent is automatically downloaded, deployed, and started.
Wait for the command to finish running. The Tapdata Agent is then installed and started.
The following figure shows an example of a successful Tapdata Agent startup.

Step 3: Connect to the source and destination databases
Use Tapdata Cloud to connect to the source and destination databases.
Log on to the Tapdata Cloud platform.
In the navigation pane on the left, click Connection Management.
Connect to the Oracle database that serves as the source database.
On the Connection Management page, click Create Connection.
In the Create Connection dialog box, on the Certified Data Source tab, click Oracle.
On the Create Connection page, configure the data source parameters as described in the following table.
Parameter
Description
Connection Name
The name of the data connection. Set it as needed. Connection names must be unique.
Connection Type
You can use Oracle as a source or a destination. Set this to Source and Destination or Source.
Connection Method
The method to connect to the Oracle database. Valid values:
SID: The database instance name. This is the name used for interaction between the database and the operating system.
Service Name: The service name. If the database has a domain name, the Service Name is the same as the Global DB Name. If the database does not have a domain name, the Service Name is the same as the database name.
Database Endpoint
The database endpoint.
Port
The service port of the database.
SID
The database instance name. This parameter is required when you set Connection Method to SID.
Service Name
The service name. This parameter is required when you set Connection Method to Service Name.
Schema
The schema name. One connection corresponds to one schema.
NoteTo connect to multiple schemas, create multiple data connections.
Other Connection String Parameters
Additional connection parameters. The default value is empty.
Account
The database account.
Password
The database password.
Multi-tenant Mode
If your Oracle database is in multi-tenant mode, turn on the Multi-tenant Mode switch and enter the PDB information.
Log Plug-in
Keep the default value, logMiner.
Time Zone for Time Data Types
By default, this is the time zone used by the database. You can also specify a time zone based on your business needs.
Included Tables
The tables to include. Valid values:
All (default): Includes all tables in the database.
Custom: Specifies the tables to include. Separate multiple tables or expressions with commas (,). Expressions can use an asterisk (*) as a wildcard for any number of characters.
Excluded Tables
Excludes specified tables from the tables matched by the inclusion rule.
Turn on the Excluded Tables switch and specify the tables to exclude. Separate multiple tables or expressions with commas (,). Expressions can use an asterisk (*) as a wildcard for any number of characters.
Agent Settings
The Agent to use. Valid values:
Platform-assigned (default): The platform assigns a node for the connection.
User-specified: Set this parameter to User-specified and select the Agent to use.
Model Loading Frequency
If the number of models in the data source exceeds 10,000, Tapdata Cloud periodically refreshes the models based on this parameter.
Click Connection Test. After the test passes, click Save.
NoteIf the connection test fails, follow the on-screen instructions to fix the issue.
Connect to Tablestore as the destination database.
On the Connection Management page, click Create Connection.
In the Create Connection dialog box, on the Beta Data Source tab, click Tablestore.
On the Create Connection page, configure the data source parameters as described in the following table.
Parameter
Description
Connection Name
The name of the data connection. Set this as needed.
Connection Type
The value is fixed to Target. This data connection can only be used as a destination in Tapdata Cloud, not as a source.
Endpoint
The endpoint of the Tablestore instance. For more information, see Endpoints.
Instance Name
The name of the Tablestore instance. For more information, see Instances.
AccessKey ID
The AccessKey ID and AccessKey secret of a Resource Access Management (RAM) user.
NoteWhen you use a temporary access credential from Security Token Service (STS) to access Tablestore, set these parameters to the AccessKey ID and AccessKey secret of the temporary access credential.
AccessKey Secret
AccessKey Token
The temporary access token. This parameter is required when you use a temporary access credential from STS to access Tablestore.
Agent Settings
The agent to use. Valid values are:
Platform Automatic Allocation (default): The platform automatically allocates a node for the connection.
Manual Specification: If you set this parameter to Manual Specification, select the agent to use.
Model Load Frequency
If the number of models in the data source exceeds 10,000, Tapdata Cloud periodically refreshes the models based on this frequency.
Click Connection Test. After the test is successful, click Save.
NoteIf the connection test fails, follow the on-screen instructions to fix the issue.
Step 4: Configure a data replication task
You can configure a data replication task to achieve real-time synchronization between homogeneous or disparate data sources. This feature applies to scenarios such as data migration, data synchronization, data disaster recovery, and read performance scaling.
Log on to the Tapdata Cloud platform.
In the navigation pane on the left, click Data Replication.
On the Data Replication page, click Create.
On the left side of the page, drag the Oracle and Tablestore databases to the canvas on the right, and then connect them.
Click the Oracle database and configure the parameters in the right-side panel as described in the following table.
Parameter
Description
Node Name
The default value is the connection name. Set it as needed.
DDL Event Collection
DDL statement synchronization to Tablestore is not supported. You do not need to configure this parameter.
Select Tables
Select the tables to replicate. Valid values:
Select by table name: Selects tables to replicate by their names.
In the Tables to Replicate area, select the tables and click the
icon to complete the setting.You can also click Batch Select Tables and enter multiple table names to select them in a batch. Separate multiple table names with commas (,).
Match by regular expression: Uses a regular expression to match tables to replicate.
If you use this method, new tables in the source database that match the expression are automatically synced to the destination database.
Batch Read Size
The number of records to read in each batch during full synchronization. The default value is 100. In most cases, you can use the default value.
Continuous Log Mining
Specifies whether to enable continuous log mining. This is enabled by default. Configure this based on your Oracle version.
If your Oracle version is earlier than 19c, enable this option for lower latency.
If your Oracle version is 19c or later, you must disable this option.
ImportantIf your Oracle version is 19c or later, make sure to disable this option.
Log Backlog Size
The maximum log backlog size for continuous log mining. This parameter can be set only when the Continuous Log Mining option is enabled. Set this based on the source update frequency.
If the source update frequency is low, set a lower value for lower latency.
If the source update frequency is high, set a higher value for higher throughput.
You can calculate this value using the formula:
Source data change QPS / 10. The recommended value is 1 to 1,000.Mining Threads
The number of threads for log mining. This increases the log mining speed at the cost of higher database CPU usage. This parameter can be set only when the Continuous Log Mining option is disabled.
If the source update frequency exceeds 10,000, set this value to a maximum of 8.
Large Transaction Event Threshold
The threshold at which a transaction event is defined as a large transaction. The default value is 10,000. When the number of transaction events exceeds this value, the large transaction logic is triggered.
ImportantSetting this value too high may affect memory.
Large transactions are cached to the local disk. If the replication task fails, make sure to clear the disk.
Associated Key Update
Specifies whether to update associated keys. This is enabled by default.
Disabling the Associated Key Update option can improve performance, but updates to associated keys are ignored.
Enable LOB Type Sync (BLOB, CLOB, NCLOB)
Specifies whether to sync LOB types. This is enabled by default.
Disabling the Enable LOB Type Sync (BLOB, CLOB, NCLOB) option can improve performance, but LOB type parsing becomes unreliable.
Uncommitted Transaction Lifetime (minutes)
The maximum retention period for uncommitted transactions. The default value is 60. The unit is minutes.
Long-running uncommitted transactions cause the task to start mining from that transaction every time it starts or stops, which affects performance. Therefore, transactions that are not committed within this period are cleared.
Click the Tablestore database to preview the data structure and configure advanced settings.
Parameter
Description
Node Name
The default value is the connection name. Set it as needed.
Inferred Result
View the table schema, such as field names and field types, and set the field types as needed.
NoteTo adjust a field type, click the
icon in the Destination Field Type column and complete the settings in the dialog box that appears.A single Tablestore table can have a maximum of 32 columns. If the Oracle source table has more than 32 columns, drag a field editing node from the left of the page. Connect this node between the Oracle source and the Tablestore destination. Then, in the field editing node, mask the business-irrelevant columns to meet the requirement.
Advanced Settings
Duplicate Data Processing Policy
The policy for handling existing data in the destination table. The default value is Keep Original Schema and Data in Destination. If the destination table is empty and its schema is different from the source table, you can select Purge Original Table Schema and Data in Destination.
Data Write Mode
The method for writing data to Tablestore. Valid values:
Process by event type: Sets the data write policy for insert, update, and delete events separately.
Append-only: Processes only insert events and discards update and delete events.
Data Write Policy
The policy for writing data when processing by event type. This parameter can be set only when you set Data Write Mode to Process by event type.
The policy for insert events is Update on Existing. The policy for update and delete events is Discard on Non-existing.
Full Multi-threaded Write
The number of concurrent threads for full data writes. The default value is 8. You can adjust this value based on the write performance of the destination.
Incremental Write Threads
The number of concurrent threads for incremental data writes. This is disabled by default. After you enable the Incremental Write Threads option, you can adjust the value based on the write performance of the destination.
Optional: Click the
icon at the top of the page and configure the task properties as described in the following table.Parameter
Description
Task Name
The name of the sync task. Set it as needed.
Sync Type
The type of data synchronization. Valid values are:
Full + Incremental: After the full data synchronization is complete, incremental data is synchronized.
Full: Copies existing data from the source to the destination.
Incremental: Copies new data or data changes from the source to the destination in real time.
Task Description
A description of the task.
Advanced Settings
Scheduled Start Time
The scheduled start time for the task.
To specify a start time, turn on the Scheduled Start Time switch and select a start date and time.
Data Validation
After you turn on the Data Validation switch, the task automatically performs full and incremental validation on the consistency of the synchronization results. The number of inconsistent data rows is displayed on the run monitoring page.
ImportantIf the task meets any of the following conditions, validation is not performed even if the Data Validation switch is turned on.
An intermediate processing node is added.
The source connection does not support validation.
The destination connection does not support validation.
Incremental Concurrent Write
After you turn on the Incremental Concurrent Write switch, source data is written concurrently to the destination during incremental synchronization.
Incremental Lag Judgment Time Setting
Used to quickly determine if a replication task is lagging.
Turn on the Incremental Lag Judgment Time Setting switch and set the judgment time. During incremental synchronization, if the lag time is greater than the set judgment time, the replication task is considered lagging.
The default value is 1, in seconds.
Processor Threads
The number of processor threads.
Incremental Data Processing Mode
The processing mode for incremental data during incremental synchronization. The value can be Batch or One by one.
agent settings
The Agent information to use. Valid values are:
Platform automatic allocation (default): The platform allocates a node for connection access.
User manual specification: After setting this parameter to User manual specification, select the Agent to use.
After you confirm the settings, click Start.
After the operation is complete, you can view the execution status of the task, such as queries per second (QPS), latency, and task time statistics.