PolarDB is a ready-to-use, stable, reliable, and scalable online database service from Alibaba Cloud (Learn about PolarDB).
Before you begin
-
OpenSearch supports PolarDB for MySQL 5.6, 5.7, and 8.0.
-
The PolarDB cluster must be in the same Alibaba Cloud account that you use for the OpenSearch console.
-
The PolarDB cluster must be in the same region as the OpenSearch application.
-
New PolarDB clusters have binary logging (binlog) disabled by default, which prevents data source registration. You must enable binlog to proceed: set the `loose_polar_log_bin` parameter to `ON_WITH_GTID`. The `binlog_row_image` parameter is set to `FULL` by default and does not need to be changed.
-
OpenSearch supports cloned instances.
-
The PolarDB cluster must be a read/write cluster.
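The version and binlog prerequisites above can be checked before you register the data source. The following is a minimal sketch, not an official tool: `check_prerequisites` and the `params` dictionary are hypothetical, and the sketch assumes you have copied the parameter values by hand from the Parameter Settings page in the PolarDB console.

```python
# Hypothetical pre-flight check. `params` holds parameter values copied
# manually from the PolarDB console; nothing here talks to PolarDB itself.
REQUIRED_BINLOG_PARAMS = {
    "loose_polar_log_bin": "ON_WITH_GTID",
    "binlog_row_image": "FULL",
}
SUPPORTED_MYSQL_VERSIONS = {"5.6", "5.7", "8.0"}


def check_prerequisites(version, params):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if version not in SUPPORTED_MYSQL_VERSIONS:
        problems.append(f"unsupported PolarDB for MySQL version: {version}")
    for name, expected in REQUIRED_BINLOG_PARAMS.items():
        # Compare case-insensitively; a missing parameter counts as unset.
        actual = str(params.get(name, "")).upper()
        if actual != expected:
            problems.append(f"{name} is {actual or 'unset'}, expected {expected}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites("8.0", {"loose_polar_log_bin": "OFF"}):
        print(problem)
```

The function only compares values; the account, region, and read/write requirements still have to be verified in the console.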
Features
-
Pull full data from specified database tables, either manually or on a schedule.
-
Merge data horizontally from multiple source tables. The source tables must have identical schemas and data source plugin configurations, and their primary key values must be unique. Duplicate primary key values overwrite existing records. This feature supports two main scenarios:
-
An application table is configured with a single data source that contains multiple source tables.
-
An application table is configured with multiple data sources, each containing one or more source tables.
-
OpenSearch supports data source field transformation plugins.
-
Supported data synchronization methods:
-
Automatic synchronization
-
No automatic synchronization
-
You can use filter conditions for full data synchronization.
-
Use the wildcard character (`*`) to match database table names.
-
When you select **automatic synchronization**, OpenSearch uses an internal service to subscribe to your database's binlog to synchronize incremental data. User operations, such as deleting database tables, changing access permissions, clearing binlog files, or modifying the database password, can disrupt this subscription process. This disruption can prevent OpenSearch from synchronizing incremental data, and OpenSearch is not liable for any resulting data inconsistencies. Before you perform such operations, ensure that you fully understand the potential consequences and take necessary precautions.
-
If you select automatic synchronization, OpenSearch ensures the stability of the synchronization service but does not guarantee a specific latency. If your business is sensitive to synchronization latency, we recommend that you use a DTS data subscription instance (DTS real-time synchronization) instead.
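The overwrite behavior of horizontal merging described in the feature list can be pictured with a small in-memory model. This is an illustrative sketch only: `merge_tables`, its arguments, and the sample rows are hypothetical and do not reflect how OpenSearch implements merging. The `qualify_pk` option imitates the effect of joining the primary key and the table name with `-`, as the FAQ describes for the StringCatenateExtractor plug-in.

```python
def merge_tables(tables, pk_field="id", qualify_pk=False, sep="-"):
    """Merge rows from several source tables into one dict keyed by primary key.

    tables: mapping of table name -> list of row dicts with identical schemas.
    qualify_pk: if True, the key becomes "<pk><sep><table name>".
    """
    merged = {}
    for table_name, rows in tables.items():
        for row in rows:
            key = str(row[pk_field])
            if qualify_pk:
                key = f"{key}{sep}{table_name}"
            # A later row with the same key replaces the earlier one.
            merged[key] = row
    return merged


if __name__ == "__main__":
    shards = {
        "my_table_0": [{"id": 123456, "title": "first"}],
        "my_table_1": [{"id": 123456, "title": "second"}],
    }
    print(merge_tables(shards))                   # one record survives
    print(merge_tables(shards, qualify_pk=True))  # both records survive
```

With plain keys the second shard's row replaces the first; with qualified keys such as `123456-my_table_0` both rows are kept.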
Limitations
-
PolarDB clusters support only the `full` binlog mode. To enable this mode, set `loose_polar_log_bin` to `ON_WITH_GTID`. The `binlog_row_image` parameter is set to `FULL` by default and does not need to be changed.
-
Only PolarDB for MySQL 5.6, 5.7, and 8.0 are supported.
-
The PolarDB cluster must belong to the Alibaba Cloud account that you use to log on to the OpenSearch console.
-
The PolarDB cluster must be in the same region as the OpenSearch application.
-
You cannot push incremental data by using an SDK or API to Standard Edition applications that use a PolarDB data source.
-
For Standard Edition applications, filter conditions are not supported for PolarDB data sources.
-
The `REPLACE INTO` syntax is not supported.
-
The `TRUNCATE` and `DROP` commands are not supported. Use the `DELETE` command to delete data.
-
The PolarDB access password cannot contain the `%` character. Otherwise, the reindex task fails.
-
OpenSearch does not support merging columns from source tables that have different schemas.
-
Set both `loose_max_statement_time` and `connect_timeout` to `0`. After a full data synchronization triggered by a reindex or offline change is complete, you can revert the parameters to their previous values.
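Two of the limitations above, the unsupported SQL statements and the `%` restriction on passwords, are easy to screen for in application code before they cause a failed synchronization. The sketch below is an assumption-laden illustration: `unsupported_statements` and `password_is_usable` are hypothetical helpers, and the regular expression inspects only the leading keyword of each statement, so it does not handle comments or several statements in one string.

```python
import re

# Matches statements that begin with REPLACE INTO, TRUNCATE, or DROP.
UNSUPPORTED_SQL = re.compile(r"^\s*(REPLACE\s+INTO|TRUNCATE|DROP)\b", re.IGNORECASE)


def unsupported_statements(statements):
    """Return the statements that start with an unsupported command."""
    return [s for s in statements if UNSUPPORTED_SQL.match(s)]


def password_is_usable(password):
    """Return False if the password contains the % character."""
    return "%" not in password


if __name__ == "__main__":
    batch = [
        "REPLACE INTO t VALUES (1)",
        "DELETE FROM t WHERE id = 1",
        "truncate table t",
    ]
    print(unsupported_statements(batch))
    print(password_is_usable("s3cret%"))
```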
Usage notes
-
If a data source, such as an RDS or PolarDB instance, is attached to a Distributed Relational Database Service (DRDS) instance that you want to connect to OpenSearch, you must specify the actual shard database name in the data source configuration. A DRDS database is split into one shadow database and eight shards, and data is written randomly across the shards.
-
PolarDB clusters support switching between internal and public endpoints. OpenSearch does not charge traffic fees for data retrieval from PolarDB.
-
OpenSearch pulls full data only from the primary database. We recommend that you schedule reindexing and full data imports during off-peak hours.
-
The system automatically converts values of the `datetime` and `timestamp` types in PolarDB tables to milliseconds. Set the corresponding field type in the application table to `TIMESTAMP`.
-
The synchronization process excludes documents that do not meet the full data filter conditions. If a document with the same primary key value exists in the application table, it is also deleted.
-
If no incremental data is generated from the data source for 15 or more consecutive days, data synchronization may fail. To resolve this issue, perform a manual reindex or offline change.
-
If SSL encryption is enabled on the PolarDB cluster, ensure that the SSL certificate is valid. An expired certificate causes connection errors. Update the certificate before it expires.
-
You cannot configure a PolarDB data source in the Qingdao region.
-
To synchronize data from a PolarDB data source, you must add the OpenSearch server IP address blocks to the IP whitelist of your RDS or PolarDB instance. The following table lists the IP whitelist for each region.
| Region | IP address |
| --- | --- |
| Hangzhou | 100.104.190.128/26,100.104.241.128/26 |
| Beijing | 100.104.16.192/26,100.104.179.0/26 |
| Shanghai | 100.104.37.0/26,100.104.46.0/26 |
| Shenzhen | 100.104.87.192/26,100.104.132.192/26 |
| Zhangjiakou | 100.104.155.192/26,100.104.238.64/26 |
| Germany | 100.104.127.0/26,100.104.35.192/26 |
| United States | 100.104.193.128/26,100.104.119.128/26 |
| Singapore | 100.104.58.192/26,100.104.74.192/26 |
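The per-region whitelist above can be kept as a lookup table so that a script can report which required address blocks are still missing from a cluster's whitelist. This is a minimal sketch: `whitelist_entry` and `missing_blocks` are hypothetical helpers, the `configured` list is assumed to be copied by hand from the PolarDB console, and only IPv4 entries are handled.

```python
import ipaddress

# Address blocks copied from the region table in this document.
OPENSEARCH_WHITELIST = {
    "Hangzhou": ["100.104.190.128/26", "100.104.241.128/26"],
    "Beijing": ["100.104.16.192/26", "100.104.179.0/26"],
    "Shanghai": ["100.104.37.0/26", "100.104.46.0/26"],
    "Shenzhen": ["100.104.87.192/26", "100.104.132.192/26"],
    "Zhangjiakou": ["100.104.155.192/26", "100.104.238.64/26"],
    "Germany": ["100.104.127.0/26", "100.104.35.192/26"],
    "United States": ["100.104.193.128/26", "100.104.119.128/26"],
    "Singapore": ["100.104.58.192/26", "100.104.74.192/26"],
}


def whitelist_entry(region):
    """Return the region's blocks as one comma-separated string."""
    return ",".join(OPENSEARCH_WHITELIST[region])


def missing_blocks(region, configured):
    """Return required blocks that no configured IPv4 entry covers."""
    nets = [ipaddress.ip_network(c.strip(), strict=False) for c in configured]
    missing = []
    for block in OPENSEARCH_WHITELIST[region]:
        required = ipaddress.ip_network(block)
        if not any(required.subnet_of(net) for net in nets):
            missing.append(block)
    return missing


if __name__ == "__main__":
    print(whitelist_entry("Hangzhou"))
    print(missing_blocks("Beijing", ["100.104.16.192/26"]))
```

A required block counts as covered when it equals a configured entry or falls inside a wider one.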
Account authorization
-
When you connect to a PolarDB data source, you must authorize access to the cluster and provide an account and password. Choose your account credentials carefully during the initial setup.
-
Ensure account permissions: The account must have permission to view all tables in the database, which is a requirement of the upstream DTS service. This ensures that the account can execute `show create table *.*`. Insufficient permissions can cause real-time synchronization to fail.
-
Minimize changes to account permissions: Changing account credentials can disrupt ongoing real-time tasks and affect the creation of new application versions. If you change the account password, you must delete the data source configuration and then create a new one.
FAQ
-
If reindexing is stuck after I configure a PolarDB data source, what should I do? Create a test table in the same database as your source table. Write or update one to two records per minute to ensure continuous binlog generation during the reindex process.
-
After you settle overdue payments for an Advanced Edition application, you can trigger a manual reindex to resume data synchronization.
-
The PolarDB cluster access password cannot contain the `%` character. Otherwise, the reindex task fails with an `Illegal hex characters in escape (%) pattern` error.
-
The system requires that the primary key values in an application table are unique. If primary key values are duplicated in a table sharding scenario, data will be overwritten. You can use the StringCatenateExtractor data source plug-in to merge multiple field values. The source fields are `pk,$table`, where `pk` is the primary key field of the PolarDB cluster table and `$table` is a default system variable that represents the corresponding database table name. The concatenation character is `-`, which can be customized.
For example, if the PolarDB cluster table is `my_table_0` and the primary key field value is `123456`, the new primary key value after concatenation is `123456-my_table_0`.
-
To filter data based on a `date` or `datetime` field in the database table, you must use the complete date and time format in the filter conditions. For a field named `createtime`, a valid condition is `createtime>'2018-03-01 00:00:00'`. Using an abbreviated format like `createtime>'2018-3-1 00:00:00'` will cause an error.
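The date format requirement in the last FAQ item can be enforced by building filter literals from `datetime` objects instead of typing them by hand. The sketch below is illustrative: `is_complete_datetime` and `datetime_filter` are hypothetical helpers, and the field name `createtime` comes from the example above. The explicit pattern check is needed because `datetime.strptime` on its own also accepts unpadded values such as `2018-3-1`.

```python
import re
from datetime import datetime

FULL_FORMAT = "%Y-%m-%d %H:%M:%S"
# Requires zero-padded fields, for example 2018-03-01 00:00:00.
_PADDED = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def is_complete_datetime(literal):
    """Return True only for a zero-padded, valid 'YYYY-MM-DD HH:MM:SS' string."""
    if not _PADDED.fullmatch(literal):
        return False
    try:
        datetime.strptime(literal, FULL_FORMAT)
    except ValueError:
        return False
    return True


def datetime_filter(field, operator, value):
    """Build a filter condition with a fully padded datetime literal."""
    return f"{field}{operator}'{value.strftime(FULL_FORMAT)}'"


if __name__ == "__main__":
    print(is_complete_datetime("2018-3-1 00:00:00"))
    print(datetime_filter("createtime", ">", datetime(2018, 3, 1)))
```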
Configure a PolarDB data source
-
You can configure a PolarDB data source when you create an application.
-
For an existing application, you can modify the data source by performing an offline change on the application details page.
Procedure
1. When creating or modifying an application, go to the Data Source step. Add or edit a data source, select PolarDB, and then click New Database.
2. After entering the PolarDB data source information, click Connect.
| Parameter | Description |
| --- | --- |
| Cluster ID | The ID of the PolarDB cluster. You can obtain the ID from the PolarDB console. This parameter is case-sensitive. Example: |
| Database name | The name of the database to connect to within the instance. This parameter is not case-sensitive. |
| Username | The database account used to retrieve the table schema and full data. This parameter is case-sensitive. |
| Password | The password for the specified account. |
OpenSearch attempts to connect and provides feedback based on the outcome:
| Message | Actions |
| --- | --- |
| This PolarDB cluster does not exist in the current region for the current user. | Verify that the cluster ID is correct and ensure that the PolarDB cluster is in the same region as the OpenSearch application. If the issue persists, submit a ticket. |
| Failed to connect to the database service. | Verify that the PolarDB connection details, including the cluster ID, database name, username, and password, are correct. |
| This table does not exist in the current PolarDB cluster. | Verify that the table name is correct and confirm that the table exists in the PolarDB database. |
| Issue with PolarDB cluster configuration items. | Go to the Parameter Settings page in the PolarDB console, modify the relevant configuration items, and then retry. |
3. After connecting to the PolarDB data source, select the data tables.
In the Data tables list on the left, select the checkbox for a desired table and click the >> button to add it to the Selected list on the right.
-
Select or enter the name of the table to access in the database. The name is case-sensitive.
-
Wildcard matching for sharded tables is supported, such as `table_*` to match `table_a`, `table_b`, and so on.
4. If the connection is successful, configure the fields. OpenSearch automatically retrieves the table fields. For information about data source plugins, see Use data processing plugins.
To add a data source plugin, click the + button in the Content conversion column for the corresponding field mapping. After finishing the field mapping, click OK.
5. Configure filter conditions for the PolarDB data source (not supported for Standard Edition), set the data synchronization method, and then click Finish to complete the application structure configuration.
The data synchronization method provides three options: Automatic synchronization, DTS data subscription instance, and No automatic synchronization.
-
An OpenSearch application can be configured with multiple data sources, but their table structures and configurations must be identical.
-
If the console indicates that automatic synchronization is not supported, use DTS real-time synchronization.
-
The filter conditions for a PolarDB data source ensure that only records that meet the specified criteria are synchronized. For detailed configuration information, see Filter conditions for a data source.
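Before you enter a wildcard table name in step 3, you can preview which tables a pattern such as `table_*` would select. The following sketch is a local approximation only: `match_tables` is a hypothetical helper, it treats `*` as the sole wildcard and compares names case-sensitively, and the exact matching rules that OpenSearch applies are not specified in this document.

```python
import re


def match_tables(pattern, table_names):
    """Return the table names that match a pattern where * means any characters."""
    # Escape everything except *, which becomes ".*" in the regular expression.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    return [name for name in table_names if regex.fullmatch(name)]


if __name__ == "__main__":
    tables = ["table_a", "table_b", "Table_c", "orders"]
    print(match_tables("table_*", tables))
```

Names that differ only in case, such as `Table_c` in the sample list, are not matched, in line with the case-sensitive table names noted in step 3.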