PolarDB data source configuration

PolarDB is a ready-to-use, stable, reliable, and scalable online database service from Alibaba Cloud. For more information, see Learn about PolarDB.

Before you begin

  • OpenSearch supports PolarDB for MySQL 5.6, 5.7, and 8.0.

  • The PolarDB cluster must be in the same Alibaba Cloud account that you use for the OpenSearch console.

  • The PolarDB cluster must be in the same region as the OpenSearch application.

  • New PolarDB clusters have binary logging (binlog) disabled by default, which prevents data source registration. You must enable binlog to proceed. To do so, set the loose_polar_log_bin parameter to ON_WITH_GTID. The binlog_row_image parameter is set to FULL by default and does not need to be changed.

  • OpenSearch supports cloned instances.

  • The PolarDB cluster must be a read/write cluster.
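
The binlog prerequisite above can be checked before you register the data source. The following is a minimal sketch, not part of OpenSearch or PolarDB: it assumes you have already read the parameter values (for example, from the Parameter Settings page in the PolarDB console) into a dictionary. The helper name find_binlog_problems is hypothetical.

```python
# Hypothetical helper: compares parameter values you have already collected
# against the values this page requires for data source registration.
REQUIRED_BINLOG_SETTINGS = {
    "loose_polar_log_bin": "ON_WITH_GTID",
    "binlog_row_image": "FULL",
}


def find_binlog_problems(params):
    """Return a list of problems; an empty list means the settings match."""
    problems = []
    for name, expected in REQUIRED_BINLOG_SETTINGS.items():
        actual = params.get(name)
        if actual is None:
            problems.append(f"{name} is missing (expected {expected})")
        elif str(actual).upper() != expected:
            problems.append(f"{name} is {actual} (expected {expected})")
    return problems


# A new cluster with binlog still disabled is reported as a problem.
print(find_binlog_problems({"loose_polar_log_bin": "OFF", "binlog_row_image": "FULL"}))
```

An empty result only means that the two values match this page. It does not verify the other prerequisites, such as the account, region, or read/write requirements.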

Features

  • Pull full data from specified database tables, either manually or on a schedule.

  • Merge data horizontally from multiple source tables. The source tables must have identical schemas and data source plugin configurations, and their primary key values must be unique. Duplicate primary key values overwrite existing records. This feature supports two main scenarios:

    • An application table is configured with a single data source that contains multiple source tables.

    • An application table is configured with multiple data sources, each containing one or more source tables.

  • OpenSearch supports data source field transformation plugins.

  • Supported data synchronization methods: automatic synchronization, DTS data subscription instance, and no automatic synchronization.

  • You can use filter conditions for full data synchronization.

  • Use the wildcard character (*) to match database table names.
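
The horizontal merge described in the feature list can be pictured with a small sketch. It is an illustration only, not the OpenSearch implementation: rows are plain dictionaries, the function name merge_source_tables and the field name id are hypothetical, and the sketch assumes that a later record with the same primary key replaces the earlier one.

```python
def merge_source_tables(tables, primary_key="id"):
    """Merge rows from several same-schema tables, keyed by primary key."""
    merged = {}
    for rows in tables:
        for row in rows:
            # A duplicate primary key overwrites the record stored earlier.
            merged[row[primary_key]] = row
    return merged


table_0 = [{"id": 1, "title": "first"}]
table_1 = [{"id": 1, "title": "duplicate"}, {"id": 2, "title": "second"}]

# Only two documents survive because id 1 appears in both tables.
print(merge_source_tables([table_0, table_1]))
```

This is why the primary key values must be unique across all source tables that feed one application table.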

Important
  • When you select **automatic synchronization**, OpenSearch uses an internal service to subscribe to your database's binlog to synchronize incremental data. User operations, such as deleting database tables, changing access permissions, clearing binlog files, or modifying the database password, can disrupt this subscription process. This disruption can prevent OpenSearch from synchronizing incremental data, and OpenSearch is not liable for any resulting data inconsistencies. Before you perform such operations, ensure that you fully understand the potential consequences and take necessary precautions.

  • If you select automatic synchronization, OpenSearch ensures the stability of the synchronization service but does not guarantee a specific latency. If your business is sensitive to synchronization latency, we recommend that you use a DTS data subscription instance (DTS real-time synchronization).

Limitations

  • PolarDB clusters support only the full binlog mode. To enable this mode, set loose_polar_log_bin to ON_WITH_GTID. The binlog_row_image parameter is set to FULL by default and does not need to be changed.

  • Only PolarDB for MySQL 5.6, 5.7, and 8.0 are supported.

  • The PolarDB cluster must belong to the Alibaba Cloud account that you use to log on to the OpenSearch console.

  • The PolarDB cluster must be in the same region as the OpenSearch application.

  • You cannot push incremental data by using an SDK or API to Standard Edition applications that use a PolarDB data source.

  • For Standard Edition applications, filter conditions are not supported for PolarDB data sources.

  • The REPLACE INTO syntax is not supported.

  • The TRUNCATE and DROP commands are not supported. Use the DELETE command to delete data.

  • The PolarDB access password cannot contain the % character. Otherwise, the reindex task fails.

  • OpenSearch does not support merging columns from source tables that have different schemas.

  • Set both loose_max_statement_time and connect_timeout to 0. After a full data synchronization triggered by a reindex or offline change is complete, you can revert the parameters to their previous values.
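
Two of the limitations above are easy to check on the client side before they cause a failed reindex or a lost change. The following sketch is an assumption-laden illustration: the function names are hypothetical, and the statement check only inspects the leading keyword of a single SQL statement.

```python
import re

# Statements this page lists as unsupported: REPLACE INTO, TRUNCATE, DROP.
UNSUPPORTED_STATEMENT = re.compile(
    r"^\s*(REPLACE\s+INTO|TRUNCATE|DROP)\b", re.IGNORECASE
)


def password_is_allowed(password):
    """The PolarDB access password must not contain the % character."""
    return "%" not in password


def statement_is_supported(sql):
    """Return False if the statement starts with an unsupported command."""
    return UNSUPPORTED_STATEMENT.match(sql) is None


print(password_is_allowed("s3cret%pass"))            # False
print(statement_is_supported("TRUNCATE TABLE t"))    # False
print(statement_is_supported("DELETE FROM t WHERE id = 1"))  # True
```

The statement check is deliberately simple. It does not parse SQL, so statements wrapped in comments or sent as part of a multi-statement batch are outside its scope.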

Usage notes

  • If a data source, such as an RDS or PolarDB instance, is attached to a Distributed Relational Database Service (DRDS) instance that you want to connect to OpenSearch, you must specify the actual shard database name in the data source configuration. A DRDS database is split into one shadow database and eight shards, and data is written randomly across the shards.

  • PolarDB clusters support switching between internal and public endpoints. OpenSearch does not charge traffic fees for data retrieval from PolarDB.

  • OpenSearch pulls full data only from the primary database. We recommend that you schedule reindexing and full data imports during off-peak hours.

  • The system automatically converts values of the datetime and timestamp types in PolarDB tables to milliseconds. Set the corresponding field type in the application table to TIMESTAMP.

  • During synchronization, documents that do not meet the full data filter conditions are excluded. If a document with the same primary key value already exists in the application table, that document is deleted as well.

  • If no incremental data is generated from the data source for 15 or more consecutive days, data synchronization may fail. To resolve this issue, perform a manual reindex or offline change.

  • If SSL encryption is enabled on the PolarDB cluster, ensure that the SSL certificate is valid. An expired certificate causes connection errors. Update the certificate before it expires.

  • You cannot configure a PolarDB data source in the Qingdao region.

  • To synchronize data from a PolarDB data source, you must add the OpenSearch server IP address blocks to the IP whitelist of your RDS or PolarDB instance. The following table lists the IP whitelist for each region.

    | Region | IP address blocks |
    | --- | --- |
    | Hangzhou | 100.104.190.128/26, 100.104.241.128/26 |
    | Beijing | 100.104.16.192/26, 100.104.179.0/26 |
    | Shanghai | 100.104.37.0/26, 100.104.46.0/26 |
    | Shenzhen | 100.104.87.192/26, 100.104.132.192/26 |
    | Zhangjiakou | 100.104.155.192/26, 100.104.238.64/26 |
    | Germany | 100.104.127.0/26, 100.104.35.192/26 |
    | United States | 100.104.193.128/26, 100.104.119.128/26 |
    | Singapore | 100.104.58.192/26, 100.104.74.192/26 |
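
When you troubleshoot a rejected connection, it can help to confirm whether a client address seen in the database logs falls inside the address blocks listed for your region. The sketch below uses the Python standard library ipaddress module. The dictionary copies the blocks from this page, and the function name is_in_region_whitelist is hypothetical.

```python
import ipaddress

# Address blocks copied from the whitelist table on this page.
OPENSEARCH_WHITELIST = {
    "Hangzhou": ["100.104.190.128/26", "100.104.241.128/26"],
    "Beijing": ["100.104.16.192/26", "100.104.179.0/26"],
    "Shanghai": ["100.104.37.0/26", "100.104.46.0/26"],
    "Shenzhen": ["100.104.87.192/26", "100.104.132.192/26"],
    "Zhangjiakou": ["100.104.155.192/26", "100.104.238.64/26"],
    "Germany": ["100.104.127.0/26", "100.104.35.192/26"],
    "United States": ["100.104.193.128/26", "100.104.119.128/26"],
    "Singapore": ["100.104.58.192/26", "100.104.74.192/26"],
}


def is_in_region_whitelist(region, ip):
    """Return True if ip belongs to one of the blocks listed for region."""
    address = ipaddress.ip_address(ip)
    return any(
        address in ipaddress.ip_network(block)
        for block in OPENSEARCH_WHITELIST[region]
    )


print(is_in_region_whitelist("Hangzhou", "100.104.190.130"))  # True
print(is_in_region_whitelist("Hangzhou", "100.104.190.200"))  # False
```

Each /26 block covers 64 addresses, so 100.104.190.128/26 spans 100.104.190.128 through 100.104.190.191.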

Account authorization

  • When you connect to a PolarDB data source, you must authorize access to the cluster and provide an account and password. Choose your account credentials carefully during the initial setup.

  • Ensure account permissions: The account must have permission to view all tables in the database, which is a requirement of the upstream DTS service. This ensures that the account can execute show create table *.*. Insufficient permissions can cause real-time synchronization to fail.

  • Minimize changes to account permissions: Changing account credentials can disrupt ongoing real-time tasks and affect the creation of new application versions. If you change the account password, you must delete the data source configuration and then create a new one.

FAQ

  • If reindexing is stuck after I configure a PolarDB data source, what should I do? Create a test table in the same database as your source table. Write or update one to two records per minute to ensure continuous binlog generation during the reindex process.

  • After you settle overdue payments for an Advanced Edition application, you can trigger a manual reindex to resume data synchronization.

  • The PolarDB cluster access password cannot contain the % character. Otherwise, the reindex task fails with an Illegal hex characters in escape (%) pattern error.

  • The primary key values in an application table must be unique. If primary key values are duplicated in a table sharding scenario, data is overwritten. To avoid this, you can use the StringCatenateExtractor data source plugin to merge multiple field values into a new primary key. Set the source fields to pk,$table, where pk is the primary key field of the PolarDB cluster table and $table is a default system variable that represents the corresponding database table name. The default concatenation character is -, which can be customized.

For example, if the PolarDB cluster table is my_table_0 and the primary key field value is 123456, the new primary key value after concatenation is 123456-my_table_0.
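
The concatenation result described above can be reproduced with a few lines of code. This sketch only mimics the documented output of the StringCatenateExtractor plugin and is not its implementation. The function name concatenate_primary_key is hypothetical.

```python
def concatenate_primary_key(pk_value, table_name, separator="-"):
    """Build a primary key that stays unique across sharded tables."""
    return f"{pk_value}{separator}{table_name}"


# The same pk value in two shards now yields two distinct primary keys.
print(concatenate_primary_key(123456, "my_table_0"))  # 123456-my_table_0
print(concatenate_primary_key(123456, "my_table_1"))  # 123456-my_table_1
```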

  • To filter data based on a date or datetime field in the database table, you must use the complete date and time format in the filter conditions. For a field named createtime, a valid condition is createtime>'2018-03-01 00:00:00'. Using an abbreviated format like createtime>'2018-3-1 00:00:00' will cause an error.
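
If you generate filter conditions programmatically, you can validate the date and time literal before you save the condition. The following sketch assumes the complete format is yyyy-MM-dd HH:mm:ss with zero-padded fields, as in the valid example above. The function name is_complete_datetime is hypothetical.

```python
import re
from datetime import datetime

# Zero-padded layout: four-digit year, two digits for every other field.
COMPLETE_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def is_complete_datetime(literal):
    """Return True only for a zero-padded, calendar-valid date and time."""
    # strptime alone accepts '2018-3-1', so the layout is checked first.
    if COMPLETE_DATETIME.fullmatch(literal) is None:
        return False
    try:
        datetime.strptime(literal, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


print(is_complete_datetime("2018-03-01 00:00:00"))  # True
print(is_complete_datetime("2018-3-1 00:00:00"))    # False
```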

Configure a PolarDB data source

  • You can configure a PolarDB data source when you create an application.

  • For an existing application, you can modify the data source by performing an offline change on the application details page.

Procedure

1. When creating or modifying an application, go to the Data Source step. Add or edit a data source, select PolarDB, and then click New Database.

2. After entering the PolarDB data source information, click Connect.

| Parameter | Description |
| --- | --- |
| Cluster ID | The ID of the PolarDB cluster. You can obtain the ID from the PolarDB console. This parameter is case-sensitive. Example: pc-uf6c056ny9tiaj1l7 |
| Database name | The name of the database to connect to within the instance. This parameter is not case-sensitive. |
| Username | The database account used to retrieve the table schema and full data. This parameter is case-sensitive. |
| Password | The password for the specified account. |

OpenSearch attempts to connect and provides feedback based on the outcome:

| Message | Actions |
| --- | --- |
| This PolarDB cluster does not exist in the current region for the current user. | Verify that the cluster ID is correct and ensure that the PolarDB cluster is in the same region as the OpenSearch application. If the issue persists, submit a ticket. |
| Failed to connect to the database service. | Verify that the PolarDB connection details, including the cluster ID, database name, username, and password, are correct. |
| This table does not exist in the current PolarDB cluster. | Verify that the table name is correct and confirm that the table exists in the PolarDB database. |
| Issue with PolarDB cluster configuration items. | Go to the Parameter Settings page in the PolarDB console, modify the relevant configuration items, and then retry. |

3. After connecting to the PolarDB data source, select the data tables.

In the Data tables list on the left, select the checkbox for a desired table and click the >> button to add it to the Selected list on the right.

  • Select or enter the name of the table to access in the database. The name is case-sensitive.

  • Wildcard matching for sharded tables is supported, such as table_* to match table_a, table_b, and so on.
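
To preview which sharded tables a pattern such as table_* would select, you can approximate the matching locally. The sketch below uses fnmatchcase from the Python standard library, which performs case-sensitive shell-style matching. Treat it as an approximation: this page documents only the * wildcard, and fnmatchcase also gives special meaning to ? and [ ]. The function name match_tables is hypothetical.

```python
from fnmatch import fnmatchcase


def match_tables(pattern, table_names):
    """Return the table names that match a case-sensitive * pattern."""
    return [name for name in table_names if fnmatchcase(name, pattern)]


tables = ["table_a", "table_b", "Table_c", "orders"]

# Table_c is skipped because table names are case-sensitive.
print(match_tables("table_*", tables))  # ['table_a', 'table_b']
```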

4. If the connection is successful, configure the fields. OpenSearch automatically retrieves the table fields. For information about data source plugins, see Use data processing plugins.

To add a data source plugin, click the + button in the Content conversion column for the corresponding field mapping. After finishing the field mapping, click OK.

5. Configure filter conditions for the PolarDB data source (not supported for Standard Edition), set the data synchronization method, and then click Finish to complete the application structure configuration.

The data synchronization method provides three options: Automatic synchronization, DTS data subscription instance, and No automatic synchronization.

  • An OpenSearch application can be configured with multiple data sources, but their table structures and configurations must be identical.

  • If the console indicates that automatic synchronization is not supported, use DTS real-time synchronization.

  • The filter conditions for a PolarDB data source ensure that only records that meet the specified criteria are synchronized. For detailed configuration information, see Filter conditions for a data source.