DLF data source

This topic describes how to add a table to a Retrieval Engine Edition instance by using a Data Lake Formation (DLF) data source.

Prerequisites

You are familiar with Data Lake Formation.
You have a Data Lake Formation catalog, database, and table ready. You need the catalog ID, database name, and table name when you configure data synchronization.

Add a Data Lake Formation data source

On the Instance Details > Table Management page, click Add Table.
In the Basic Table Information step, enter a Table name (for example, dlf_table), set the Number of data shards (default: 1) and Number of resources for data updates (default: 2), and then click Next. The number of resources for data updates sets the concurrency for consuming real-time data. A higher value increases update throughput (TPS).

Table name: You can specify a custom name.
Number of data shards: Specify a positive integer up to 256. The value should not exceed three times the number of data nodes in your instance.
Number of resources for data updates: By default, each table includes two free update resources (4 vCPUs and 8 GB of memory each). You are charged for resources that exceed the free quota. For more information, see Billing overview of Retrieval Engine Edition.
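
The limits above can be checked before you fill in the console form. The following Python sketch is illustrative only: the TableSettings class and both function names are hypothetical and are not part of any OpenSearch SDK. It encodes only the rules stated in this topic (a shard count from 1 to 256, at most three times the number of data nodes, and two free update resources per table).

```python
from dataclasses import dataclass
from typing import List

MAX_SHARDS = 256            # upper bound stated in this topic
FREE_UPDATE_RESOURCES = 2   # free update resources per table, as stated in this topic


@dataclass
class TableSettings:
    """Hypothetical container for the Basic Table Information fields."""
    table_name: str
    data_shards: int = 1        # console default
    update_resources: int = 2   # console default


def check_settings(settings: TableSettings, data_nodes: int) -> List[str]:
    """Return a list of problems; an empty list means the values fit the stated rules."""
    problems = []
    if not settings.table_name:
        problems.append("table name is empty")
    if not 1 <= settings.data_shards <= MAX_SHARDS:
        problems.append(f"data shards must be between 1 and {MAX_SHARDS}")
    if settings.data_shards > 3 * data_nodes:
        problems.append("data shards exceed three times the number of data nodes")
    return problems


def billable_update_resources(settings: TableSettings) -> int:
    """Count the update resources beyond the free quota."""
    return max(0, settings.update_resources - FREE_UPDATE_RESOURCES)


if __name__ == "__main__":
    settings = TableSettings("dlf_table", data_shards=8, update_resources=5)
    print(check_settings(settings, data_nodes=2))
    print(billable_update_resources(settings))
```

The sketch reports how many update resources fall outside the free quota but does not estimate cost. Pricing is described in Billing overview of Retrieval Engine Edition.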

In the Data synchronization step, configure the data source. After validating the configuration, click Next.

For Full data source, select Data Lake Formation (DLF), and then enter the catalog ID, database, and table.

Full data source: The source for the initial full data import. Select Data Lake Formation (DLF).
Catalog ID: The ID of the Data Lake Formation catalog that contains your data.
Database: The database within the catalog that contains your table.
Table: The table within the database that you want to synchronize.
Note
- To use DLF as a data source for an existing instance, you must first upgrade its offline version.
- Currently, only Paimon-type catalogs are supported.
- Paimon tables with a primary key support add, update, delete, and query operations. Paimon append-only tables support only write operations and do not allow data to be modified or deleted.
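
As a rough illustration of the parameters and notes above, the following Python sketch models a DLF data source and the operations each Paimon table type allows. The DlfSource class, its field names, and the helper functions are hypothetical. They mirror the console fields and are not an actual OpenSearch or DLF API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DlfSource:
    """Hypothetical model of the DLF data source fields in the console."""
    catalog_id: str
    database: str
    table: str
    catalog_type: str = "paimon"   # assumption: catalog type recorded as a plain string
    has_primary_key: bool = True   # False models a Paimon append-only table


def validate_source(src: DlfSource) -> None:
    """Raise ValueError if a required value is missing or the catalog type is unsupported."""
    for name in ("catalog_id", "database", "table"):
        if not getattr(src, name).strip():
            raise ValueError(f"{name} must not be empty")
    if src.catalog_type.lower() != "paimon":
        raise ValueError("only Paimon-type catalogs are supported")


def supported_operations(src: DlfSource) -> frozenset:
    """Return the operations allowed for the table type, using the wording of the note."""
    if src.has_primary_key:
        return frozenset({"add", "update", "delete", "query"})
    # Append-only tables accept writes only; rows cannot be modified or deleted.
    return frozenset({"write"})


if __name__ == "__main__":
    source = DlfSource("my_catalog_id", "my_database", "my_table")  # placeholder values
    validate_source(source)
    print(sorted(supported_operations(source)))
```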

Configure the index schema in either Form Mode or Developer Mode, and then click Next.

In the Field Settings section, define your data fields, such as an id field (Type: INT8, as primary key), a name field (Type: STRING), and an email field (Type: STRING). You can also set data compression and analyzers here. In the Index Settings section, define your search indexes. For example, add an index named phone (Type: STRING) based on the name field.
After you confirm the creation, the system automatically creates the table. You can monitor the creation progress on the Change history page. When the table status changes to In Use, you can use the Query test page to test your queries.
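
The example field and index settings above can be written down and sanity-checked as follows. This is a minimal sketch: the dictionary layout is a hypothetical representation, not the schema format used by Developer Mode, and the rule that a table has exactly one primary key field is an assumption made for the check.

```python
from typing import Dict, List

# Fields from the example above: id (INT8, primary key), name and email (STRING).
FIELDS: List[Dict] = [
    {"name": "id", "type": "INT8", "primary_key": True},
    {"name": "name", "type": "STRING"},
    {"name": "email", "type": "STRING"},
]

# Index from the example above: an index named phone, built on the name field.
INDEXES: List[Dict] = [
    {"name": "phone", "type": "STRING", "field": "name"},
]


def check_schema(fields: List[Dict], indexes: List[Dict]) -> List[str]:
    """Return a list of problems; an empty list means the sketch found none."""
    problems = []
    names = [f["name"] for f in fields]
    if len(names) != len(set(names)):
        problems.append("duplicate field names")
    primary_keys = [f["name"] for f in fields if f.get("primary_key")]
    if len(primary_keys) != 1:
        # Assumption: exactly one primary key field per table.
        problems.append("expected exactly one primary key field")
    for index in indexes:
        if index["field"] not in names:
            problems.append(
                f"index '{index['name']}' references undefined field '{index['field']}'"
            )
    return problems


if __name__ == "__main__":
    print(check_schema(FIELDS, INDEXES))
```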

Precautions

When new data is written to a Paimon table in DLF, OpenSearch automatically triggers real-time indexing. Writing data manually through an API in addition to this automatic synchronization can cause data inconsistencies, so use manual writes with caution.