Data source plugin overview


When you synchronize data from a data source to OpenSearch, field values in the source may need transformation before indexing, for example, splitting a comma-separated string into an array or stripping HTML tags. Data processing plug-ins apply these transformations automatically during synchronization, so you don't need to pre-process data outside of OpenSearch.

Important

Data processing plug-ins are available only when you synchronize data through a configured data source. If you upload data by calling API operations or by using an OpenSearch SDK, process the data before you upload it.

Prerequisites

Before you begin, ensure that you have:

  • A configured data source for your OpenSearch application

  • Field mappings defined between your source tables and OpenSearch tables

Note

Configure plug-ins when setting up the data source, not when defining the application schema. Plug-ins are only available after a data source is configured.

Data source constraints

Note these constraints before configuring plug-ins:

  • ApsaraDB RDS and PolarDB: An OpenSearch table can be associated with multiple source tables (supports database and table sharding).

  • MaxCompute: An OpenSearch table can be associated with only one source table. To synchronize from multiple MaxCompute source tables, join them into a single table first.

Available plug-ins

OpenSearch provides five plug-ins for field transformation during synchronization:

Plug-in | Transformation type | What it does
JsonKeyValueExtractor | JSON extraction | Extracts a specified key's value from a JSON-formatted source field
MultiValueSpliter | Value splitting | Splits a source field into multiple values using a delimiter
KeyValueExtractor | Key-value extraction | Extracts keys and values from key-value pair source fields
StringCatenateExtractor | String concatenation | Concatenates values from multiple fields into a single string
HTMLTagRemover | HTML stripping | Removes HTML tags from a source field value

JsonKeyValueExtractor

Extracts the value of a specified key from a JSON-formatted source field and maps it to the destination field. Only the value of the specified key can be extracted.

Type requirement: The extracted value type must match the destination field type. If the types are mismatched, the extracted value is silently dropped.

Array conversion: If the extracted value is a JSON array, it is automatically converted to an Array type field value.

Example

Source field value:

{"title": "the content", "body": "the content"}

To extract the title key, configure the plug-in to target title. The destination field receives "the content".

For Array types:

  • LITERAL_ARRAY source: {"tags": ["a", "b", "c"]}

  • INT_ARRAY source: {"tags": [1, 2, 3]}
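The behavior described above can be approximated locally with a short Python sketch. This is not the plug-in's implementation: the function name extract_json_key, the expected_type parameter, and the use of None to model a silently dropped value are all assumptions made for illustration.

```python
import json


def _matches(value, expected_type):
    # bool is a subclass of int in Python, so reject it explicitly
    # unless bool is the type that was asked for.
    if isinstance(value, bool) and expected_type is not bool:
        return False
    return isinstance(value, expected_type)


def extract_json_key(source_value, key, expected_type):
    """Hypothetical emulation of JsonKeyValueExtractor.

    Returns the value stored under `key`, a list if that value is a
    JSON array, or None when the key is missing or the type does not
    match (modeling the "silently dropped" case).
    """
    try:
        doc = json.loads(source_value)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict) or key not in doc:
        return None

    value = doc[key]
    if isinstance(value, list):
        # A JSON array maps to an Array type field; every element
        # must match the element type of the destination field.
        if all(_matches(item, expected_type) for item in value):
            return value
        return None
    return value if _matches(value, expected_type) else None


source = '{"title": "the content", "body": "the content"}'
print(extract_json_key(source, "title", str))                        # the content
print(extract_json_key('{"tags": ["a", "b", "c"]}', "tags", str))    # ['a', 'b', 'c']
print(extract_json_key('{"tags": [1, 2, 3]}', "tags", str))          # None
```

A sketch like this can help you check, before you configure the plug-in, whether the values in your source data match the destination field type.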

MultiValueSpliter

Splits a source field value into multiple values using a specified delimiter, and writes the results to an Array type destination field.

Type requirement: The destination field must be of Array type.

Delimiter support:

Delimiter type | How to specify
Common non-printable characters (e.g., \t) | Write directly
Uncommon non-printable characters | Use Unicode notation (e.g., \u001D)
Multi-character delimiters (e.g., ##, \t\t) | Write directly

Example

Source field value: 1,2,3

Specify a comma (,) as the delimiter. The Array type destination field receives the values 1, 2, and 3.

For more configuration details, see MultiValueSpliter configuration.
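As a rough illustration of the splitting behavior, the following Python sketch emulates the plug-in with the built-in str.split method. The function name split_multi_value is hypothetical, and the sketch keeps every value as a string; how the plug-in converts values for typed arrays is not covered here.

```python
def split_multi_value(source_value, delimiter):
    """Hypothetical emulation of MultiValueSpliter.

    Splits `source_value` on `delimiter` and returns the parts as a
    list, which stands in for an Array type destination field.
    """
    return source_value.split(delimiter)


# Single-character delimiter, as in the example above.
print(split_multi_value("1,2,3", ","))            # ['1', '2', '3']

# Multi-character delimiter written directly.
print(split_multi_value("a##b##c", "##"))         # ['a', 'b', 'c']

# Uncommon non-printable delimiter given in Unicode notation.
print(split_multi_value("x\u001Dy", "\u001D"))    # ['x', 'y']
```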

KeyValueExtractor

Extracts specified keys and values from a source field formatted as key-value pairs, and maps the results to the destination field. Only the values of the specified key can be extracted. Delimiters are not required.

Type requirement: The extracted value type must match the destination field type. If the types are mismatched, the value is silently dropped. If a delimiter separates extracted values, the destination field must be of Array type.

Duplicate key behavior: If two identical keys exist, only the value of the second key is extracted.

Example

Source field value: key1:value1,value2;key2:value3

Configuration:

  • Key-value pair delimiter: semicolon (;), which separates key1:value1,value2 from key2:value3

  • Key and value delimiter: colon (:), which separates each key from its values

  • Value delimiter: comma (,), which separates multiple values of the same key
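The three delimiter levels and the duplicate key rule can be sketched in Python as follows. The function name extract_key_values and its parameter names are hypothetical, and the sketch always returns a list (standing in for an Array type field) or None when the key is absent; the plug-in's real behavior in edge cases may differ.

```python
def extract_key_values(source_value, key, pair_sep=";", kv_sep=":", value_sep=","):
    """Hypothetical emulation of KeyValueExtractor.

    Returns the list of values stored under `key`, or None if the key
    does not occur. When a key occurs more than once, the later
    occurrence overwrites the earlier one.
    """
    result = None
    for pair in source_value.split(pair_sep):
        if kv_sep not in pair:
            continue  # skip fragments that are not key-value pairs
        current_key, _, raw_values = pair.partition(kv_sep)
        if current_key == key:
            # Overwrite any earlier match so the last occurrence wins.
            result = raw_values.split(value_sep)
    return result


source = "key1:value1,value2;key2:value3"
print(extract_key_values(source, "key1"))       # ['value1', 'value2']
print(extract_key_values(source, "key2"))       # ['value3']
print(extract_key_values("k:a;k:b", "k"))       # ['b']
```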

StringCatenateExtractor

Concatenates values from multiple destination table fields into a single string in a specified order.

Type requirement: This plug-in cannot concatenate fields of the INT type. We recommend that you use fields of the LITERAL type.

Field source: Fields must come from the destination table, not the source table. Separate multiple field names with commas (,).

System variable: Use $table to include the current table name in the concatenated string. $table is only populated when a table-sharding wildcard is configured.

Example

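To concatenate field1 and field2 with an underscore (_) separator, specify the fields as field1,field2. The resulting string has the form <field1 value>_<field2 value>.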
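The concatenation rules in this section can be sketched in Python. Everything here is illustrative: the function name concatenate_fields, the row dictionary, and the table_name parameter are assumptions, as are the choices to raise TypeError for INT values and to substitute an empty string when $table is not populated.

```python
def concatenate_fields(row, field_names, separator, table_name=None):
    """Hypothetical emulation of StringCatenateExtractor.

    `row` holds destination-table field values, `field_names` is a
    comma-separated list of fields in the desired order, and
    `table_name` stands in for the $table system variable.
    """
    parts = []
    for name in field_names.split(","):
        name = name.strip()
        if name == "$table":
            # Assumption: an unpopulated $table contributes an empty string.
            parts.append(table_name or "")
            continue
        value = row[name]
        if isinstance(value, int):
            # Assumption: model "cannot concatenate INT fields" as an error.
            raise TypeError(f"field {name!r} is INT and cannot be concatenated")
        parts.append(str(value))
    return separator.join(parts)


row = {"field1": "a", "field2": "b"}
print(concatenate_fields(row, "field1,field2", "_"))                     # a_b
print(concatenate_fields(row, "$table,field1", "_", table_name="t_01"))  # t_01_a
```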

HTMLTagRemover

Strips all HTML tags from a source field value. The destination field receives the plain text content.

Example

Source field value: <div id="copyright">OpenSearch</div>

After processing, the destination field value is: OpenSearch
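For comparison, tag stripping of this kind can be sketched with Python's standard html.parser module. The class and function names are hypothetical, and details such as entity decoding or the handling of script and style content are assumptions that may not match the plug-in.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are discarded.
        self.chunks.append(data)


def remove_html_tags(source_value):
    """Hypothetical emulation of HTMLTagRemover."""
    collector = _TextCollector()
    collector.feed(source_value)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.chunks)


print(remove_html_tags('<div id="copyright">OpenSearch</div>'))  # OpenSearch
```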

Limitations

Constraint | Detail
API and SDK uploads | Plug-ins are not available. Process data before uploading.
MaxCompute data source | One OpenSearch table maps to one MaxCompute source table only
StringCatenateExtractor | Cannot concatenate INT type fields
JsonKeyValueExtractor and KeyValueExtractor | Type mismatch between the extracted value and destination field causes silent data loss
Plug-in configuration timing | Configure only after a data source is set up; not available during schema definition

Related topics