Index schema

An index schema defines the structure of an index table: which fields to store, how to index them, and how to compress the data. Each index table has one schema, and the schema determines how OpenSearch Retrieval Engine Edition stores and retrieves your data.

Key concepts

| Concept | Description |
| --- | --- |
| Field | A named, typed attribute of a document. Fields are the building blocks of every index table. |
| Inverted index | Maps words to the documents that contain them (word → Doc1, Doc2, ..., DocN). Use it for keyword search and full-text retrieval. |
| Forward index | Maps document IDs to their field values (DocID → term1, term2, ..., termN). Use it for sorting, filtering, and statistics. |
| Summary index | Stores field values keyed by document ID for fast result display. Use it to render search results without re-fetching raw data. When compression is enabled in the schema, OpenSearch Retrieval Engine Edition uses zlib to compress the summary index and decompresses it when reading. |
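
To make the three index kinds concrete, the following is a minimal in-memory sketch, not the engine's actual storage layout. The documents, the `search` helper, and the choice to sort by price are all hypothetical; only the use of zlib for the summary data comes from the description above.

```python
import zlib
from collections import defaultdict

# Three toy documents keyed by document ID (hypothetical data).
docs = {
    1: {"title": "red apple", "price": 3},
    2: {"title": "green apple", "price": 5},
    3: {"title": "red grape", "price": 4},
}

# Inverted index: word -> list of document IDs that contain the word.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for word in docs[doc_id]["title"].split():
        inverted[word].append(doc_id)

# Forward index: document ID -> field value, used for sorting and filtering.
forward_price = {doc_id: doc["price"] for doc_id, doc in docs.items()}

# Summary index: document ID -> display payload, zlib-compressed in this sketch.
summary = {
    doc_id: zlib.compress(doc["title"].encode("utf-8"))
    for doc_id, doc in docs.items()
}


def search(word):
    """Find documents containing `word`, sort them by price, return display titles."""
    hits = inverted.get(word, [])
    ranked = sorted(hits, key=lambda doc_id: forward_price[doc_id])
    return [zlib.decompress(summary[doc_id]).decode("utf-8") for doc_id in ranked]
```

A query touches each structure once: the inverted index finds the matching IDs, the forward index orders them, and the summary index supplies the text to display.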

Forward index subtypes

| Subtype | Data quantity per field | Query performance | Data updatable |
| --- | --- | --- | --- |
| Single-value | Fixed (one value; STRING type values are variable-length) | High | Yes |
| Multi-value | Variable (multiple values) | Lower | No |

Supported field data types

The following types are supported for forward index fields:

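| Type | Bit width | Signed |
| --- | --- | --- |
| INT8 | 8-bit integer | Yes |
| UINT8 | 8-bit integer | No |
| INT16 | 16-bit integer | Yes |
| UINT16 | 16-bit integer | No |
| INTEGER | 32-bit integer | Yes |
| UINT32 | 32-bit integer | No |
| INT64 | 64-bit integer | Yes |
| UINT64 | 64-bit integer | No |
| FLOAT | 32-bit floating-point | |
| DOUBLE | 64-bit floating-point | |
| STRING | String | |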
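
The bit width and signedness of each integer type determine the values it can hold. The sketch below derives those ranges, assuming the usual two's-complement representation for signed types; the helper names `value_range` and `fits` are illustrative and not part of the product.

```python
# Bit width and signedness for each integer type listed above.
INTEGER_TYPES = {
    "INT8": (8, True),
    "UINT8": (8, False),
    "INT16": (16, True),
    "UINT16": (16, False),
    "INTEGER": (32, True),
    "UINT32": (32, False),
    "INT64": (64, True),
    "UINT64": (64, False),
}


def value_range(field_type):
    """Return (min, max) for an integer type, assuming two's complement."""
    bits, signed = INTEGER_TYPES[field_type]
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(field_type, value):
    """Check whether `value` is representable in `field_type`."""
    low, high = value_range(field_type)
    return low <= value <= high
```

A check like `fits("INTEGER", value)` before ingestion can help you pick the narrowest type that still holds your data.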

Schema structure

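A schema is a JSON document with the following top-level keys: file_compress, table_name, summarys, indexs, attributes, and fields. The // comments in the example below are explanatory annotations only; standard JSON does not permit comments.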

{
  "file_compress": [          // Compressor definitions referenced by other sections
    {
      "name": "file_compressor",
      "type": "zstd"
    },
    {
      "name": "no_compressor",
      "type": ""
    }
  ],
  "table_name": "test",       // Index table name
  "summarys": {               // Summary index configuration
    "summary_fields": [
      "id",
      "fb_boolean",
      "fb_datetime",
      "fb_string",
      "fb_decimal",
      "fb_bigint",
      "fb_text"
    ],
    "parameter": {
      "file_compressor": "zstd"   // Compressor applied to the summary index
    }
  },
  "indexs": [                 // Inverted index definitions
    {
      "index_name": "id",
      "index_type": "PRIMARYKEY64",
      "index_fields": "id",
      "has_primary_key_attribute": true,
      "is_primary_key_sorted": false
    },
    {
      "index_name": "fb_boolean",
      "index_type": "STRING",
      "index_fields": "fb_boolean",
      "file_compress": "file_compressor",   // Apply compression to this index field
      "format_version_id": 1
    },
    {
      "index_name": "fb_datetime",
      "index_type": "STRING",
      "index_fields": "fb_datetime",
      "file_compress": "file_compressor",
      "format_version_id": 1
    },
    {
      "index_name": "fb_string",
      "index_type": "STRING",
      "index_fields": "fb_string"
    },
    {
      "index_name": "fb_text",
      "index_type": "TEXT",
      "index_fields": "fb_text"
    }
  ],
  "attributes": [             // Forward index (attribute field) definitions
    {
      "field_name": "id",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_boolean",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_datetime",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_string",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_decimal",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_bigint",
      "file_compress": "no_compressor"
    }
  ],
  "fields": [                 // Field type definitions shared across all indexes
    {
      "user_defined_param": {},
      "field_name": "id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_boolean",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "field_name": "fb_datetime",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "user_defined_param": {
        "multi_value_sep": ","    // Delimiter for multi-value fields
      },
      "field_name": "fb_string",
      "field_type": "STRING",
      "compress_type": "equal",
      "multi_value": true
    },
    {
      "field_name": "fb_decimal",
      "field_type": "DOUBLE"
    },
    {
      "field_name": "fb_bigint",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_text",
      "field_type": "TEXT",
      "analyzer": "chn_standard"  // Text analyzer for full-text indexing
    }
  ]
}

Schema parameter reference

`fields` parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| field_name | string | Field name | Required |
| field_type | string | Data type: INT8, UINT8, INT16, UINT16, INTEGER, UINT32, INT64, UINT64, FLOAT, DOUBLE, STRING, TEXT | Required |
| compress_type | string | Field data compression: equal for single-value fields, uniq for multi-value or STRING fields | Not compressed |
| multi_value | bool | Whether the field holds multiple values | false |
| user_defined_param.multi_value_sep | string | Delimiter for multi-value fields. Must be a single character; full-width characters are not supported | ^] |
| analyzer | string | Text analyzer for TEXT fields (e.g., chn_standard). Required when field_type is TEXT | |
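
The delimiter rules for multi-value fields can be checked locally before you push data. The sketch below is an approximation of those rules, not the engine's own parser: it assumes that the default `^]` denotes the control character Ctrl-] (0x1D), and it treats a character as full-width when Unicode classifies it as East Asian Fullwidth or Wide. The helper names are hypothetical.

```python
import unicodedata

# Assumption: the documented default "^]" is the control character Ctrl-] (0x1D).
DEFAULT_SEP = "\x1d"


def check_separator(sep):
    """Raise ValueError unless `sep` is a single, non-full-width character."""
    if len(sep) != 1:
        raise ValueError("multi_value_sep must be a single character")
    # "F" (Fullwidth) and "W" (Wide) are the Unicode East Asian Width classes
    # used here to approximate "full-width".
    if unicodedata.east_asian_width(sep) in ("F", "W"):
        raise ValueError("full-width characters are not supported")
    return sep


def split_multi_value(raw, sep=DEFAULT_SEP):
    """Split a raw multi-value field into its individual values."""
    check_separator(sep)
    return raw.split(sep) if raw else []
```

For a field configured with `"multi_value_sep": ","`, the raw value `a,b,c` splits into three values.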

`indexs` parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| index_name | string | Index name | Required |
| index_type | string | Index type: PRIMARYKEY64, STRING, TEXT | Required |
| index_fields | string | The field this index is built on | Required |
| has_primary_key_attribute | bool | Whether the primary key has a corresponding attribute field | false |
| is_primary_key_sorted | bool | Whether the primary key index is sorted | false |
| file_compress | string | Compressor name (references file_compress[].name). Cannot be set on the primary key index | Not compressed |
| format_version_id | integer | Index format version | |

`attributes` parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| field_name | string | The field this attribute is built on | Required |
| file_compress | string | Compressor name (references file_compress[].name) | Not compressed |

For more information about configuring an index table, see Configure an index table.
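
Several rules in the parameter tables are cross-references that are easy to break when you edit the JSON by hand. The following sketch checks a few of them on a schema that is already parsed into a Python dict. `validate_schema` is a hypothetical helper, not a product API, and it covers only the rules stated in this section: compressor names must be defined, the primary key index cannot be compressed, TEXT fields need an analyzer, and each attribute must name a declared field.

```python
def validate_schema(schema):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    compressors = {c["name"] for c in schema.get("file_compress", [])}
    field_names = {f["field_name"] for f in schema.get("fields", [])}

    # TEXT fields must declare an analyzer.
    for field in schema.get("fields", []):
        if field["field_type"] == "TEXT" and not field.get("analyzer"):
            problems.append(f"field {field['field_name']}: TEXT requires an analyzer")

    # Index compressors must be defined, and never set on the primary key index.
    for index in schema.get("indexs", []):
        name, ref = index["index_name"], index.get("file_compress")
        if ref and index["index_type"] == "PRIMARYKEY64":
            problems.append(f"index {name}: the primary key index cannot be compressed")
        elif ref and ref not in compressors:
            problems.append(f"index {name}: unknown compressor {ref!r}")

    # Attributes must point at a declared field and a defined compressor.
    for attr in schema.get("attributes", []):
        name, ref = attr["field_name"], attr.get("file_compress")
        if name not in field_names:
            problems.append(f"attribute {name}: no matching entry in fields")
        if ref and ref not in compressors:
            problems.append(f"attribute {name}: unknown compressor {ref!r}")

    return problems
```

Running a check like this before you save a version can catch a mistyped compressor name early. It does not replace the validation that the console performs.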

Add an index table

Prerequisites

Before you begin, ensure that you have:

  • An active OpenSearch Retrieval Engine Edition instance

  • At least one data source configured for the instance

Steps

  1. On the instance details page, choose Configuration Center > Index Schema in the left-side navigation pane, then click Create Index Table.

  2. Configure the Index Table, Data Source, and Data Shards parameters.

  3. Configure fields.

    • Multi-value field delimiters: The default delimiter is ^]. To use a custom delimiter, set user_defined_param.multi_value_sep. The delimiter must be a single character; full-width characters are not supported.

    • Attribute field compression: By default, attribute fields are not compressed. To compress an attribute field, select file_compressor.

    Important

    When compressing attribute fields, modify the index loading method to reduce the performance impact. On the instance details page, choose O&M Center > Deployment Management. Click the Searcher worker, then open the Searcher Worker Configurations panel and go to the Online Table Configurations tab.

    Field data compression:

    | Field type | Default compress_type |
    | --- | --- |
    | Single-value fields | equal |
    | Multi-value fields or STRING | uniq |
    | Not compressed | Leave blank |

  4. Configure indexes. Index field compression: By default, index fields are not compressed. To compress an index field, select file_compressor.

    Note

    • The primary key index cannot be compressed.

    • When compressing index fields, modify the index loading method. On the instance details page, choose O&M Center > Deployment Management. Click the Searcher worker, then open the Searcher Worker Configurations panel and go to the Online Table Configurations tab.

  5. Click Save Version. In the dialog, enter an optional description and click Publish.

  6. To view the updated topology, choose O&M Center > Deployment Management in the left-side navigation pane.

  7. To apply the new index table to the cluster, choose O&M Center > O&M Management, click Update Configurations, and set Trigger Reindexing to Push Configurations and Trigger Reindexing.

  8. To monitor reindexing progress, choose O&M Center > Change History and click the Data Source Changes tab.

After reindexing completes, the new index table is ready for queries.

Important
  • Only one primary key field is allowed per index table.

  • At least one field must have Search Result Display selected.

  • TEXT fields require an analysis method. Multi-value TEXT fields are not supported.

  • Only one primary key index is allowed per index table.

  • If your cluster has 2 replicas, set Data Shards to 2. The number of Searcher workers must be greater than the number of replicas multiplied by the number of data shards (replicas × data shards); otherwise, the index table cannot be used.

  • A single data shard can hold up to 600 million documents, and all shards combined can hold up to 2.1 billion documents. The index size of a single shard cannot exceed 300 GB. For real-time updates, each shard supports at most 4,000 update transactions per second (TPS). When you use the add command, the TPS can reach 10,000.
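
The per-shard limits above can be turned into a rough lower bound on the number of data shards. The sketch below assumes that documents, index size, and update traffic are spread evenly across shards, which real workloads may not satisfy. `min_data_shards` is a hypothetical planning helper, and it uses the 4,000 TPS limit for real-time updates rather than the higher figure for the add command.

```python
import math

# Limits taken from the list above.
MAX_DOCS_PER_SHARD = 600_000_000
MAX_DOCS_TOTAL = 2_100_000_000
MAX_SHARD_INDEX_GB = 300
MAX_UPDATE_TPS_PER_SHARD = 4_000


def min_data_shards(total_docs, total_index_gb, update_tps):
    """Smallest shard count that respects every per-shard limit (even spread assumed)."""
    if total_docs > MAX_DOCS_TOTAL:
        raise ValueError("total documents exceed the 2.1 billion limit")
    return max(
        1,
        math.ceil(total_docs / MAX_DOCS_PER_SHARD),
        math.ceil(total_index_gb / MAX_SHARD_INDEX_GB),
        math.ceil(update_tps / MAX_UPDATE_TPS_PER_SHARD),
    )
```

For example, 1.5 billion documents with a 1,000 GB index need at least four shards under these assumptions, because index size is the binding limit (1,000 / 300 rounds up to 4).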

Modify an index table

Index table versions

Each new index table starts with two versions:

| Version name | Status | Description |
| --- | --- | --- |
| index_config_v1 | In Use or Unused | The initial configuration. Status is In Use after you push the configuration and rebuild indexes; Unused otherwise. |
| index_config_edit | Modifying | The version currently being edited. |

Subsequent published versions are named incrementally: index_config_v2, index_config_v3, and so on. Add a description to each version to tell them apart.
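
The naming pattern can be sketched as a small function. This is an illustration only: the console assigns version names itself, and the rule used here (highest existing number plus one) is an assumption based on the incremental naming described above.

```python
import re


def next_version_name(existing_names):
    """Guess the next published version name from the names that already exist."""
    numbers = [
        int(match.group(1))
        for name in existing_names
        # Names such as index_config_edit do not match and are ignored.
        if (match := re.fullmatch(r"index_config_v(\d+)", name))
    ]
    return f"index_config_v{max(numbers, default=0) + 1}"
```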

Edit and publish a new version

  1. Find the version in Modifying state and click Modify.

    You can also switch to developer mode to edit the schema JSON directly. In cluster.json, the customized_merge_config and segment_customize_metrics_updater keys are available for index merging configuration. The segment_customize_metrics_updater key is only supported on new instances.
  2. Make your changes, then click Save Version.

  3. Find the version in Modifying state, click Publish, enter a description, and click OK. The system creates a new version in Unused state.

  4. To apply the changes to the cluster, choose O&M Center > O&M Management, click Update Configurations, and set Trigger Reindexing to Push Configurations and Trigger Reindexing.

Delete a version: You can delete versions in Unused state.

View a version: Click View to open the configuration page in read-only mode. Both administrator mode and developer mode are available for viewing.

Delete an index table

You can delete an index table that has no version in In Use state.

If the index table has a version in In Use state, unsubscribe from it first:

  1. Choose O&M Center > Deployment Management. Click the index table, then click Cancel Subscription on the Effective Online tab.

  2. Choose Configuration Center > Index Schema. Find the index table and click Delete in the Actions column.

Warning

After canceling a subscription on the Deployment Management page, delete the index table from the index schema. Leaving an unsubscribed index table in the schema may degrade query performance on your online clusters.

Usage notes

  • A data source is required when creating an index table. If no data source exists, add one before creating the index table.

  • The index table name cannot be changed after creation.

  • An index table with a version in In Use state cannot be deleted.

  • Each index table can have only one version in Modifying state at a time.
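
The version states and the deletion rules in this topic can be summarized as a small model. The sketch below only restates those rules in code; the `Status` enum and the function names are hypothetical and do not correspond to a product API.

```python
from enum import Enum


class Status(Enum):
    IN_USE = "In Use"
    UNUSED = "Unused"
    MODIFYING = "Modifying"


def can_delete_version(status):
    """Only versions in Unused state can be deleted."""
    return status is Status.UNUSED


def can_delete_table(version_statuses):
    """An index table can be deleted only if no version is In Use."""
    return Status.IN_USE not in version_statuses


def has_valid_modifying_count(version_statuses):
    """At most one version may be in Modifying state at a time."""
    return list(version_statuses).count(Status.MODIFYING) <= 1
```

For a table whose versions are In Use and Modifying, `can_delete_table` returns False until the subscription is canceled and the In Use version is no longer active.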
