| Term | Description |
|---|---|
| instance | The top-level container for a search service. An instance holds all data configurations — data source schema, index schema, and data attributes — and serves as a single search service endpoint. An instance is analogous to a database in a relational database system. |
| document | The basic unit of searchable data, analogous to a row in a relational database table. A document contains one or more fields and must have a primary key field. OpenSearch uses the primary key to uniquely identify each document. If you push a new document with the same primary key as an existing one, the existing document is overwritten. |
| field | A single name–value pair within a document. Fields are the building blocks of a document and determine what data is stored and how it can be searched or filtered. |
| plugins | Built-in data processing plugins that OpenSearch provides to transform your data during import. Select plugins when you define the schema or configure a data source. |
| source data | The raw data you push to OpenSearch before any processing. Source data contains one or more source fields. |
| source field | The smallest unit of source data — a single name–value pair. For supported data types, see Application schema and index schema. |
| index | A data structure that accelerates retrieval. An instance can have multiple indexes. OpenSearch uses two types of indexes internally: inverted indexes and forward indexes. |
| composite index | An index built across multiple fields of TEXT or SHORT_TEXT type. For example, a forum search service might use a title_search index for title-only searches and a default composite index across both titles and bodies for comprehensive searches. |
| index field | A field defined to participate in query clauses. Defining index fields is required for high-performance full-text retrieval. |
| attribute field | A field used in FILTER, SORT, AGGREGATE, and DISTINCT clauses. Attribute fields enable filtering, sorting, and statistics on search results without participating in full-text retrieval. |
| default display field | The set of fields returned in search results by default. Override these defaults per request using the fetch_fields API parameter. When fetch_fields is set, the default display field configuration is ignored and only the specified fields are returned. |
| tokenization | The process of splitting text field values into individual terms for indexing. How text is split depends on the field type: TEXT fields are split into meaningful word-level terms, while SHORT_TEXT fields are split character by character. For example, the Chinese phrase "浙江大学" becomes "浙江" and "大学" for a TEXT field, but "浙", "江", "大", and "学" for a SHORT_TEXT field. Without tokenization, only exact-string matches would work — tokenization is what makes full-text search possible. |
| term | A single token produced by tokenization. Terms are used to build the inverted index. |
| index building | The process of constructing indexes from terms after tokenization. OpenSearch builds two types of indexes: inverted indexes (used for retrieval) and forward indexes (used for filtering). |
| inverted index | A data structure that maps each term to the documents containing it. Inverted indexes power query clause searches. For example, given two documents — "quick brown fox" and "quick fox jumps" — the inverted index maps: quick → doc1, doc2 / brown → doc1 / fox → doc1, doc2 / jumps → doc2. |
| forward index | A data structure that maps each document to its field values. Forward indexes power FILTER clause operations. They are less efficient for retrieval than inverted indexes but are necessary for operations that read field values per document. For example, the forward index entry for doc1 maps doc1 to the values of its id, type, and create_time fields. |
| retrieval | The process of finding documents that match a search request. OpenSearch converts query keywords into terms, then looks up the inverted index to find all matching documents. |
| retrieval amount | The number of documents that retrieval finds for a search request, that is, how many documents match the query. |
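
The relationship between terms, inverted indexes, forward indexes, and retrieval can be illustrated with a small sketch. This is a toy model for intuition only, not how OpenSearch implements its indexes: the whitespace tokenizer, the function names, and the `id`, `type`, and `create_time` values are all assumptions made for the example. Only the two sample texts come from the inverted index entry in the table.

```python
from collections import defaultdict

# Hypothetical sample documents. The title texts match the inverted index
# example in the table; the other field values are invented for illustration.
DOCS = {
    "doc1": {"id": 1, "type": "post", "create_time": 1700000000, "title": "quick brown fox"},
    "doc2": {"id": 2, "type": "reply", "create_time": 1700000500, "title": "quick fox jumps"},
}


def tokenize(text):
    """Naive whitespace tokenizer, standing in for a real analyzer."""
    return text.lower().split()


def build_inverted_index(docs, field):
    """Map each term to the list of document IDs whose field contains it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        # dict.fromkeys removes duplicate terms while preserving their order.
        for term in dict.fromkeys(tokenize(doc[field])):
            index[term].append(doc_id)
    return dict(index)


def build_forward_index(docs, fields):
    """Map each document ID to the values of its attribute fields."""
    return {doc_id: {f: doc[f] for f in fields} for doc_id, doc in docs.items()}


def search(term, inverted, forward, type_filter=None):
    """Retrieve via the inverted index, then filter via the forward index."""
    hits = inverted.get(term.lower(), [])
    if type_filter is None:
        return list(hits)
    return [doc_id for doc_id in hits if forward[doc_id]["type"] == type_filter]


inverted = build_inverted_index(DOCS, "title")
forward = build_forward_index(DOCS, ["id", "type", "create_time"])

print(inverted)
print(search("fox", inverted, forward))
print(search("fox", inverted, forward, type_filter="reply"))
```

In this sketch, the term lookup plays the role of retrieval, the length of the list it returns corresponds to the retrieval amount, and the per-document check on `type` mirrors what a FILTER clause does against attribute fields.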