Glossary


This glossary defines the core concepts in OpenSearch Industry Algorithm Edition, organized by functional area.

Instance management

instance: The top-level container for a search service. An instance holds all data configurations, including the data source schema, index schema, and data attributes, and serves as a single search service endpoint. An instance is analogous to a database in a relational database system.
document: The basic unit of searchable data, analogous to a row in a relational database table. A document contains one or more fields and must have a primary key field. OpenSearch uses the primary key to uniquely identify each document. If you push a new document with the same primary key as an existing one, the existing document is overwritten.
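The overwrite rule can be modeled as a dictionary keyed by primary key. This is a toy in-memory store, not the OpenSearch push API. The function name `push_documents` and the `id` primary key field are hypothetical, and the sketch assumes that an overwrite replaces the whole document.

```python
from typing import Any

# Hypothetical toy store: primary key -> document. Not the OpenSearch push API.
Store = dict[str, dict[str, Any]]


def push_documents(store: Store, docs: list[dict[str, Any]], primary_key: str = "id") -> None:
    """Add documents to the store, overwriting any document with the same primary key."""
    for doc in docs:
        if primary_key not in doc:
            raise ValueError(f"document is missing primary key field {primary_key!r}")
        # Assumption: an overwrite replaces the whole document, so fields that
        # are absent from the new version do not survive.
        store[str(doc[primary_key])] = dict(doc)


store: Store = {}
push_documents(store, [{"id": 1, "title": "old title", "price": 10}])
push_documents(store, [{"id": 1, "title": "new title"}])
print(store)  # {'1': {'id': 1, 'title': 'new title'}}
```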
field: A single name–value pair within a document. Fields are the building blocks of a document and determine what data is stored and how it can be searched or filtered.
plugins: Built-in data processing plugins that OpenSearch provides to transform your data during import. You select plugins when you define the schema or configure a data source.
source data: The raw data you push to OpenSearch before any processing. Source data contains one or more source fields.
source field: The smallest unit of source data, a single name–value pair. For supported data types, see Application schema and index schema.
index: A data structure that accelerates retrieval. An instance can have multiple indexes. Internally, OpenSearch uses two types of indexes: inverted indexes and forward indexes.
composite index: An index built across multiple fields of the TEXT or SHORT_TEXT type. For example, a forum search service might use a title_search index for title-only searches and a composite default index that covers both titles and bodies for comprehensive searches.
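A composite index behaves like a single inverted index fed by several fields. In this sketch, the `build_index` function, the sample documents, and the whitespace tokenizer are all illustrative assumptions. The variables `title_search` and `default_index` mirror the forum example.

```python
from collections import defaultdict


def build_index(docs: dict[str, dict[str, str]], fields: list[str]) -> dict[str, set[str]]:
    """Map each term that appears in any of `fields` to the IDs of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for field in fields:
            # Lowercased whitespace splitting stands in for real tokenization.
            for term in doc.get(field, "").lower().split():
                index[term].add(doc_id)
    return dict(index)


docs = {
    "doc1": {"title": "python tips", "body": "fast search engines"},
    "doc2": {"title": "search basics", "body": "python examples"},
}
title_search = build_index(docs, ["title"])           # single-field index
default_index = build_index(docs, ["title", "body"])  # composite index
print(sorted(title_search["python"]))   # ['doc1']
print(sorted(default_index["python"]))  # ['doc1', 'doc2']
```

A title-only query for "python" misses doc2, whose match is in the body. The composite index finds both documents.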
index field: A field defined to participate in query clauses. Defining index fields is required for high-performance full-text retrieval.
attribute field: A field used in FILTER, SORT, AGGREGATE, and DISTINCT clauses. Attribute fields enable filtering, sorting, and statistics on search results but do not participate in full-text retrieval.
default display field: The set of fields returned in search results by default. You can override these defaults per request with the fetch_fields API parameter. When fetch_fields is set, the default display field configuration is ignored and only the specified fields are returned.
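The fetch_fields override is an either-or choice, not a merge. This sketch represents fetch_fields as a Python list, which is an assumption about representation rather than the request syntax. `DEFAULT_DISPLAY_FIELDS` and `select_fields` are hypothetical names.

```python
from typing import Any, Optional

# Hypothetical default display field configuration for an instance.
DEFAULT_DISPLAY_FIELDS = ["id", "title"]


def select_fields(doc: dict[str, Any], fetch_fields: Optional[list[str]] = None) -> dict[str, Any]:
    """Return the fields shown for one search hit.

    When fetch_fields is given, the default display fields are ignored entirely.
    """
    fields = fetch_fields if fetch_fields is not None else DEFAULT_DISPLAY_FIELDS
    return {name: doc[name] for name in fields if name in doc}


doc = {"id": 7, "title": "OpenSearch glossary", "body": "full article text", "price": 0}
print(select_fields(doc))                          # {'id': 7, 'title': 'OpenSearch glossary'}
print(select_fields(doc, fetch_fields=["price"]))  # {'price': 0}
```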
tokenization: The process of splitting text field values into individual terms for indexing. How text is split depends on the field type. TEXT fields are split into meaningful word-level terms, while SHORT_TEXT fields are split character by character. For example, the Chinese phrase "浙江大学" (Zhejiang University) becomes "浙江" and "大学" for a TEXT field, but "浙", "江", "大", and "学" for a SHORT_TEXT field. Without tokenization, only exact-string matches would work. Tokenization is what makes full-text search possible.
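A few lines of code are enough to contrast the two splitting behaviors. OpenSearch's real TEXT analysis uses its own analyzer and vocabulary. This sketch substitutes forward maximum matching, a simple greedy dictionary-based segmentation technique, and uses a hypothetical two-word dictionary. The function names are illustrative.

```python
def tokenize_short_text(text: str) -> list[str]:
    """SHORT_TEXT-style tokenization: every non-space character becomes a term."""
    return [ch for ch in text if not ch.isspace()]


def tokenize_text(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """TEXT-style word segmentation using greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    a character covered by no dictionary word becomes a single-character term.
    """
    terms: list[str] = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; size 1 always matches as a fallback.
        for size in range(max(1, min(max_len, len(text) - i)), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                terms.append(candidate)
                i += size
                break
    return terms


DICTIONARY = {"浙江", "大学"}  # hypothetical vocabulary
print(tokenize_text("浙江大学", DICTIONARY))  # ['浙江', '大学']
print(tokenize_short_text("浙江大学"))        # ['浙', '江', '大', '学']
```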
term: A single token produced by tokenization. Terms are used to build the inverted index.
index building: The process of constructing indexes from the terms produced by tokenization. OpenSearch builds two types of indexes: inverted indexes (used for retrieval) and forward indexes (used for filtering).
inverted index: A data structure that maps each term to the documents containing it. Inverted indexes power query clause searches. For example, given two documents, doc1 "quick brown fox" and doc2 "quick fox jumps", the inverted index maps: quick → doc1, doc2 / brown → doc1 / fox → doc1, doc2 / jumps → doc2.
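The two-document example can be reproduced directly. This sketch splits text on whitespace in place of real tokenization and keeps each posting list sorted. `build_inverted_index` is an illustrative name, not an OpenSearch function.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each term to the sorted list of IDs of the documents that contain it."""
    postings: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # whitespace split stands in for tokenization
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index({"doc1": "quick brown fox", "doc2": "quick fox jumps"})
print(index)
# {'quick': ['doc1', 'doc2'], 'brown': ['doc1'], 'fox': ['doc1', 'doc2'], 'jumps': ['doc2']}
```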
forward index: A data structure that maps each document to its field values, for example doc1 → id, type, create_time. Forward indexes power FILTER clause operations. They are less efficient for retrieval than inverted indexes, but they are necessary for operations that read the field values of each document.
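The inverted index answers "which documents contain this term?" The forward index answers "what are this document's field values?" This sketch filters candidates that have already been retrieved. The field values and the `apply_filter` helper are hypothetical, and a Python predicate stands in for a FILTER clause.

```python
from typing import Any, Callable

# Hypothetical forward index: document ID -> attribute field values.
forward_index: dict[str, dict[str, Any]] = {
    "doc1": {"id": 1, "type": "news", "create_time": 1700000000},
    "doc2": {"id": 2, "type": "blog", "create_time": 1710000000},
}


def apply_filter(candidates: list[str],
                 index: dict[str, dict[str, Any]],
                 predicate: Callable[[dict[str, Any]], bool]) -> list[str]:
    """Keep the candidates whose field values, read from the forward index, satisfy the predicate."""
    return [doc_id for doc_id in candidates if predicate(index[doc_id])]


matched = ["doc1", "doc2"]  # for example, the output of an inverted-index lookup
print(apply_filter(matched, forward_index, lambda f: f["type"] == "blog"))  # ['doc2']
```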
retrieval: The process of finding documents that match a search request. OpenSearch converts query keywords into terms, then looks up the inverted index to find all matching documents.
retrieval amount: The number of documents retrieved for a search request.
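Retrieval and retrieval amount can be sketched together as a lookup against an inverted index. This sketch assumes AND semantics: a document must contain every query term. The glossary does not say how OpenSearch combines terms. The index contents, the whitespace tokenizer, and the `retrieve` function are hypothetical.

```python
def retrieve(query: str, inverted_index: dict[str, set[str]]) -> set[str]:
    """Return the IDs of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()  # whitespace splitting stands in for tokenization
    if not terms:
        return set()
    matches = set(inverted_index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= inverted_index.get(term, set())  # intersect posting lists
    return matches


inverted_index = {
    "quick": {"doc1", "doc2"},
    "brown": {"doc1"},
    "fox": {"doc1", "doc2"},
    "jumps": {"doc2"},
}
hits = retrieve("quick fox", inverted_index)
print(sorted(hits))  # ['doc1', 'doc2']
print(len(hits))     # retrieval amount: 2
print(sorted(retrieve("brown jumps", inverted_index)))  # []
```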
Index fields, attribute fields, source fields, and default display fields serve distinct purposes. Index fields are used for full-text retrieval. Attribute fields are used for filtering, sorting, and aggregation. Source fields are the raw input fields from your data source. Default display fields control what is returned in search results. Understanding this distinction helps you design your schema correctly.

Data synchronization

data source: The external system from which data is pushed into OpenSearch. Supported sources are ApsaraDB for RDS, MaxCompute, and PolarDB.
reindexing: The process of rebuilding all indexes from scratch. Reindexing is required after you configure or modify the application schema and a data source.

Quota management

document capacity: The cumulative storage size of all documents in an instance, calculated by converting each field value to a string and summing the sizes.
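The capacity rule reduces to converting each value to a string and summing the sizes. The definition does not state a unit or encoding, so this sketch assumes UTF-8 byte length. Following the definition, it counts field values only and leaves out field names. `document_capacity` is an illustrative name.

```python
from typing import Any


def document_capacity(docs: list[dict[str, Any]]) -> int:
    """Sum the sizes of all field values after converting each one to a string.

    Assumption: size means UTF-8 byte length; the definition does not name a unit.
    """
    return sum(len(str(value).encode("utf-8"))
               for doc in docs
               for value in doc.values())


docs = [{"id": 1, "title": "fox"}, {"id": 22, "title": "浙江"}]
print(document_capacity(docs))  # 12: "1" (1) + "fox" (3) + "22" (2) + "浙江" (6)
```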
QPS: Queries per second, the number of search requests an instance processes per second.
LCU: Logical computing unit, the unit used to measure the computing power of a search service. One LCU represents the computing power of 10 millicores in a search cluster, where a millicore is one thousandth of a CPU core.
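One LCU is 10 millicores and one core is 1,000 millicores, so one CPU core corresponds to 100 LCUs. The helpers below only restate that arithmetic. Their names are illustrative, and they say nothing about how OpenSearch bills or allocates LCUs.

```python
MILLICORES_PER_LCU = 10     # from the LCU definition
MILLICORES_PER_CORE = 1000  # a millicore is one thousandth of a CPU core


def lcu_to_cores(lcu: float) -> float:
    """Convert LCUs to CPU cores: LCU x 10 millicores / 1000 millicores per core."""
    return lcu * MILLICORES_PER_LCU / MILLICORES_PER_CORE


def cores_to_lcu(cores: float) -> float:
    """Convert CPU cores to LCUs: one core is 1000 / 10 = 100 LCUs."""
    return cores * MILLICORES_PER_CORE / MILLICORES_PER_LCU


print(lcu_to_cores(1))    # 0.01
print(cores_to_lcu(1))    # 100.0
print(lcu_to_cores(250))  # 2.5
```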
scaling: Adjusting the compute and capacity configuration of an instance. Small specification changes take effect immediately. Changes that involve switching instance types, for example from a shared instance to an exclusive instance, take effect only after approval.

Search

sort expression: A user-defined expression that controls the ranking of search results. Sort expressions support basic mathematical operations, mathematical functions, and built-in functions.
rough sort expression: A first-pass ranking expression. OpenSearch uses this expression to calculate a matching score for each retrieved document and sorts the results by score. The top N results are then passed to the fine sort stage.
fine sort expression: A second-pass ranking expression applied to the top N results from rough sort. Fine sort expressions apply more precise but more computationally expensive scoring to refine the final ranking.
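The two-stage flow can be sketched with ordinary sort keys standing in for sort expressions. The fields `text_score` and `clicks`, the two expressions, and the `two_stage_rank` function are hypothetical. The sketch also assumes that documents outside the top N are simply dropped.

```python
from typing import Any, Callable

Doc = dict[str, Any]


def two_stage_rank(docs: list[Doc],
                   rough: Callable[[Doc], float],
                   fine: Callable[[Doc], float],
                   top_n: int) -> list[Doc]:
    """Score every document with `rough`, then re-rank only the top N with `fine`."""
    rough_sorted = sorted(docs, key=rough, reverse=True)         # cheap first pass
    return sorted(rough_sorted[:top_n], key=fine, reverse=True)  # costly second pass


# Hypothetical sort expressions over hypothetical attribute fields.
def rough_expr(d: Doc) -> float:
    return d["text_score"]


def fine_expr(d: Doc) -> float:
    return 0.5 * d["text_score"] + 0.5 * d["clicks"] / 1000


docs = [
    {"id": 1, "text_score": 0.9, "clicks": 10},
    {"id": 2, "text_score": 0.8, "clicks": 500},
    {"id": 3, "text_score": 0.2, "clicks": 9000},
]
print([d["id"] for d in two_stage_rank(docs, rough_expr, fine_expr, top_n=2)])  # [2, 1]
```

Document 3 would rank first under the fine expression, but it never reaches the second pass because its rough score falls outside the top N. A cheap first pass limits the cost of the expensive one. The risk is that it prunes documents the fine expression would have favored.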
search result summary: A short excerpt of a document's text content displayed alongside each search result. It helps users judge relevance without reading the full document.
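A basic summary generator takes a window of text around the first query hit and highlights that hit. The `make_summary` function, its window size, and the `<em>` highlight tags are hypothetical choices. This is a simplified stand-in, not how OpenSearch builds summaries.

```python
def make_summary(text: str, term: str, window: int = 20,
                 pre: str = "<em>", post: str = "</em>") -> str:
    """Return a snippet of `text` around the first occurrence of `term`, with the hit wrapped in pre/post tags."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        # No hit: fall back to the opening of the text.
        return text[:2 * window]
    end_hit = pos + len(term)
    start = max(0, pos - window)
    end = min(len(text), end_hit + window)
    snippet = text[start:pos] + pre + text[pos:end_hit] + post + text[end_hit:end]
    # Mark truncation on either side with an ellipsis.
    return ("..." if start > 0 else "") + snippet + ("..." if end < len(text) else "")


text = "OpenSearch builds inverted indexes to speed up retrieval of documents."
print(make_summary(text, "inverted", window=10))
# ...ch builds <em>inverted</em> indexes t...
```

This sketch cuts at fixed character offsets. A production summary would usually snap the window to word or sentence boundaries.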
query analysis: A set of pre-retrieval features applied to the raw search query. Supported features include synonym expansion, spelling correction, stop-word filtering, and term weight adjustment. These features improve search quality by interpreting user intent rather than matching keywords literally.
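The four query analysis features can be pictured as a small pipeline over the query terms. All four lookup tables, the `analyze_query` function, and the order in which the steps run are hypothetical. The glossary does not say how OpenSearch orders these features or how it represents term weights.

```python
STOP_WORDS = {"the", "a", "of"}     # hypothetical stop-word list
SYNONYMS = {"laptop": ["notebook"]}  # hypothetical synonym table
CORRECTIONS = {"lapton": "laptop"}   # hypothetical spelling corrections
TERM_WEIGHTS = {"cheap": 0.5}        # hypothetical term weights (default 1.0)


def analyze_query(query: str) -> list[tuple[list[str], float]]:
    """Return (alternative terms, weight) for each query term that survives analysis."""
    analyzed: list[tuple[list[str], float]] = []
    for raw in query.lower().split():
        term = CORRECTIONS.get(raw, raw)                 # spelling correction
        if term in STOP_WORDS:                           # stop-word filtering
            continue
        alternatives = [term] + SYNONYMS.get(term, [])   # synonym expansion
        weight = TERM_WEIGHTS.get(term, 1.0)             # term weight adjustment
        analyzed.append((alternatives, weight))
    return analyzed


print(analyze_query("the cheap lapton"))
# [(['cheap'], 0.5), (['laptop', 'notebook'], 1.0)]
```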