Glossary

This page defines the key terms used in Alibaba Cloud OpenSearch Retrieval Engine Edition. Understanding these concepts helps you read configuration guides and API references without getting blocked by unfamiliar terminology.

Data-related terms

Term	Description
MaxCompute data source	The source for full data. Raw data is stored in MaxCompute by partition and loaded during full indexing.
API data source	The source for incremental data. Data is updated by calling API operations in real time.
document	The basic unit of structured data in a search index. A document contains one or more fields and must have a primary key field. Think of a document as a row in a database table. Retrieval Engine Edition identifies documents by primary key value — if a new document shares the same primary key as an existing one, the existing document is overwritten.
field	A named attribute of a document, consisting of a field name and a field value. Think of a field as a column in a database table.
multi-value field	A field that holds multiple independent values — for example, a `tags` field containing `["cloud", "search", "analytics"]`.
primary key	The field that uniquely identifies a document within an index.
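
The overwrite-by-primary-key behavior described for documents can be illustrated with a minimal sketch. `ToyIndex` is a hypothetical in-memory stand-in written for this glossary, not part of the Retrieval Engine Edition API, and the field names are made up for the example.

```python
from typing import Any, Dict


class ToyIndex:
    """Hypothetical in-memory index keyed by primary key (illustration only)."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self.docs: Dict[Any, Dict[str, Any]] = {}

    def add(self, doc: Dict[str, Any]) -> None:
        # Every document must carry the primary key field.
        if self.primary_key not in doc:
            raise ValueError(f"missing primary key field {self.primary_key!r}")
        # A document with an existing primary key value replaces the old one.
        self.docs[doc[self.primary_key]] = doc


index = ToyIndex(primary_key="id")
# "tags" is a multi-value field: it holds several independent values.
index.add({"id": 1, "title": "fast cloud search", "tags": ["cloud", "search"]})
index.add({"id": 1, "title": "fast index builder", "tags": ["index"]})
```

After both calls the index holds a single document, because the second one shares the primary key value `1` and overwrites the first.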

Retrieval Engine Edition terms

Online search roles

A cluster is a search service consisting of QRS workers and Searcher workers that work together to handle query requests.

Role	Description
Query Result Searcher (QRS) worker	Handles online search. QRS workers parse incoming query requests, distribute them to Searcher workers, and merge the results before returning them to the caller.
Searcher worker	Handles online search. Searcher workers load index data into memory and serve search queries.
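
The division of labor between the two roles can be sketched as a scatter-gather loop. The class names, the single-term query format, and the assumption that each Searcher worker holds a separate slice of the documents are all hypothetical simplifications, not the product's actual interfaces.

```python
from typing import Dict, List


class Searcher:
    """Holds one slice of the index data in memory (assumed partitioning)."""

    def __init__(self, docs: Dict[int, str]) -> None:
        self.docs = docs

    def search(self, term: str) -> List[int]:
        # Return the IDs of local documents that contain the term.
        return [doc_id for doc_id, text in self.docs.items() if term in text.split()]


class QRS:
    """Parses the query, fans it out to Searchers, and merges the results."""

    def __init__(self, searchers: List[Searcher]) -> None:
        self.searchers = searchers

    def query(self, raw_query: str) -> List[int]:
        term = raw_query.strip().lower()                      # parse
        partial = [s.search(term) for s in self.searchers]    # distribute
        return sorted(doc_id for part in partial for doc_id in part)  # merge


qrs = QRS([Searcher({1: "fast cloud search"}), Searcher({2: "fast index builder"})])
hits = qrs.query(" Fast ")
```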

Offline indexing roles

Processor, Builder, and Merger together form the offline indexing pipeline.

Role	Description
Processor	Parses raw data during offline indexing.
Builder	Builds indexes from raw data during offline indexing.
Merger	Merges and sorts indexes during offline indexing.
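
As a rough mental model, the three roles can be pictured as three functions chained together. The tab-separated raw format, the function names, and the segment structure below are assumptions made for illustration; the real pipeline is distributed and far more involved.

```python
from typing import Dict, List, Set


def process(raw_lines: List[str]) -> List[dict]:
    """Processor: parse raw lines (assumed format: id<TAB>text) into documents."""
    docs = []
    for line in raw_lines:
        doc_id, text = line.split("\t", 1)
        docs.append({"id": int(doc_id), "text": text})
    return docs


def build(docs: List[dict]) -> Dict[str, Set[int]]:
    """Builder: turn a batch of documents into one index segment."""
    segment: Dict[str, Set[int]] = {}
    for doc in docs:
        for term in doc["text"].split():
            segment.setdefault(term, set()).add(doc["id"])
    return segment


def merge(segments: List[Dict[str, Set[int]]]) -> Dict[str, List[int]]:
    """Merger: combine segments and sort both terms and document IDs."""
    merged: Dict[str, Set[int]] = {}
    for segment in segments:
        for term, ids in segment.items():
            merged.setdefault(term, set()).update(ids)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


batch_a = ["2\tfast index builder"]
batch_b = ["1\tfast cloud search"]
merged_index = merge([build(process(batch_a)), build(process(batch_b))])
```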

Indexing types

Type	Description
Full indexing	Indexes all data in a MaxCompute data source. The output is a full index with full index versions.
Incremental indexing	When data is updated in real time, the offline indexing pipeline generates new indexes and applies them to online clusters automatically.
Real-time indexing	Data pushed via API operations takes effect immediately. Real-time indexes are generated in the memory of Searcher workers.

Index types

Inverted index

An inverted index maps terms to the documents in which they appear. Inverted indexes power query clauses and make full-text search efficient.

For example, given two documents:

Document 1: "fast cloud search"
Document 2: "fast index builder"

The inverted index looks like this:

Term	Documents
fast	1, 2
cloud	1
search	1
index	2
builder	2
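
The table above can be reproduced with a few lines of Python. This is a minimal sketch: whitespace splitting stands in for a real analyzer, and `build_inverted_index` and `search_all` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # dict.fromkeys removes repeated terms within one document.
        for term in dict.fromkeys(docs[doc_id].split()):
            index[term].append(doc_id)
    return dict(index)


docs = {1: "fast cloud search", 2: "fast index builder"}
inverted = build_inverted_index(docs)


def search_all(terms: List[str]) -> List[int]:
    """Return the documents that contain every given term."""
    postings = [set(inverted.get(term, [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

Looking up a term is a single dictionary access, which is why a query does not need to scan every document.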

Forward index

A forward index maps each document to the values of its fields. Forward indexes are used in FILTER clauses. They are less efficient than inverted indexes, but they let the engine look up the field values of a given document.

For example:

Document	Fields
doc1	id, type, create_time, ...
doc2	id, type, create_time, ...
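
A filter step over a forward index might look like the following sketch. The field values are invented for the example, and `apply_filter` is a hypothetical helper, not an engine function.

```python
from typing import Any, Callable, Dict, List

# Forward index: document ID -> field values (example values are made up).
forward: Dict[int, Dict[str, Any]] = {
    1: {"id": 1, "type": "article", "create_time": 1700000000},
    2: {"id": 2, "type": "video", "create_time": 1710000000},
}


def apply_filter(candidates: List[int], field: str,
                 predicate: Callable[[Any], bool]) -> List[int]:
    """Keep the candidate documents whose field value satisfies the predicate."""
    return [doc_id for doc_id in candidates if predicate(forward[doc_id][field])]


articles = apply_filter([1, 2], "type", lambda value: value == "article")
recent = apply_filter([1, 2], "create_time", lambda value: value > 1705000000)
```

Each candidate requires its own lookup, so the cost grows with the number of candidate documents.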

Summary index

A summary index stores the field values displayed in search result summaries. Query it by primary key or document ID to retrieve the display content for a given result. Retrieval Engine Edition paginates search results using summary index data.

Tokenization

Tokenization splits document text into individual searchable units called terms.

For TEXT-type fields, the system tokenizes sentences into meaningful terms. For example, the string "Zhejiang University" is tokenized into two terms: "Zhejiang" and "University".

A term is a single token or a set of tokens produced after tokenization. Terms are the atomic units used in inverted index lookups.
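
For space-delimited text, the idea can be sketched with a regular expression. This is only an illustration: production analyzers, especially for languages written without spaces, rely on dictionaries or statistical models rather than a simple pattern.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into runs of word characters, dropping punctuation and spaces."""
    return re.findall(r"\w+", text)


terms = tokenize("Zhejiang University")
```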

Data changes triggered by FSM

The finite-state machine (FSM) coordinates system state transitions. Each FSM-triggered change has a type, a rule for whether it can run multiple times (recurring), and a description of what it does.

For the same resource scope (cluster, index, or zone), non-recurring changes can only run once per instance. Recurring changes follow their own concurrency rules as described below.

Change type	Recurring	Description
Service discovery	Yes	Points the IP address of a Retrieval Engine Edition instance to a domain name, enabling service calls. For the same cluster, all historical changes are terminated before the latest change runs.
ha3_biz_apend	No	Adds a biz. Runs once per instance. Triggered automatically by the system. The change continues until the index table is added to the instance and the index is built.
update_biz_depend_index_fsm	No	Updates the index that a biz depends on. Runs once per instance. Triggered automatically by the system. The change continues until the index table is added and the index is built.
Online deployment	Yes	For the same cluster, all historical changes are terminated before the latest change runs.
multi_biz_activate	No	Initializes a Retrieval Engine Edition instance. Runs once per instance. The change continues until the index table is added and the index is built.
Index creation	Yes	For the same index, all historical changes are terminated before the latest change runs.
Automatically triggered full indexing	Yes	Triggered automatically when new data partitions are detected. The latest change and historical changes can run concurrently.
Manually triggered full indexing	Yes	The latest change and historical changes can run concurrently.
Configuration push	Yes	All historical changes are terminated before the latest change runs.
Online resources	Yes	For the same zone, all historical changes are terminated before the latest change runs.
Index rollback	Yes	The latest change and historical changes can run concurrently.
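
The three concurrency rules in the table can be modeled with a small tracker. This is a sketch under stated assumptions: `Policy`, `POLICIES`, and `ChangeTracker` are hypothetical names, only a subset of the change types is listed, and one tracker represents one instance.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Policy(Enum):
    ONCE = "non-recurring: runs once per instance"
    SUPERSEDE = "historical changes are terminated before the latest runs"
    CONCURRENT = "latest and historical changes run concurrently"


# Subset of the table above, for illustration.
POLICIES = {
    "multi_biz_activate": Policy.ONCE,
    "Service discovery": Policy.SUPERSEDE,
    "Index creation": Policy.SUPERSEDE,
    "Manually triggered full indexing": Policy.CONCURRENT,
    "Index rollback": Policy.CONCURRENT,
}


class ChangeTracker:
    """Tracks the changes of one hypothetical instance."""

    def __init__(self) -> None:
        self.running: Dict[Tuple[str, str], List[str]] = {}
        self.started_once: Set[str] = set()
        self.terminated: List[str] = []

    def submit(self, change_id: str, change_type: str, scope: str) -> bool:
        policy = POLICIES[change_type]
        if policy is Policy.ONCE:
            if change_type in self.started_once:
                return False  # already ran for this instance
            self.started_once.add(change_type)
        active = self.running.setdefault((change_type, scope), [])
        if policy is Policy.SUPERSEDE:
            # Terminate every historical change in the same scope first.
            self.terminated.extend(active)
            active.clear()
        active.append(change_id)
        return True


tracker = ChangeTracker()
tracker.submit("c1", "Index creation", "index_a")
tracker.submit("c2", "Index creation", "index_a")   # terminates c1
tracker.submit("c3", "Index rollback", "index_a")
tracker.submit("c4", "Index rollback", "index_a")   # runs alongside c3
```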

Note: A finite-state machine is a mathematical model that represents a finite set of states and the transitions between them.