This page defines the key terms used in Alibaba Cloud OpenSearch Retrieval Engine Edition. Knowing these terms makes the configuration guides and API references easier to follow.
Data-related terms
| Term | Description |
|---|---|
| MaxCompute data source | The source for full data. Raw data is stored in MaxCompute by partition and loaded during full indexing. |
| API data source | The source for incremental data. Data is updated by calling API operations in real time. |
| document | The basic unit of structured data in a search index. A document contains one or more fields and must have a primary key field. Think of a document as a row in a database table. Retrieval Engine Edition identifies documents by primary key value: if a new document has the same primary key value as an existing document, the existing document is overwritten. |
| field | A named attribute of a document, consisting of a field name and a field value. Think of a field as a column in a database table. |
| multi-value field | A field that holds multiple independent values. For example, a tags field can contain ["cloud", "search", "analytics"]. |
| primary key | The field that uniquely identifies a document within an index. |
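To make the document, multi-value field, and primary key terms concrete, the following minimal sketch models a document as a Python dict and stores documents keyed by their primary key value, so a second write with the same key overwrites the first. `DocumentStore` and the field names `id`, `title`, and `tags` are hypothetical illustrations, not part of the Retrieval Engine Edition API.

```python
from typing import Any, Dict

# A document is modeled as a dict of field name -> field value (like a table row).
# "tags" is a multi-value field: one field that holds several independent values.
Document = Dict[str, Any]


class DocumentStore:
    """Hypothetical toy store keyed by primary key value; same key means overwrite."""

    def __init__(self, primary_key: str) -> None:
        self.primary_key = primary_key
        self._docs: Dict[Any, Document] = {}

    def add(self, doc: Document) -> None:
        if self.primary_key not in doc:
            raise ValueError(f"document is missing primary key field {self.primary_key!r}")
        # A document with an existing primary key value replaces the stored one.
        self._docs[doc[self.primary_key]] = doc

    def get(self, key: Any) -> Document:
        return self._docs[key]

    def __len__(self) -> int:
        return len(self._docs)


store = DocumentStore(primary_key="id")
store.add({"id": 1, "title": "fast cloud search", "tags": ["cloud", "search", "analytics"]})
store.add({"id": 1, "title": "fast cloud search v2", "tags": ["cloud"]})
print(len(store), store.get(1)["title"])  # 1 fast cloud search v2
```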
Retrieval Engine Edition terms
Online search roles
A cluster is a search service consisting of QRS workers and Searcher workers that work together to handle query requests.
| Role | Description |
|---|---|
| Query Result Searcher (QRS) worker | Handles online search. QRS workers parse incoming query requests, distribute them to Searcher workers, and merge the results before returning them to the caller. |
| Searcher worker | Handles online search. Searcher workers load index data into memory and serve search queries. |
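The division of labor between QRS and Searcher workers is a scatter-gather pattern. The sketch below illustrates that pattern only: each hypothetical `Searcher` holds a slice of the data in memory, and a hypothetical `QRS` parses the query, sends it to every Searcher, and merges the partial hits into one top-k list. The term-count scoring is an assumption for illustration, not the engine's ranking.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


class Searcher:
    """Hypothetical Searcher worker holding one slice of the data in memory."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs  # doc_id -> text

    def search(self, term: str) -> List[Hit]:
        # Toy scoring (assumption): how many times the term occurs in the text.
        hits = []
        for doc_id, text in self.docs.items():
            score = float(text.split().count(term))
            if score > 0:
                hits.append((score, doc_id))
        return hits


class QRS:
    """Hypothetical QRS worker: parse, distribute to Searchers, merge results."""

    def __init__(self, searchers: List[Searcher]) -> None:
        self.searchers = searchers

    def query(self, raw_query: str, top_k: int = 10) -> List[Hit]:
        term = raw_query.strip().lower()  # stand-in for query parsing
        partial: List[Hit] = []
        for searcher in self.searchers:  # distribute the request to every Searcher
            partial.extend(searcher.search(term))
        # Merge: keep the top_k hits by score across all Searchers.
        return heapq.nlargest(top_k, partial)


qrs = QRS([
    Searcher({"d1": "fast cloud search", "d2": "fast index builder"}),
    Searcher({"d3": "search search engine"}),
])
print(qrs.query("Search", top_k=2))  # [(2.0, 'd3'), (1.0, 'd1')]
```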
Offline indexing roles
Processor, Builder, and Merger together form the offline indexing pipeline.
| Role | Description |
|---|---|
| Processor | Parses raw data during offline indexing. |
| Builder | Builds indexes from raw data during offline indexing. |
| Merger | Merges and sorts indexes during offline indexing. |
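The three offline roles can be pictured as stages of one pipeline. In this minimal sketch, the functions `processor`, `builder`, and `merger` are hypothetical stand-ins for the roles of the same name, and the tab-separated `id<TAB>text` raw row format is an assumption made for the example; the real roles operate on MaxCompute data and engine index files.

```python
from collections import defaultdict
from typing import Dict, List, Set


def processor(raw_rows: List[str]) -> List[Dict[str, str]]:
    """Parse raw rows (assumed format: id<TAB>text) into documents."""
    docs = []
    for row in raw_rows:
        doc_id, text = row.split("\t", 1)
        docs.append({"id": doc_id, "text": text})
    return docs


def builder(docs: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Build one index segment (term -> set of doc IDs) from a batch of documents."""
    segment: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for term in doc["text"].split():
            segment[term].add(doc["id"])
    return dict(segment)


def merger(segments: List[Dict[str, Set[str]]]) -> Dict[str, List[str]]:
    """Merge segments into one index with sorted terms and sorted doc ID lists."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for segment in segments:
        for term, doc_ids in segment.items():
            merged[term] |= doc_ids
    return {term: sorted(merged[term]) for term in sorted(merged)}


batch_a = processor(["1\tfast cloud search"])
batch_b = processor(["2\tfast index builder"])
index = merger([builder(batch_a), builder(batch_b)])
print(index["fast"])  # ['1', '2']
```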
Indexing types
| Type | Description |
|---|---|
| Full indexing | Indexes all data in a MaxCompute data source. The output is a full index and a corresponding full index version. |
| Incremental indexing | When data is updated in real time, the offline indexing pipeline generates new indexes and applies them to online clusters automatically. |
| Real-time indexing | Data pushed via API operations takes effect immediately. Real-time indexes are generated in the memory of Searcher workers. |
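As a conceptual sketch of how full and real-time data can coexist on a Searcher worker, the class below keeps data loaded from a full index alongside an in-memory layer of API-pushed documents; a pushed document is visible immediately and, for the same primary key, takes precedence over the full index. `SearcherIndexView` and its methods are hypothetical and do not describe the engine's internal data structures.

```python
from typing import Dict, Optional


class SearcherIndexView:
    """Hypothetical view of a Searcher: full index data plus an in-memory real-time layer."""

    def __init__(self, full_index: Dict[str, str]) -> None:
        self.full_index = dict(full_index)  # primary key -> document, from a full index
        self.realtime: Dict[str, str] = {}  # documents pushed through API operations

    def push(self, pk: str, text: str) -> None:
        # Real-time write: held in memory and visible to the very next lookup.
        self.realtime[pk] = text

    def get(self, pk: str) -> Optional[str]:
        # Newer in-memory data wins over the full index for the same primary key.
        if pk in self.realtime:
            return self.realtime[pk]
        return self.full_index.get(pk)


view = SearcherIndexView({"1": "fast cloud search", "2": "fast index builder"})
view.push("1", "fast cloud search v2")
view.push("3", "real-time doc")
print(view.get("1"), "|", view.get("2"), "|", view.get("3"))
```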
Index types
Inverted index
An inverted index maps terms to the documents in which they appear. Inverted indexes power query clauses and make full-text search efficient.
For example, given two documents:
Document 1: "fast cloud search"
Document 2: "fast index builder"
The inverted index looks like this:
| Term | Documents |
|---|---|
| fast | 1, 2 |
| cloud | 1 |
| search | 1 |
| index | 2 |
| builder | 2 |
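The inverted index in the example can be reproduced with a few lines of Python. This sketch splits text on whitespace (an assumption standing in for real tokenization), maps each term to the sorted IDs of documents that contain it, and answers an AND query by intersecting the document lists of its terms. `build_inverted_index` and `search_all` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of document IDs that contain it."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # whitespace split stands in for a real tokenizer
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index: Dict[str, List[int]], *terms: str) -> List[int]:
    """Return documents containing every term (AND query) by intersecting lists."""
    result: Optional[Set[int]] = None
    for term in terms:
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


docs = {1: "fast cloud search", 2: "fast index builder"}
index = build_inverted_index(docs)
print(index["fast"], index["cloud"])        # [1, 2] [1]
print(search_all(index, "fast", "search"))  # [1]
```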
Forward index
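A forward index maps each document to its field values. Forward indexes are used in FILTER clauses. Filtering with a forward index is less efficient than looking up an inverted index, because the field value of each candidate document must be read, but a forward index lets the engine read any indexed field of a given document directly.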
For example:
| Document | Fields |
|---|---|
| doc1 | id, type, create_time, ... |
| doc2 | id, type, create_time, ... |
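The following sketch shows why filtering through a forward index works document by document. The forward index is a hypothetical dict from document to field values, using the `id`, `type`, and `create_time` fields from the example with made-up values, and `apply_filter` is a hypothetical helper that reads the field of every candidate. It is not the FILTER clause syntax of Retrieval Engine Edition.

```python
from typing import Any, Dict, List

# Hypothetical forward index: document -> field values (values are made up).
forward_index: Dict[str, Dict[str, Any]] = {
    "doc1": {"id": 1, "type": "news", "create_time": 1700000000},
    "doc2": {"id": 2, "type": "blog", "create_time": 1710000000},
}


def apply_filter(candidates: List[str], field: str, min_value: Any) -> List[str]:
    """Keep candidates whose field value is >= min_value.

    The field value of each candidate is read one at a time, so the cost grows
    with the number of candidates rather than with the number of matches.
    """
    return [doc for doc in candidates if forward_index[doc][field] >= min_value]


print(apply_filter(["doc1", "doc2"], "create_time", 1705000000))  # ['doc2']
```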
Summary index
A summary index stores the field values displayed in search result summaries. Query it by primary key or document ID to retrieve the display content for a given result. Retrieval Engine Edition paginates search results using summary index data.
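One way to picture paging with summary data: retrieval yields document IDs in ranking order, and only the IDs that fall on the requested page are looked up in the summary index. This is a minimal sketch under that assumption; `summary_index`, `fetch_page`, and the `title` field are hypothetical, and the engine's actual paging logic may differ.

```python
from typing import Dict, List

# Hypothetical summary index: doc ID -> fields shown in the result summary.
summary_index: Dict[int, Dict[str, str]] = {
    1: {"title": "fast cloud search"},
    2: {"title": "fast index builder"},
    3: {"title": "search engine basics"},
}


def fetch_page(ranked_ids: List[int], page: int, page_size: int) -> List[Dict[str, str]]:
    """Slice the ranked doc IDs for one page, then look up only those summaries."""
    start = (page - 1) * page_size
    page_ids = ranked_ids[start:start + page_size]
    return [summary_index[doc_id] for doc_id in page_ids]


ranked = [3, 1, 2]  # doc IDs in ranking order, as produced by retrieval
print(fetch_page(ranked, page=2, page_size=2))  # [{'title': 'fast index builder'}]
```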
Tokenization
Tokenization splits document text into individual searchable units called terms.
For TEXT-type fields, the system tokenizes sentences into meaningful terms. For example, the Chinese string 浙江大学 (Zhejiang University) is tokenized into two terms: 浙江 (Zhejiang) and 大学 (university).
A term is a single token or a set of tokens produced after tokenization. Terms are the atomic units used in inverted index lookups.
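To illustrate how a dictionary can split 浙江大学 into 浙江 and 大学, the sketch below uses forward maximum matching, a classic dictionary-based Chinese segmentation technique. It is swapped in for illustration only; the analyzer that Retrieval Engine Edition actually uses is not described here, and `forward_max_match` and the tiny dictionary are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching segmentation.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when no dictionary word matches.
    """
    terms = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                terms.append(candidate)
                i += size
                break
    return terms


vocab = {"浙江", "大学"}
print(forward_max_match("浙江大学", vocab))  # ['浙江', '大学']
```

The dictionary decides the granularity: if it also contained 浙江大学, this method would return the whole string as a single term.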
Data changes triggered by FSM
The finite-state machine (FSM) coordinates system state transitions. Each FSM-triggered change has a type, a rule for whether it can run multiple times (recurring), and a description of what it does.
For the same resource scope (cluster, index, or zone), non-recurring changes can run only once per instance. Recurring changes follow the per-type concurrency rules in the following table.
| Change type | Recurring | Description |
|---|---|---|
| Service discovery | Yes | Binds the IP addresses of a Retrieval Engine Edition instance to a domain name so that services can be called through that domain name. For the same cluster, all historical changes are terminated before the latest change runs. |
| ha3_biz_apend | No | Adds a biz. Runs once per instance. Triggered automatically by the system. The change continues until the index table is added to the instance and the index is built. |
| update_biz_depend_index_fsm | No | Updates the index that a biz depends on. Runs once per instance. Triggered automatically by the system. The change continues until the index table is added and the index is built. |
| Online deployment | Yes | For the same cluster, all historical changes are terminated before the latest change runs. |
| multi_biz_activate | No | Initializes a Retrieval Engine Edition instance. Runs once per instance. The change continues until the index table is added and the index is built. |
| Index creation | Yes | For the same index, all historical changes are terminated before the latest change runs. |
| Automatically triggered full indexing | Yes | Triggered automatically when new data partitions are detected. The latest change and historical changes can run concurrently. |
| Manually triggered full indexing | Yes | The latest change and historical changes can run concurrently. |
| Configuration push | Yes | All historical changes are terminated before the latest change runs. |
| Online resources | Yes | For the same zone, all historical changes are terminated before the latest change runs. |
| Index rollback | Yes | The latest change and historical changes can run concurrently. |
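The concurrency rules in the table reduce to three policies: terminate earlier changes in the same scope, let changes run concurrently, or allow one run per instance. The scheduler below is a minimal sketch of those policies only. `ChangeScheduler`, the policy constants, and the change-type keys (apart from `multi_biz_activate`, which comes from the table) are hypothetical, and completion of changes is not tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

TERMINATE_PREVIOUS = "terminate_previous"  # stop historical changes in the same scope first
CONCURRENT = "concurrent"                  # latest and historical changes may run together
ONCE_PER_INSTANCE = "once_per_instance"    # non-recurring: at most one run per instance

# Hypothetical keys summarizing some rows of the table.
POLICIES = {
    "service_discovery": TERMINATE_PREVIOUS,   # scope: cluster
    "online_deployment": TERMINATE_PREVIOUS,   # scope: cluster
    "index_creation": TERMINATE_PREVIOUS,      # scope: index
    "online_resources": TERMINATE_PREVIOUS,    # scope: zone
    "auto_full_indexing": CONCURRENT,
    "index_rollback": CONCURRENT,
    "multi_biz_activate": ONCE_PER_INSTANCE,   # scope: instance
}


@dataclass
class ChangeScheduler:
    running: Dict[Tuple[str, str], List[int]] = field(default_factory=dict)
    terminated: List[int] = field(default_factory=list)
    done_once: Set[Tuple[str, str]] = field(default_factory=set)
    next_id: int = 1

    def submit(self, change_type: str, scope: str) -> int:
        """Start a change in a scope; return its ID, or 0 if the change is rejected."""
        key = (change_type, scope)
        policy = POLICIES[change_type]
        if policy == ONCE_PER_INSTANCE:
            if key in self.done_once:
                return 0  # already ran once for this instance
            self.done_once.add(key)
        elif policy == TERMINATE_PREVIOUS:
            # Terminate every historical change of this type in the same scope.
            self.terminated.extend(self.running.pop(key, []))
        change_id = self.next_id
        self.next_id += 1
        self.running.setdefault(key, []).append(change_id)
        return change_id


s = ChangeScheduler()
a = s.submit("online_deployment", "cluster-1")   # 1
b = s.submit("online_deployment", "cluster-1")   # 2, terminates 1
c = s.submit("auto_full_indexing", "index-1")    # 3
d = s.submit("auto_full_indexing", "index-1")    # 4, runs alongside 3
e = s.submit("multi_biz_activate", "instance-1")  # 5
f = s.submit("multi_biz_activate", "instance-1")  # 0, rejected
print(s.terminated, s.running[("auto_full_indexing", "index-1")], f)  # [1] [3, 4] 0
```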
A finite-state machine (FSM) is a mathematical model that consists of a finite set of states and the transitions between them.
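A finite-state machine can be written as a fixed set of states plus a table of allowed transitions. The sketch below applies that idea to the lifecycle of a single change. The states `PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`, and `TERMINATED` and the class `ChangeFSM` are hypothetical; this page does not document the engine's real state names.

```python
from enum import Enum
from typing import Dict, Set


class ChangeState(Enum):
    # Hypothetical states for illustration only.
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TERMINATED = "terminated"


# Finite set of allowed transitions: state -> states reachable in one step.
TRANSITIONS: Dict[ChangeState, Set[ChangeState]] = {
    ChangeState.PENDING: {ChangeState.RUNNING, ChangeState.TERMINATED},
    ChangeState.RUNNING: {ChangeState.SUCCEEDED, ChangeState.FAILED, ChangeState.TERMINATED},
    ChangeState.SUCCEEDED: set(),
    ChangeState.FAILED: set(),
    ChangeState.TERMINATED: set(),
}


class ChangeFSM:
    def __init__(self) -> None:
        self.state = ChangeState.PENDING

    def transition(self, target: ChangeState) -> None:
        # Reject any move that is not listed in the transition table.
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target


fsm = ChangeFSM()
fsm.transition(ChangeState.RUNNING)
fsm.transition(ChangeState.SUCCEEDED)
print(fsm.state.name)  # SUCCEEDED
```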