Search index

The Lindorm wide table engine supports search indexes (SearchIndex), a new index type for complex multi-dimensional query scenarios including tokenization, fuzzy queries, aggregate analysis, sorting, and paging.

Features

Search indexes use the same SQL interface as native secondary indexes. The following statement creates a search index named idx on columns c1, c2, c3, and c4 of table dt. Column order does not matter. Column c3 uses the IK tokenizer for full-text search.

CREATE INDEX idx USING SEARCH ON dt(c1, c2, c3(type=text, analyzer=ik),c4);

Important

Index building generates read operations. If hot and cold data separation is enabled on your instance, reads from cold storage (capacity-type cloud storage) may be throttled. Throttled cold-storage reads slow down index building and can cause write backpressure.

Search indexes support the following capabilities:

  • Multi-dimensional queries. Query data using any combination of indexed columns.

    SELECT * FROM dt WHERE c1=?;
    SELECT * FROM dt WHERE c2=? AND c4=?;
  • Tokenization-based queries. Use equality queries on tokenized columns to retrieve highly relevant result sets. For example, the following statement queries c3 for records that contain "Function Introduction", "function", or "introduction".

    SELECT * FROM dt WHERE MATCH (c3) AGAINST ('Function Introduction');
  • Aggregation. Supports COUNT, SUM, MIN, MAX, and AVG functions.

  • Sorting and paging. Supports ORDER BY on any indexed column.
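
The last two capabilities are listed without statements. A minimal sketch follows, reusing table dt and index idx from the example above. It assumes that c4 holds numeric values and that paging is expressed with LIMIT and OFFSET; neither is stated in this section, so check both against your schema and the SQL reference.

```sql
-- Aggregation over the rows matched through the search index.
SELECT COUNT(*) FROM dt WHERE c1=?;
SELECT MIN(c4), MAX(c4), AVG(c4) FROM dt WHERE c2=?;

-- Sorting and paging: order by an indexed column,
-- skip the first 20 rows, and return the next 10 (assumed paging syntax).
SELECT * FROM dt WHERE c2=? ORDER BY c4 DESC LIMIT 10 OFFSET 20;
```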

Architecture

SearchIndex integrates the wide table engine with a search engine. The architecture consists of three independently scalable services: the wide table engine, Lindorm Tunnel Service (LTS), and the search engine. You can scale each service separately and select different machine types per service. This independent deployment model improves system stability.

Data write flow:

  1. Data is written to the wide table engine. The raw data is recorded in the write-ahead log (WAL) and the write result is returned to the client.

  2. LTS listens to the WAL in real time, selects the changes that belong to tables with a SearchIndex, and writes that data to the search engine.

  3. The search engine builds an inverted index in real time.
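
One reading of the steps above is that the write is acknowledged before the search engine has indexed it, so a newly written row may become searchable slightly after the client receives the write result. The sketch below illustrates that sequence. It assumes a hypothetical primary key column named pk (the schema of dt is not given here) and that writes use the UPSERT statement; the visibility delay itself is an inference from the flow, not a documented guarantee.

```sql
-- Step 1: the write returns once it is recorded in the WAL.
-- pk is a hypothetical primary key column.
UPSERT INTO dt (pk, c1, c2, c3, c4) VALUES (?, ?, ?, ?, ?);

-- Steps 2 and 3 run after the write has returned. A search index query
-- issued immediately afterwards may or may not see the new row yet.
SELECT * FROM dt WHERE c2=? AND c4=?;
```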

Data query flow:

  1. A query is sent to the wide table engine. The computing layer compiles the query, and the optimizer selects the appropriate SearchIndex.

  2. The query is routed to the search engine to retrieve matching data.

  3. Results are aggregated. If needed, additional data is retrieved from the wide table to complete the result set before returning it to the client.
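
Step 3 suggests that whether the wide table is consulted depends on which columns the query requests. The two statements below are a hedged illustration. They assume that dt also has a column c5 that is not part of idx (a hypothetical column) and that the search engine can return the values of indexed columns directly.

```sql
-- Only indexed columns are requested, so the search engine result may suffice.
SELECT c1, c2 FROM dt WHERE c2=? AND c4=?;

-- c5 (hypothetical) is not in idx, so its value has to be
-- retrieved from the wide table to complete the result set.
SELECT c1, c5 FROM dt WHERE c2=? AND c4=?;
```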

Use cases

Search indexes combine high-concurrency, low-latency KV queries with multi-dimensional queries, tokenization-based queries, and aggregation. Consider search indexes for the following scenarios: