Text analyzers

更新时间:
复制 MD 格式

OpenSearch text analyzers control how field values are broken into tokens during indexing and how search queries are parsed. Choosing the right analyzer for each field directly determines which queries retrieve a document.

Analyzer overview

The following table summarizes all available analyzers. Use it to narrow down your options before reading the detailed reference sections.

AnalyzerSupported field typesBest forAvailability
Keyword analyzerLITERAL, ARRAY, INTExact-match searches; tags, IDs, whole strings
General analyzer for ChineseTEXT, SHORT_TEXTGeneral Chinese full-text search
E-commerce analyzer for ChineseTEXT, SHORT_TEXTChinese product catalog search
Single character analyzer for ChineseTEXT, SHORT_TEXTNon-semantic Chinese searches; author or store names
Fuzzy analyzerSHORT_TEXTPinyin search; prefix/suffix search; single-letter search
Word stemming analyzer for EnglishTEXT, SHORT_TEXTEnglish full-text search with stemming
Unstemmed word analyzer for EnglishTEXT, SHORT_TEXTNon-semantic English searches; book titles, author names
Analyzer for fine-grained analysis for EnglishTEXT, SHORT_TEXTEnglish full-text search with compound-word splittingExclusive applications
Full pinyin spelling analyzerSHORT_TEXTFull-pinyin or first-letter abbreviated pinyin search
Abbreviated pinyin spelling analyzerSHORT_TEXTFirst-letter abbreviated pinyin search
Simple analyzerTEXT, SHORT_TEXTCustom tokenization with tab-separated terms
Numerical value analyzerINT, TIMESTAMPTime-interval and numerical-range queries
Geo-location analyzerGEO_POINTGeographical-location queries
IT content analyzerTEXT, SHORT_TEXTTechnical content with IT-specific terminology (e.g., c++)
General analyzer for E-commerce for ChineseTEXTChinese E-commerce product search using NLPIndustry-specific Enhanced Edition for E-commerce
General analyzer for ThaiTEXT, SHORT_TEXTThai full-text searchExclusive applications
Analyzer for E-commerce for ThaiTEXT, SHORT_TEXTThai E-commerce product searchExclusive applications
General analyzer for VietnameseTEXT, SHORT_TEXTVietnamese full-text searchExclusive applications
General analyzer for GamingTEXT, SHORT_TEXTGaming industry content searchIndustry-specific Enhanced Edition for Gaming
General analyzer for E-commerce for EnglishTEXTEnglish E-commerce product searchIndustry-specific Enhanced Edition for E-commerce
Character analyzer for ChineseTEXT, SHORT_TEXTCharacter-level non-semantic Chinese searchExclusive applications
Custom analyzer for textTEXT, SHORT_TEXTScenarios where built-in analyzers cannot meet requirements

Analyzer reference

Keyword analyzer

Outputs the entire field value as a single token without any segmentation. Use it for exact-match searches on tags, keywords, IDs, or any string that must be treated as a whole.

Supported field types: LITERAL, ARRAY, INT

Example: If a field value is 菊花茶, the document is retrieved only when a user searches for 菊花茶.

General analyzer for Chinese

Segments Chinese text into search units based on Chinese semantics. This is the recommended starting point for most Chinese full-text search use cases.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 菊花茶, the document is retrieved when a user searches for 菊花茶, 菊花, , or 花茶.

E-commerce analyzer for Chinese

An industry-specific analyzer tuned for Chinese E-commerce product search. It handles product names, brand names, and mixed Chinese-English terms that are common in retail catalogs.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 大宝SOD, the document is retrieved when a user searches for 大宝, sod, sod, SOD, or .

Single character analyzer for Chinese

Segments Chinese text into individual characters as well as multi-character words. Use it when semantic meaning is not required and higher recall is preferred — for example, when searching author names or store names.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 菊花茶, the document is retrieved when a user searches for 菊花茶, 菊花, , 花茶, , , or 菊茶.

This analyzer treats numbers and English words as single tokens. A search for he does not retrieve a document whose field contains hello. To support partial English-word matching, use the fuzzy analyzer instead.

Fuzzy analyzer

Supports pinyin, prefix/suffix, and single-word or single-letter searches. Fields must not exceed 100 bytes.

Supported field types: SHORT_TEXT

Chinese text does not support prefix or suffix searches. Prefix and suffix searches apply to letters, numbers, and pinyin only. For details, see Fuzzy search.

Examples:

  • Chinese with pinyin: If a field value is 菊花茶, the document is retrieved when a user searches for 菊花茶, 菊花, , 花茶, , , 菊茶, ju, juhua, juhuacha, j, jh, or jhc.

  • Prefix/suffix on numbers: If a field value is 138****5678, searching ^138 retrieves all numbers starting with 138; searching 5678$ retrieves all numbers ending with 5678.

  • Latin strings: If a field value is OpenSearch, the document is retrieved when a user searches for any single letter or combination of letters contained in the value.

Word stemming analyzer for English

Reduces each English word to its root form, enabling searches across inflected variants. Consecutive Chinese characters are treated as a single token.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 英文分词器 english analyzer, the document is retrieved when a user searches for 英文分词器, english, analyz, analyzer, analyzers, analyze, analyzed, or analyzing.

Unstemmed word analyzer for English

Segments text on spaces and punctuation marks without applying stemming. Use it for non-semantic English searches such as book titles or author names. Consecutive Chinese characters are treated as a single token.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 英文分词器 english analyzer, the document is retrieved when a user searches for 英文分词器, english, or analyzer.

Analyzer for fine-grained analysis for English

Segments English text by search unit and splits compound words into their components. This is a general-purpose English analyzer for industry-wide use cases.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is dataprocess, the analysis result is data process. The document is retrieved when a user searches for dataprocess, data process, data, or process.

Full pinyin spelling analyzer

Lets users find Chinese text by entering the full pinyin spelling or the first letters of each syllable (abbreviated pinyin). The user must type a complete pinyin syllable — partial syllables do not match.

Supported field types: SHORT_TEXT

Example: If a field value is 大内密探007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt007, da, danei, daneimi, or daneimitan. Searching for an or anei does not retrieve the document.

Abbreviated pinyin spelling analyzer

Lets users find Chinese text by entering the first letters of each pinyin syllable. Unlike the full pinyin spelling analyzer, partial syllable entries are supported.

Supported field types: SHORT_TEXT

Example: If a field value is 大内密探007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt0, damt007, m, mt, mt007, or 007.

Simple analyzer

Gives you full control over tokenization. Terms in field values and search queries must be separated by tab characters (\t). Field values and queries must use the same segmentation; otherwise, documents cannot be retrieved.

Supported field types: TEXT, SHORT_TEXT

Example: If a field value is 菊\t花茶\thao, the document is retrieved when a user searches for , 花茶, 菊\t花茶, 花茶\thao, 菊\thao, or 菊\t花茶\thao.

Numerical value analyzer

Indexes numeric and timestamp fields for range queries.

Supported field types: INT, TIMESTAMP

Example:

query=default:'开放搜索' AND index:[number1,number2]

In this example, index is the name of the index field configured with the numerical value analyzer.

Geo-location analyzer

Indexes GEO_POINT fields for geographical-location queries such as radius searches.

Supported field types: GEO_POINT

Example:

query=spatial_index:'circle(116.5806 39.99624, 1000)'

This query retrieves documents within a circle whose radius can be several kilometers.

IT content analyzer

An industry-specific analyzer for technical IT content. Compared with the general analyzer for Chinese, it handles IT-specific character sequences differently — for example, keeping c++ as a single token instead of splitting it.

Supported field types: TEXT, SHORT_TEXT

Example:

InputGeneral analyzer resultIT content analyzer result
c++数组使用注意事项c ++ 数组使用注意事项c++ 数组使用注意事项

General analyzer for E-commerce for Chinese

An industry-specific analyzer for Chinese E-commerce product search. It uses natural language processing (NLP) technology from DAMO Academy to produce finer-grained segmentation than the general analyzer.

Supported field types: TEXT

Availability: Industry-specific Enhanced Edition for E-commerce only.

Example:

InputGeneral analyzer resultGeneral E-commerce analyzer result
小金管遮瑕膏小金管遮瑕膏小金管 遮瑕

General analyzer for Thai

Segments Thai text into search units for general-purpose Thai full-text search.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is แหล่งดึงดูดนักท่องเที่ยว, the analysis result is แหล่ง ดึง ดูด นักท่องเที่ยว. The document is retrieved when a user searches for นักท่องเที่ยว or แหล่งดึงดูดนักท่องเที่ยว.

Analyzer for E-commerce for Thai

Segments Thai text for E-commerce product search scenarios.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is หน้าจอโทรศัพท์, the analysis result is น้าจอ โทรศัพท์. The document is retrieved when a user searches for หน้าจอโทรศัพท์, หน้าจอ, or โทรศัพท์.

General analyzer for Vietnamese

Segments Vietnamese text for general-purpose Vietnamese full-text search.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

General analyzer for Gaming

An industry-specific analyzer tuned for gaming content.

Supported field types: TEXT, SHORT_TEXT

Availability: Industry-specific Enhanced Edition for Gaming only.

Example: If a field value is 原神装备, the analysis result is 原神 装备. The document is retrieved when a user searches for 原神装备, 原神, or 装备.

General analyzer for E-commerce for English

An industry-specific analyzer for English E-commerce product search.

Supported field types: TEXT

Availability: Industry-specific Enhanced Edition for E-commerce only.

Character analyzer for Chinese

Segments text into individual Chinese characters, English letters, numbers, and punctuation marks. Use it for character-level non-semantic searches.

Supported field types: TEXT, SHORT_TEXT

Availability: Exclusive applications only.

Example: If a field value is 开放搜索OpenSearch123, the document is retrieved when a user searches for any individual character or letter: , , , , O, p, e, n, S, e, a, r, c, h, or ..

Custom analyzer for text

Combines an industry-specific analyzer — the general analyzer, E-commerce analyzer, or person name analyzer — with custom intervention entries. For configuration details, see Custom analyzers.

Supported field types: TEXT, SHORT_TEXT

Choose an analyzer

Use the following guidance to select the right analyzer for your use case.

Chinese full-text search

  • For most scenarios, start with the general analyzer for Chinese or the E-commerce analyzer for Chinese.

  • When stringent ranking is not required and higher recall matters — such as short text or non-semantic content — use the single character analyzer for Chinese.

Combining analyzers for better ranking

Index the same field with both a semantic and a character-level analyzer to retrieve more documents while ranking semantically relevant results higher. For example:

query=title_index:'菊花茶' OR sws_title_index:'菊花茶'

Fine sort expression:

text_relevance(title)*5+field_proximity(sws_title)

This configuration allows users to retrieve all documents that contain xxxxxxxx. In addition, documents that contain 菊花茶 are ranked first.

Pinyin search

Use the fuzzy analyzer for pinyin-based searches.

English full-text search

Use the word stemming analyzer for English to match inflected word forms.

Test an analyzer

To verify how an analyzer segments a specific input, use the Word Analysis Test tool in the OpenSearch console.

  1. Log on to the OpenSearch console.

  2. In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration.

  3. On the Basic Configuration page, click Analyzer Management in the left-side pane.

  4. On the Analyzer Management page, find the target analyzer and click Word Analysis Test in the Actions column.

image

Usage notes

  • Index field types: The following field types support index configuration: INT, INT_ARRAY, TEXT, SHORT_TEXT, LITERAL, LITERAL_ARRAY, TIMESTAMP, and GEO_POINT. The following types do not: FLOAT, FLOAT_ARRAY, DOUBLE, and DOUBLE_ARRAY.

  • Search result highlighting: For TEXT fields, terms that belong to extended search units — such as 花茶 derived from 菊花茶 — may not be wrapped in HTML highlighting tags in search result summaries.

  • Single character analyzer and English words: This analyzer treats numbers and English words as single tokens. A search for he does not retrieve a document whose field contains hello. To support partial English-word matching, use the fuzzy analyzer instead.

  • Primary key index field: The primary key of the primary table is automatically set as an index field named id. This field cannot be modified.