OpenSearch text analyzers control how field values are broken into tokens during indexing and how search queries are parsed. Choosing the right analyzer for each field directly determines which queries retrieve a document.
Analyzer overview
The following table summarizes all available analyzers. Use it to narrow down your options before reading the detailed reference sections.
| Analyzer | Supported field types | Best for | Availability |
|---|---|---|---|
| Keyword analyzer | LITERAL, ARRAY, INT | Exact-match searches; tags, IDs, whole strings | — |
| General analyzer for Chinese | TEXT, SHORT_TEXT | General Chinese full-text search | — |
| E-commerce analyzer for Chinese | TEXT, SHORT_TEXT | Chinese product catalog search | — |
| Single character analyzer for Chinese | TEXT, SHORT_TEXT | Non-semantic Chinese searches; author or store names | — |
| Fuzzy analyzer | SHORT_TEXT | Pinyin search; prefix/suffix search; single-letter search | — |
| Word stemming analyzer for English | TEXT, SHORT_TEXT | English full-text search with stemming | — |
| Unstemmed word analyzer for English | TEXT, SHORT_TEXT | Non-semantic English searches; book titles, author names | — |
| Analyzer for fine-grained analysis for English | TEXT, SHORT_TEXT | English full-text search with compound-word splitting | Exclusive applications |
| Full pinyin spelling analyzer | SHORT_TEXT | Full-pinyin or first-letter abbreviated pinyin search | — |
| Abbreviated pinyin spelling analyzer | SHORT_TEXT | First-letter abbreviated pinyin search | — |
| Simple analyzer | TEXT, SHORT_TEXT | Custom tokenization with tab-separated terms | — |
| Numerical value analyzer | INT, TIMESTAMP | Time-interval and numerical-range queries | — |
| Geo-location analyzer | GEO_POINT | Geographical-location queries | — |
| IT content analyzer | TEXT, SHORT_TEXT | Technical content with IT-specific terminology (e.g., c++) | — |
| General analyzer for E-commerce for Chinese | TEXT | Chinese E-commerce product search using NLP | Industry-specific Enhanced Edition for E-commerce |
| General analyzer for Thai | TEXT, SHORT_TEXT | Thai full-text search | Exclusive applications |
| Analyzer for E-commerce for Thai | TEXT, SHORT_TEXT | Thai E-commerce product search | Exclusive applications |
| General analyzer for Vietnamese | TEXT, SHORT_TEXT | Vietnamese full-text search | Exclusive applications |
| General analyzer for Gaming | TEXT, SHORT_TEXT | Gaming industry content search | Industry-specific Enhanced Edition for Gaming |
| General analyzer for E-commerce for English | TEXT | English E-commerce product search | Industry-specific Enhanced Edition for E-commerce |
| Character analyzer for Chinese | TEXT, SHORT_TEXT | Character-level non-semantic Chinese search | Exclusive applications |
| Custom analyzer for text | TEXT, SHORT_TEXT | Scenarios where built-in analyzers cannot meet requirements | — |
Analyzer reference
Keyword analyzer
Outputs the entire field value as a single token without any segmentation. Use it for exact-match searches on tags, keywords, IDs, or any string that must be treated as a whole.
Supported field types: LITERAL, ARRAY, INT
Example: If a field value is 菊花茶, the document is retrieved only when a user searches for 菊花茶.
General analyzer for Chinese
Segments Chinese text into search units based on Chinese semantics. This is the recommended starting point for most Chinese full-text search use cases.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 菊花茶, the document is retrieved when a user searches for 菊花茶, 菊花, 茶, or 花茶.
E-commerce analyzer for Chinese
An industry-specific analyzer tuned for Chinese E-commerce product search. It handles product names, brand names, and mixed Chinese-English terms that are common in retail catalogs.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 大宝SOD蜜, the document is retrieved when a user searches for 大宝, sod, sod蜜, SOD蜜, or 蜜.
Single character analyzer for Chinese
Segments Chinese text into individual characters as well as multi-character words. Use it when semantic meaning is not required and higher recall is preferred — for example, when searching author names or store names.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 菊花茶, the document is retrieved when a user searches for 菊花茶, 菊花, 茶, 花茶, 菊, 花, or 菊茶.
This analyzer treats numbers and English words as single tokens. A search forhedoes not retrieve a document whose field containshello. To support partial English-word matching, use the fuzzy analyzer instead.
Fuzzy analyzer
Supports pinyin, prefix/suffix, and single-word or single-letter searches. Fields must not exceed 100 bytes.
Supported field types: SHORT_TEXT
Chinese text does not support prefix or suffix searches. Prefix and suffix searches apply to letters, numbers, and pinyin only. For details, see Fuzzy search.
Examples:
Chinese with pinyin: If a field value is
菊花茶, the document is retrieved when a user searches for菊花茶,菊花,茶,花茶,菊,花,菊茶,ju,juhua,juhuacha,j,jh, orjhc.Prefix/suffix on numbers: If a field value is
138****5678, searching^138retrieves all numbers starting with138; searching5678$retrieves all numbers ending with5678.Latin strings: If a field value is
OpenSearch, the document is retrieved when a user searches for any single letter or combination of letters contained in the value.
Word stemming analyzer for English
Reduces each English word to its root form, enabling searches across inflected variants. Consecutive Chinese characters are treated as a single token.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 英文分词器 english analyzer, the document is retrieved when a user searches for 英文分词器, english, analyz, analyzer, analyzers, analyze, analyzed, or analyzing.
Unstemmed word analyzer for English
Segments text on spaces and punctuation marks without applying stemming. Use it for non-semantic English searches such as book titles or author names. Consecutive Chinese characters are treated as a single token.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 英文分词器 english analyzer, the document is retrieved when a user searches for 英文分词器, english, or analyzer.
Analyzer for fine-grained analysis for English
Segments English text by search unit and splits compound words into their components. This is a general-purpose English analyzer for industry-wide use cases.
Supported field types: TEXT, SHORT_TEXT
Availability: Exclusive applications only.
Example: If a field value is dataprocess, the analysis result is data process. The document is retrieved when a user searches for dataprocess, data process, data, or process.
Full pinyin spelling analyzer
Lets users find Chinese text by entering the full pinyin spelling or the first letters of each syllable (abbreviated pinyin). The user must type a complete pinyin syllable — partial syllables do not match.
Supported field types: SHORT_TEXT
Example: If a field value is 大内密探007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt007, da, danei, daneimi, or daneimitan. Searching for an or anei does not retrieve the document.
Abbreviated pinyin spelling analyzer
Lets users find Chinese text by entering the first letters of each pinyin syllable. Unlike the full pinyin spelling analyzer, partial syllable entries are supported.
Supported field types: SHORT_TEXT
Example: If a field value is 大内密探007, the document is retrieved when a user searches for d, dn, dnm, dnmt, dnmt0, damt007, m, mt, mt007, or 007.
Simple analyzer
Gives you full control over tokenization. Terms in field values and search queries must be separated by tab characters (\t). Field values and queries must use the same segmentation; otherwise, documents cannot be retrieved.
Supported field types: TEXT, SHORT_TEXT
Example: If a field value is 菊\t花茶\thao, the document is retrieved when a user searches for 菊, 花茶, 菊\t花茶, 花茶\thao, 菊\thao, or 菊\t花茶\thao.
Numerical value analyzer
Indexes numeric and timestamp fields for range queries.
Supported field types: INT, TIMESTAMP
Example:
query=default:'开放搜索' AND index:[number1,number2]In this example, index is the name of the index field configured with the numerical value analyzer.
Geo-location analyzer
Indexes GEO_POINT fields for geographical-location queries such as radius searches.
Supported field types: GEO_POINT
Example:
query=spatial_index:'circle(116.5806 39.99624, 1000)'This query retrieves documents within a circle whose radius can be several kilometers.
IT content analyzer
An industry-specific analyzer for technical IT content. Compared with the general analyzer for Chinese, it handles IT-specific character sequences differently — for example, keeping c++ as a single token instead of splitting it.
Supported field types: TEXT, SHORT_TEXT
Example:
| Input | General analyzer result | IT content analyzer result |
|---|---|---|
c++数组使用注意事项 | c ++ 数组使用注意事项 | c++ 数组使用注意事项 |
General analyzer for E-commerce for Chinese
An industry-specific analyzer for Chinese E-commerce product search. It uses natural language processing (NLP) technology from DAMO Academy to produce finer-grained segmentation than the general analyzer.
Supported field types: TEXT
Availability: Industry-specific Enhanced Edition for E-commerce only.
Example:
| Input | General analyzer result | General E-commerce analyzer result |
|---|---|---|
小金管遮瑕膏 | 小金管遮瑕膏 | 小金管 遮瑕 膏 |
General analyzer for Thai
Segments Thai text into search units for general-purpose Thai full-text search.
Supported field types: TEXT, SHORT_TEXT
Availability: Exclusive applications only.
Example: If a field value is แหล่งดึงดูดนักท่องเที่ยว, the analysis result is แหล่ง ดึง ดูด นักท่องเที่ยว. The document is retrieved when a user searches for นักท่องเที่ยว or แหล่งดึงดูดนักท่องเที่ยว.
Analyzer for E-commerce for Thai
Segments Thai text for E-commerce product search scenarios.
Supported field types: TEXT, SHORT_TEXT
Availability: Exclusive applications only.
Example: If a field value is หน้าจอโทรศัพท์, the analysis result is น้าจอ โทรศัพท์. The document is retrieved when a user searches for หน้าจอโทรศัพท์, หน้าจอ, or โทรศัพท์.
General analyzer for Vietnamese
Segments Vietnamese text for general-purpose Vietnamese full-text search.
Supported field types: TEXT, SHORT_TEXT
Availability: Exclusive applications only.
General analyzer for Gaming
An industry-specific analyzer tuned for gaming content.
Supported field types: TEXT, SHORT_TEXT
Availability: Industry-specific Enhanced Edition for Gaming only.
Example: If a field value is 原神装备, the analysis result is 原神 装备. The document is retrieved when a user searches for 原神装备, 原神, or 装备.
General analyzer for E-commerce for English
An industry-specific analyzer for English E-commerce product search.
Supported field types: TEXT
Availability: Industry-specific Enhanced Edition for E-commerce only.
Character analyzer for Chinese
Segments text into individual Chinese characters, English letters, numbers, and punctuation marks. Use it for character-level non-semantic searches.
Supported field types: TEXT, SHORT_TEXT
Availability: Exclusive applications only.
Example: If a field value is 开放搜索OpenSearch123, the document is retrieved when a user searches for any individual character or letter: 开, 放, 搜, 索, O, p, e, n, S, e, a, r, c, h, or ..
Custom analyzer for text
Combines an industry-specific analyzer — the general analyzer, E-commerce analyzer, or person name analyzer — with custom intervention entries. For configuration details, see Custom analyzers.
Supported field types: TEXT, SHORT_TEXT
Choose an analyzer
Use the following guidance to select the right analyzer for your use case.
Chinese full-text search
For most scenarios, start with the general analyzer for Chinese or the E-commerce analyzer for Chinese.
When stringent ranking is not required and higher recall matters — such as short text or non-semantic content — use the single character analyzer for Chinese.
Combining analyzers for better ranking
Index the same field with both a semantic and a character-level analyzer to retrieve more documents while ranking semantically relevant results higher. For example:
query=title_index:'菊花茶' OR sws_title_index:'菊花茶'Fine sort expression:
text_relevance(title)*5+field_proximity(sws_title)This configuration allows users to retrieve all documents that contain xx菊xx花xx茶xx. In addition, documents that contain 菊花茶 are ranked first.
Pinyin search
Use the fuzzy analyzer for pinyin-based searches.
English full-text search
Use the word stemming analyzer for English to match inflected word forms.
Test an analyzer
To verify how an analyzer segments a specific input, use the Word Analysis Test tool in the OpenSearch console.
Log on to the OpenSearch console.
In the left-side navigation pane, choose Search Algorithm Center > Retrieval Configuration.
On the Basic Configuration page, click Analyzer Management in the left-side pane.
On the Analyzer Management page, find the target analyzer and click Word Analysis Test in the Actions column.

Usage notes
Index field types: The following field types support index configuration: INT, INT_ARRAY, TEXT, SHORT_TEXT, LITERAL, LITERAL_ARRAY, TIMESTAMP, and GEO_POINT. The following types do not: FLOAT, FLOAT_ARRAY, DOUBLE, and DOUBLE_ARRAY.
Search result highlighting: For TEXT fields, terms that belong to extended search units — such as
花茶derived from菊花茶— may not be wrapped in HTML highlighting tags in search result summaries.Single character analyzer and English words: This analyzer treats numbers and English words as single tokens. A search for
hedoes not retrieve a document whose field containshello. To support partial English-word matching, use the fuzzy analyzer instead.Primary key index field: The primary key of the primary table is automatically set as an index field named
id. This field cannot be modified.