OpenSearch Industry Algorithm Edition provides built-in text analyzers for different languages, industries, and search scenarios. Each analyzer controls how field content is tokenized at index time and matched at query time.
Analyzer overview
The following table summarizes all available analyzers. "Dedicated only" means the analyzer requires a dedicated application instance.
| Analyzer | Language / domain | Supported field types | Dedicated only |
|---|---|---|---|
| N-gram | Any | SHORT_TEXT | Yes |
| Keyword | Any | LITERAL, INT, LITERAL_ARRAY, INT_ARRAY | No |
| General analyzer for Chinese | Chinese | TEXT, SHORT_TEXT | No |
| E-commerce analyzer for Chinese | Chinese / E-commerce | TEXT, SHORT_TEXT | No |
| Single-character analyzer for Chinese | Chinese | TEXT, SHORT_TEXT | No |
| Character analyzer for Chinese | Chinese | TEXT, SHORT_TEXT | Yes |
| Fuzzy analyzer | Chinese / Pinyin | SHORT_TEXT | No |
| Full pinyin spelling analyzer | Pinyin | SHORT_TEXT | No |
| Abbreviated pinyin analyzer | Pinyin | SHORT_TEXT | No |
| Word stemming analyzer for English | English | TEXT, SHORT_TEXT | No |
| Unstemmed word analyzer for English | English | TEXT, SHORT_TEXT | No |
| Fine-grained analyzer for English | English | TEXT, SHORT_TEXT | Yes |
| General analyzer for Thai | Thai | TEXT, SHORT_TEXT | Yes |
| E-commerce analyzer for Thai | Thai / E-commerce | TEXT, SHORT_TEXT | Yes |
| General analyzer for Vietnamese | Vietnamese | TEXT, SHORT_TEXT | Yes |
| General analyzer for Indonesian | Indonesian | TEXT, SHORT_TEXT | Yes |
| General analyzer for Korean | Korean | TEXT, SHORT_TEXT | Yes |
| E-commerce analyzer for Korean | Korean / E-commerce | TEXT, SHORT_TEXT | Yes |
| General analyzer for Japanese | Japanese | TEXT, SHORT_TEXT | Yes |
| E-commerce analyzer for Japanese | Japanese / E-commerce | TEXT, SHORT_TEXT | Yes |
| Simple analyzer | Any | TEXT, SHORT_TEXT | No |
| Numeric analyzer | Numeric | INT, TIMESTAMP | No |
| Geo-location analyzer | Geographic | geo_point | No |
| IT content analyzer | IT industry | TEXT, SHORT_TEXT | No |
| General E-commerce analysis | E-commerce | TEXT | Yes (E-commerce Enhanced) |
| General analysis for the gaming industry | Gaming | TEXT, SHORT_TEXT | Yes (Gaming Enhanced) |
| General analyzer for English E-commerce | English / E-commerce | TEXT | Yes (E-commerce Enhanced) |
| Custom text analyzer | Any | TEXT, SHORT_TEXT | No |
N-gram analyzer
Tokenizes text into sequences of N consecutive characters. Supports 2-gram and 3-gram tokenization. Use this analyzer for non-semantic search scenarios where you need character-level substring matching.
Available only for dedicated applications. The field type must be SHORT_TEXT.
Example — 2-gram
Input: Open Search
Tokens: op, pe, en, n , s, se, ea, ar, rc, ch
Example — 3-gram
Input: Open Search
Tokens: ope, pen, en , n s, se, sea, ear, arc, rch
Keyword analyzer
Does not tokenize the field value. The entire value is treated as a single token. Use this analyzer for exact match scenarios — tags, identifiers, string codes, and numeric values that must not be split.
Supported field types: LITERAL, INT, LITERAL_ARRAY, INT_ARRAY
Example
Input: chrysanthemum tea
The document is retrieved only when the query is exactly chrysanthemum tea.
General analyzer for Chinese
A general-purpose semantic analyzer for Chinese text. Tokenizes content into meaningful search units based on Chinese language semantics. Suitable for most industries.
Supported field types: TEXT, SHORT_TEXT
Example
Input: 菊花茶
The document is retrieved by queries 菊花茶, 菊花, 茶, or 花茶.
E-commerce analyzer for Chinese
A semantic analyzer optimized for Chinese e-commerce product names and descriptions. Produces finer-grained tokens than the general analyzer for common product terminology.
Supported field types: TEXT, SHORT_TEXT
Example
Input: Dabao SOD lotion
The document is retrieved by queries Dabao, sod, sod lotion, SOD lotion, or lotion.
Single-character analyzer for Chinese
Tokenizes Chinese text into individual characters and words. Use this analyzer for non-semantic Chinese search scenarios — author names, store names, or any field where individual character recall matters more than semantic grouping.
Supported field types: TEXT, SHORT_TEXT
This analyzer treats numbers and English words as single tokens. A search forhedoes not retrieve a document containinghello world. Use the fuzzy analyzer if you need partial word matching on numbers or English text.
If a search result summary is configured for a TEXT field, some extended tokens such as 花茶 are not highlighted.Example
Input: 菊花茶
The document is retrieved by queries 菊花茶, 菊花, 茶, 花茶, 花, or 菊茶.
Character analyzer for Chinese
Tokenizes text into individual Chinese characters, numbers, English letters, and punctuation marks. Suitable for non-semantic search scenarios that require maximum granularity.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: 开放搜索OpenSearch123.
The document is retrieved by searching for any single character: 开, 放, 搜, 索, O, p, e, n, S, e, a, r, c, h, or .
Fuzzy analyzer
Supports searches by pinyin, single characters, and letters, including prefix and suffix matching for numbers, letters, and pinyin. Chinese text does not support prefix or suffix matching. The field length is limited to 100 bytes.
Supported field types: SHORT_TEXT only
For details, see Fuzzy searches.
Example — Chinese and pinyin
Input: chrysanthemum tea
The document is retrieved by queries including chrysanthemum tea, chrysanthemum, tea, flower tea, flower, ju, juhua, juhuacha, j, jh, or jhc.
Example — Prefix and suffix matching
Input: 138****5678
Use
^138to match phone numbers starting with138.Use
5678$to match phone numbers ending with5678.
Example — English letter combinations
Input: OpenSearch
The document is retrieved by any single letter or combination of letters from the word.
Full pinyin spelling analyzer
Searches Chinese characters in SHORT_TEXT fields using their full pinyin spelling or the first letter of each syllable. Suitable for searches by full pinyin or abbreviated pinyin — for example, movie names or author names.
To search by full pinyin, enter the complete pinyin of the Chinese characters. Partial syllable spelling is not supported.
Supported field types: SHORT_TEXT only
Example
Input: Da Nei Mi Tan 007
The document is retrieved by d, dn, dnm, dnmt, dnmt007, da, danei, daneimi, or daneimitan.
The document is not retrieved by an or anei.
Abbreviated pinyin analyzer
Retrieves Chinese characters in SHORT_TEXT fields using the first letter of each pinyin syllable. Use this analyzer for scenarios that require searches by pinyin initials — people's names, movie titles, and similar short content.
Supported field types: SHORT_TEXT
Example
Input: Da Nei Mi Tan 007
The document is retrieved by d, dn, dnm, dnmt, dnmt0, dnmt007, m, mt, mt007, or 007.
Word stemming analyzer for English
An English semantic analyzer that stems each token to its root form and handles pluralization. Use this analyzer when you want searches like analyzing or analyzers to match documents containing analyze.
Supported field types: TEXT, SHORT_TEXT
This analyzer does not support query analysis configurations. Consecutive Chinese characters are treated as a single token.
Example
Input: 英文分词器 english analyzer
The document is retrieved by 英文分词器, english, analyz, analyzer, analyzers, analyze, analyzed, or analyzing.
Unstemmed word analyzer for English
Tokenizes English text based on spaces and punctuation marks without applying stemming. Use this analyzer for non-semantic English search scenarios — book titles, author names, or any field where exact word matching is required.
Supported field types: TEXT, SHORT_TEXT
This analyzer does not support query analysis configurations. Consecutive Chinese characters are treated as a single token.
Example
Input: 英文分词器 english analyzer
The document is retrieved by 英文分词器, english, or analyzer.
Fine-grained analyzer for English
Tokenizes English text based on semantics with finer granularity than the standard English analyzer. Suitable for general industry applications where compound words need to be split into their component terms.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: dataprocess
Tokenization result: data process
The document is retrieved by dataprocess, data process, data, or process.
General analyzer for Thai
A general-purpose analyzer that tokenizes Thai text into search units. Suitable for general industry applications.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: แหล่งดึงดูดนักท่องเที่ยว
Tokenization result: แหล่ง ดึง ดูด นักท่องเที่ยว
The document is retrieved by นักท่องเที่ยว or แหล่งดึงดูดนักท่องเที่ยว.
E-commerce analyzer for Thai
A semantic analyzer designed for Thai-language e-commerce scenarios.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: หน้าจอโทรศัพท์
Tokenization result: หน้าจอ โทรศัพท์
The document is retrieved by หน้าจอโทรศัพท์, หน้าจอ, or โทรศัพท์.
General analyzer for Vietnamese
A general-purpose analyzer for Vietnamese text in general industry applications.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
General analyzer for Indonesian
A general-purpose analyzer for Indonesian text in general industry applications.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
General analyzer for Korean
A general-purpose analyzer for Korean text in general industry applications.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: 인제군의교육
Tokenization result: 인제군 의 교육
The document is retrieved by 인제군의교육, 의, or 교육.
E-commerce analyzer for Korean
A semantic analyzer designed for Korean text in e-commerce scenarios.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: 스포츠캐주얼신발
Tokenization result: 스포츠 캐주얼 신발
The document is retrieved by 스포츠, 캐주얼, or 신발.
General analyzer for Japanese
A general-purpose analyzer for Japanese text in general industry applications.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: メキシコアグーチ
Tokenization result: メキシコ アグーチ
The document is retrieved by メキシコ or アグーチ.
E-commerce analyzer for Japanese
A semantic analyzer designed for Japanese text in e-commerce scenarios.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications.
Example
Input: ラウンドネックスーツ
Tokenization result: ラウンド ネック スーツ
The document is retrieved by ラウンド, ネック, or スーツ.
Simple analyzer
Gives you full control over tokenization. Use this analyzer for special scenarios where no built-in analyzer meets your requirements. When pushing documents or performing searches, use the tab character (\t) to separate tokens. Field content and search queries must be tokenized the same way — otherwise, documents cannot be retrieved.
Supported field types: TEXT, SHORT_TEXT
This analyzer does not support query analysis configurations.
Example
Field value: chrysanthemum\tflower tea\thao
The document is retrieved by chrysanthemum, flower tea, chrysanthemum\tflower tea, flower tea\thao, chrysanthemum\thao, or chrysanthemum\tflower tea\thao.
Numeric analyzer
Supports searches based on time intervals or numerical ranges. Use this analyzer on INT and TIMESTAMP fields where range queries are required.
Supported field types: INT, TIMESTAMP
Example
query=default:'OpenSearch' AND index:[number1,number2]In this example, index is the name of the index field configured with the numeric analyzer.
Geo-location analyzer
Supports geographic location range queries.
Supported field types: geo_point only
Example
query=spatial_index:'circle(116.5806 39.99624, 1000)'This query finds points within a circle to locate nearby places within a few kilometers.
IT content analyzer
An industry-specific analyzer designed for IT content. It tokenizes IT-related terms differently than the general analyzer — for example, handling programming language names and technical abbreviations with higher precision.
Supported field types: TEXT, SHORT_TEXT
Example
Original content: C++ array usage notes
General analysis: C++ array usage notes
IT content analysis: C++ array usage notes
General E-commerce analysis
An industry-specific analyzer for e-commerce, powered by the natural language processing (NLP) technology of Alibaba DAMO Academy. It resolves common pain points in e-commerce search, including product attribute parsing and brand recognition.
Supported field types: TEXT only
Available only for dedicated applications using the E-commerce Enhanced specification.
Example
Input: Small Gold Tube Concealer Cream
E-commerce analysis output: Small Gold Tube, Concealer, Cream
General analysis for the gaming industry
An industry-specific analyzer designed for gaming content.
Supported field types: TEXT, SHORT_TEXT
Available only for dedicated applications enhanced for the gaming industry.
Example
Input: Genshin equipment
Tokenization result: Genshin, equipment
The document is retrieved by Genshin equipment, Genshin, or equipment.
General analyzer for English E-commerce
A semantic analyzer for English text in e-commerce scenarios.
Supported field types: TEXT only
Available only for dedicated applications of the Industry-specific Enhanced Edition for E-commerce.
Custom text analyzer
Combines an industry-specific analyzer — such as a general analyzer, an e-commerce analyzer, or a person name analyzer — with custom intervention entries. Use this analyzer to extend built-in tokenization behavior with domain-specific vocabulary.
Supported field types: TEXT, SHORT_TEXT only
For configuration details, see Custom text analyzers.
Test analyzers
Test the tokenization results of industry-specific and custom analyzers in the OpenSearch console. Navigate to Search Algorithm Center > Retrieval Configuration > Analyzer Management, then click the Analysis Test tab.

Choose the right analyzer
Use the following guidance to select the right analyzer for your scenario.
| Scenario | Recommended analyzer |
|---|---|
| Semantic Chinese search (most industries) | General analyzer for Chinese |
| Non-semantic Chinese search or short text with high recall needs | Single-character analyzer for Chinese |
| Chinese search requiring maximum character granularity | Character analyzer for Chinese |
| Pinyin search (full or abbreviated) | Fuzzy analyzer |
| Search by pinyin initials only | Abbreviated pinyin analyzer |
| Search by full pinyin spelling | Full pinyin spelling analyzer |
| English semantic search with stemming | Word stemming analyzer for English |
| English exact-word search (titles, names) | Unstemmed word analyzer for English |
| English compound word splitting | Fine-grained analyzer for English |
| E-commerce product search (Chinese) | E-commerce analyzer for Chinese or General E-commerce analysis |
| IT industry content | IT content analyzer |
| Numeric or time range queries | Numeric analyzer |
| Geographic proximity queries | Geo-location analyzer |
| Tags, identifiers, or non-tokenized strings | Keyword analyzer |
| Character-level substring matching | N-gram analyzer |
| Custom tokenization logic | Simple analyzer |
Combining analyzers for better recall and precision
For Chinese search, combine the general analyzer with the single-character analyzer to retrieve documents containing individual characters while ranking exact phrases higher. For example:
query=title_index:'菊花茶' OR sws_title_index:'菊花茶'Pair this query with the sort expression text_relevance(title)×5+field_proximity(sws_title). This combination retrieves documents that contain the individual characters for 菊花茶 even when separated, while ranking documents with the exact phrase 菊花茶 higher.
Usage notes
Supported field types for index fields
| Supported | Unsupported |
|---|---|
| INT, INT_ARRAY, TEXT, SHORT_TEXT, LITERAL, LITERAL_ARRAY, TIMESTAMP, GEO_POINT | FLOAT, FLOAT_ARRAY, DOUBLE, DOUBLE_ARRAY |
Additional constraints:
The default primary key of the primary table in the application schema is set as an index field named
id. This configuration cannot be modified.If a search result summary is configured for a TEXT field, some extended search unit phrases (such as
花茶derived from菊花茶) are not highlighted in search results.The single-character Chinese analyzer treats numbers and English words as single tokens. A search for
hedoes not retrieve a document containinghello world. Use the fuzzy analyzer for partial word matching on numbers or English text.