Query analysis: Named entity recognition

Named Entity Recognition (NER) identifies and categorizes meaningful spans in a user's search query—such as brands, locations, products, and people—so OpenSearch Industry Algorithm Edition can interpret semantic intent and return more relevant results. Instead of treating a query as a flat list of keywords, NER adds structured labels that improve both recall and ranking.

How it works

When a search query arrives, the query analysis pipeline runs NER before token matching. The NER model scans the raw query text, assigns each recognized span an entity category and a confidence score, and passes the structured output to downstream stages. Synonym configuration and relevance scoring then apply category-specific logic based on those labels.
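
The stage ordering described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the OpenSearch implementation. A small hypothetical gazetteer (`GAZETTEER`) stands in for the deployed semantic model, and its confidence values are fixed placeholders, since a dictionary lookup has no real notion of confidence. The helper names `recognize_entities` and `analyze_query` are also hypothetical. The output reuses the fields from the entity output structure (`text`, `category`, `confidence`, `offset`, `length`).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for the deployed semantic model.
# Keys are lowercase surface forms; values are (category, placeholder confidence).
GAZETTEER: Dict[str, Tuple[str, float]] = {
    "nike": ("Brand", 0.97),
    "air max 90": ("Product", 0.90),
    "shanghai": ("Location", 0.95),
}


def recognize_entities(query: str) -> List[dict]:
    """Return entity spans found in the raw query text, ordered by offset."""
    entities = []
    taken = [False] * len(query)  # marks characters already covered by an entity
    # Try longer surface forms first so "air max 90" wins over any shorter overlap.
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        category, confidence = GAZETTEER[phrase]
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, query, flags=re.IGNORECASE):
            start, end = match.span()
            if any(taken[start:end]):
                continue  # skip spans that overlap an earlier, longer match
            for i in range(start, end):
                taken[i] = True
            entities.append({
                "text": match.group(0),  # keeps the casing from the query
                "category": category,
                "confidence": confidence,
                "offset": start,
                "length": end - start,
            })
    return sorted(entities, key=lambda e: e["offset"])


def analyze_query(query: str) -> dict:
    """Run the NER stage first, then hand its structured output onward."""
    entities = recognize_entities(query)
    # Later stages (synonyms, relevance scoring) would read these labels.
    return {"query": query, "entities": entities}
```

For the query "Nike running shoes Shanghai", `analyze_query` returns two entities: "Nike" (Brand, offset 0, length 4) and "Shanghai" (Location, offset 19, length 8). A real model also handles unseen spellings and context, which a fixed dictionary cannot.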

For example, given the query "Nike running shoes Shanghai", NER identifies Nike as a Brand entity and Shanghai as a Location entity. The search engine can boost brand-matched results or filter by delivery region accordingly.
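
The brand-boosting step in this example can be sketched roughly as follows. Everything here is an assumption for illustration: the document schema (a `brand` field and a base `score`), the multiplicative `BRAND_BOOST` factor, the `boost_by_brand` helper, and the confidence values. Actual relevance scoring in OpenSearch Industry Algorithm Edition is configured in the product rather than written by hand.

```python
from typing import List

BRAND_BOOST = 2.0  # hypothetical multiplier for brand-matched documents


def boost_by_brand(entities: List[dict], docs: List[dict]) -> List[dict]:
    """Re-rank documents, boosting those whose brand matches a Brand entity."""
    brands = {e["text"].lower() for e in entities if e["category"] == "Brand"}
    rescored = []
    for doc in docs:
        score = doc["score"]
        if doc.get("brand", "").lower() in brands:
            score *= BRAND_BOOST
        rescored.append({**doc, "score": score})
    # Highest score first.
    return sorted(rescored, key=lambda d: d["score"], reverse=True)


# Illustrative NER output for "Nike running shoes Shanghai".
entities = [
    {"text": "Nike", "category": "Brand", "confidence": 0.97, "offset": 0, "length": 4},
    {"text": "Shanghai", "category": "Location", "confidence": 0.95, "offset": 19, "length": 8},
]
docs = [
    {"id": "a", "brand": "Adidas", "score": 1.5},
    {"id": "b", "brand": "Nike", "score": 1.0},
]
ranked = boost_by_brand(entities, docs)
```

Here document "b" rises above "a" because its brand matches the recognized Brand entity. The Location entity could drive a delivery-region filter in the same way.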

Supported entity categories

NER in OpenSearch Industry Algorithm Edition recognizes entities relevant to e-commerce and content search scenarios.

Entity categoryDescriptionExample
BrandManufacturer or brand nameNike, Apple
ProductSpecific product name or modeliPhone 15, Air Max 90
PersonName of an individualTom Hanks
LocationGeographic place or regionShanghai, California
OrganizationCompany, institution, or groupAlibaba Group
TimeDate, time, or time rangethis weekend, 2024
PriceMonetary value or rangeunder 500, ¥200
The entity categories available to your application depend on your OpenSearch Industry Algorithm Edition configuration and the semantic model deployed in your instance. Contact your account team to confirm which categories are enabled.

Enable Named Entity Recognition

NER is part of the query analysis pipeline. Enable and configure it through the OpenSearch console when setting up your application's query analysis settings.

Prerequisites

Before you begin, ensure that you have:

  • An OpenSearch Industry Algorithm Edition application

  • Query analysis enabled for your application

  • A semantic model deployed that supports NER

Enable NER in query analysis

  1. Log in to the OpenSearch console.

  2. Navigate to your application and open the Query Analysis settings.

  3. Enable Named Entity Recognition.

  4. Select the entity categories relevant to your use case.

  5. Set the minimum confidence threshold to filter low-confidence entity matches.

  6. Save and publish your configuration.
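
The effect of Steps 4 and 5 on the NER output can be modeled as two filters: keep only the selected categories, and drop matches below the minimum confidence. The sketch below models that behavior client-side to make it concrete. `NerSettings` and `apply_settings` are hypothetical names, the real settings live in the console, and whether a score exactly equal to the threshold is kept is an assumption.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class NerSettings:
    """Hypothetical mirror of the console settings from Steps 4 and 5."""
    enabled_categories: FrozenSet[str] = field(default_factory=frozenset)
    min_confidence: float = 0.6

    def __post_init__(self) -> None:
        # Confidence scores range from 0 to 1, so the threshold must as well.
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")


def apply_settings(entities: List[dict], settings: NerSettings) -> List[dict]:
    """Keep entities in an enabled category whose confidence meets the threshold."""
    # Assumption: a score equal to the threshold is kept (>=).
    return [
        e for e in entities
        if e["category"] in settings.enabled_categories
        and e["confidence"] >= settings.min_confidence
    ]


settings = NerSettings(enabled_categories=frozenset({"Brand", "Location"}), min_confidence=0.6)
raw = [
    {"text": "Nike", "category": "Brand", "confidence": 0.97, "offset": 0, "length": 4},
    {"text": "running", "category": "Product", "confidence": 0.70, "offset": 5, "length": 7},
    {"text": "Shanghai", "category": "Location", "confidence": 0.45, "offset": 19, "length": 8},
]
kept = apply_settings(raw, settings)
```

Only "Nike" survives: "running" is in a category that is not enabled, and "Shanghai" falls below the 0.6 threshold.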

After you publish, NER applies to all queries received from that point on. Queries processed before publishing are not reprocessed.

Entity output structure

Each entity recognized in a query produces a structured result with the following fields.

| Field | Type | Description |
| --- | --- | --- |
| text | string | The matched span from the original query |
| category | string | The entity category (for example, Brand) |
| confidence | float | Confidence score between 0 and 1 |
| offset | integer | Zero-based start position of the matched span in the query string |
| length | integer | Character length of the matched span |

Example: For the query "Nike running shoes", NER returns:

```json
{
  "entities": [
    {
      "text": "Nike",
      "category": "Brand",
      "confidence": 0.97,
      "offset": 0,
      "length": 4
    }
  ]
}
```
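
A client that consumes this output can map each entry onto a typed record and use `offset` and `length` to recover the span from the original query. This sketch assumes the payload shape shown above. The `Entity` class and `parse_entities` function are hypothetical helpers, not part of an OpenSearch SDK. It also assumes offsets count characters (Unicode code points, as Python string indexing does), consistent with `"offset": 0` for "Nike"; if the service counts bytes or UTF-16 units, multi-byte text such as Chinese would need adjusting.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    text: str          # matched span from the original query
    category: str      # entity category, for example "Brand"
    confidence: float  # score between 0 and 1
    offset: int        # zero-based start position in the query string
    length: int        # character length of the span


def parse_entities(payload: str, query: str) -> List[Entity]:
    """Parse NER output and check that each span matches the query text."""
    entities = [Entity(**item) for item in json.loads(payload)["entities"]]
    for e in entities:
        span = query[e.offset:e.offset + e.length]
        if span != e.text:
            raise ValueError(f"span mismatch: {span!r} != {e.text!r}")
    return entities


payload = '{"entities": [{"text": "Nike", "category": "Brand", "confidence": 0.97, "offset": 0, "length": 4}]}'
entities = parse_entities(payload, "Nike running shoes")
```

The span check is a cheap guard against pairing an NER result with the wrong query string.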

Confidence scores and thresholds

The confidence score reflects how certain the model is about an entity match. A score close to 1.0 indicates high confidence; a score below 0.5 suggests the match may be ambiguous.

Set the minimum confidence threshold based on the tradeoff between precision and recall:

| Threshold | Effect |
| --- | --- |
| High (for example, 0.8) | Fewer entities recognized; higher precision, lower recall |
| Low (for example, 0.4) | More entities recognized; higher recall, lower precision (more noise) |

Start with a threshold of 0.6 and adjust based on search quality metrics from your application.
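
One way to tune the threshold from search quality metrics is to score a small hand-labeled sample of queries at several thresholds and compare precision and recall. The sketch below does this with made-up predictions and labels. The data and the `precision_recall` helper are purely illustrative and say nothing about how the deployed model actually behaves.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical labeled sample: each prediction is (text, category, confidence);
# the gold set holds the (text, category) pairs a reviewer marked correct.
Prediction = Tuple[str, str, float]


def precision_recall(predictions: List[Prediction],
                     gold: Set[Tuple[str, str]],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of the predictions kept at a given threshold."""
    kept = {(text, cat) for text, cat, conf in predictions if conf >= threshold}
    true_positives = len(kept & gold)
    # Convention: with nothing kept (or nothing to find), report 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


predictions: List[Prediction] = [
    ("Nike", "Brand", 0.97),
    ("Shanghai", "Location", 0.85),
    ("this weekend", "Time", 0.65),
    ("running", "Brand", 0.50),  # wrong label: "running" is not a brand
    ("Air Max", "Product", 0.45),
]
gold = {("Nike", "Brand"), ("Shanghai", "Location"),
        ("this weekend", "Time"), ("Air Max", "Product")}

results: Dict[float, Tuple[float, float]] = {
    t: precision_recall(predictions, gold, t) for t in (0.4, 0.6, 0.8)
}
```

With this toy data, a threshold of 0.4 gives precision 0.8 and recall 1.0, 0.6 gives 1.0 and 0.75, and 0.8 gives 1.0 and 0.5. Raising the threshold first removes the wrong match, then starts discarding correct ones.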

Limitations

  • NER performance depends on the quality and coverage of the deployed semantic model. Queries that contain rare terms, typos, or mixed-language input may produce lower confidence scores or no entity matches.

  • NER processes the query text as submitted. It does not correct spelling errors before entity detection.

  • Entity categories not included in your semantic model configuration are not recognized, even if the query contains matching text.

Next steps