Product introduction

The Qianxun Search Algorithm from Alibaba Cloud Tongyi Lab builds on DAMO Academy’s long-standing expertise in natural language processing (NLP). It focuses on enterprise-wide unified search and delivers precise, multi-source heterogeneous search. The service is offered as a Platform-as-a-Service (PaaS), providing APIs for offline data processing and search. It supports deployment across public cloud, Apsara Stack, hybrid cloud built on cloud-native infrastructure, and private environments.

Using natural language processing, machine learning, and enterprise knowledge bases, the product advances search from relevance to cognitive intelligence. It embeds semantics and domain knowledge into both the search process and results. This delivers efficient, highly accurate search—helping users find what they need, find all relevant items, and find the most precise matches. For enterprise customers, it supports interactive multi-turn conversation search, address book search, location search, and document search. For enterprises and Large Language Models (LLMs), it provides retrieval-augmented capabilities.

Benefits

Scenario-focused and easy to use

Building a full search pipeline from scratch is complex and time-consuming for developers and Independent Software Vendors (ISVs). Qianxun Search Algorithm simplifies this by offering guided, end-to-end search configuration and default algorithm support for core enterprise unified search scenarios.

Industry-leading algorithm performance

Query analysis is fully proprietary and multilingual, covering tokenization, named entity recognition (NER), spelling correction, query rewriting, and classification. The end-to-end algorithms integrate LLMs and are driven by performance metrics. The Chinese and multilingual embeddings achieve top-tier MRR@10 scores on retrieval benchmark datasets. Compared with pure vector search, multi-channel recall combined with fine-grained ranking significantly improves both MRR@10 and Recall.
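For readers unfamiliar with the two metrics named above, the following minimal sketch shows how MRR@10 and Recall@10 are commonly computed over a set of test queries. It is illustrative only and is not the evaluation code used for this product. The function names, the `runs` and `qrels` structures, and the sample document IDs are assumptions made for this example.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant result within the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Return the fraction of relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def evaluate(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """Average both metrics over all queries that have relevance judgments."""
    queries = [q for q in runs if q in qrels]
    if not queries:
        return {"mrr": 0.0, "recall": 0.0}
    mrr = sum(reciprocal_rank_at_k(runs[q], qrels[q], k) for q in queries) / len(queries)
    recall = sum(recall_at_k(runs[q], qrels[q], k) for q in queries) / len(queries)
    return {"mrr": mrr, "recall": recall}


if __name__ == "__main__":
    # Hypothetical search results (runs) and relevance judgments (qrels).
    runs = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4", "d2"]}
    qrels = {"q1": {"d1", "d8"}, "q2": {"d9"}}
    print(evaluate(runs, qrels, k=10))
```

MRR@10 rewards placing the first correct result near the top, while Recall measures how many of the relevant documents are returned at all, which is why the two figures are reported separately.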

Flexible search engineering framework

Supports multiple data sources and intelligent offline processing of text, documents, and other data. Integrates with multiple search engines, such as Elasticsearch (ES). System components are modular; for example, search engine compatibility is configurable.

Secure, stable, and highly robust

The service runs reliably and offers technical support via online tickets. It includes comprehensive fault monitoring, automatic alerting, and rapid root-cause diagnosis. Access control and isolation are enforced at the API level using Alibaba Cloud AccessKey ID and AccessKey Secret pairs. This ensures strict user-level data isolation and strong data security.

Scenarios

Find people

Large enterprises have many employees across departments. Users can search precisely for people or departments by person name or department name. Results appear as cards that link directly to the employee or department organizational chart, helping users quickly locate business contacts and improve cross-department collaboration.

Find content

Unifies fragmented content and business resources scattered across systems. Builds a comprehensive knowledge service system aligned with departmental needs. Delivers differentiated intelligent search for diverse users.

Find applications

Large enterprises run many business applications and navigation links. With unified search, users see only applications and links within their permission scope—and jump directly to them. This boosts employee self-service efficiency.

Find locations

Provides 21-level structured standard address data. Combines it with enterprise-specific address data to deliver a unified, standardized location search service. This service is particularly needed in sectors such as retail, energy, the judiciary, and public security.

Improve general-purpose search quality

A search enhancement service built on DAMO Academy’s NLP algorithms. Helps users rapidly build intelligent search over their own data. Supports text search, document search, address book search, location search, and more.

Intelligent customer service assistant

Uses enterprise-specific knowledge bases and conversational AI to answer questions in multi-turn dialogues. Answers general or company-specific questions quickly—replacing manual, multi-source searches and summaries. Improves operational efficiency.

Feature modules

Search enhancement

Overview

Search enhancement is a one-stop intelligent search PaaS built on a large-scale distributed search engine. It provides enterprise developers with foundational infrastructure, APIs, and search tools. It integrates fully proprietary multilingual query analysis—tokenization, NER, spelling correction, query rewriting, and classification—along with pretrained vector representations from multiple model architectures (encoder-only and decoder-only). It also supports hybrid recall and multi-factor ranking—combining text matching and deep semantic matching. Compared with pure vector search, it delivers industry-leading search quality.
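The hybrid recall idea described above can be sketched as follows: each recall channel (for example, text matching and semantic vector matching) returns its own ranked list, and the lists are merged before fine-grained ranking. The source does not state how the product merges channels, so this sketch uses reciprocal rank fusion (RRF) as an assumed stand-in. The function name, the channel names, and the constant `k = 60` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(channels: Dict[str, List[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Merge ranked ID lists from several recall channels into one list.

    Each document earns 1 / (k + rank) from every channel that returned it,
    so documents found by more than one channel rise toward the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in channels.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


if __name__ == "__main__":
    # Hypothetical output of two recall channels for the same query.
    channels = {
        "text": ["a", "b", "c"],
        "vector": ["b", "d", "a"],
    }
    print(reciprocal_rank_fusion(channels))
```

In a full pipeline, the fused candidate list would then be passed to a ranking model that applies the multi-factor scoring described above.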

Benefits

Benefit 1: Industry-leading chunk analysis and file parsing

Leverages Alibaba DAMO Academy’s proprietary Intelligent Document Processing (IDP) service. Splits data from various formats into chunks and adds basic text understanding.

Benefit 2: Industry-leading search enhancement algorithms

Fully proprietary multilingual query analysis. Pretrained vector representations from multiple model architectures. Hybrid recall and multi-factor ranking. Multi-channel recall plus fine-grained ranking improves MRR@10 by 28% and Recall by 21.6% versus pure vector search.

Scenarios

Enhances search capability and quality for large language models in broad enterprise search scenarios.

Multi-turn conversation search

Overview

Multi-turn conversation search combines search and large language models. You can build next-generation generative search applications using your own knowledge bases. Unlike traditional keyword-matching search engines, generative search uses conversational interaction to clarify user intent—and then tailors responses based on that intent, producing clear, precise answers.

Benefits

Benefit 1: Innovative conversational experience

Users express intent clearly through dialogue. Multi-turn, in-depth conversations meet complex information needs.

Benefit 2: Flexible intelligent search engine

Users configure indexes and choose from multiple recall and ranking algorithms. Semantics and knowledge are embedded into the search process—delivering fast, highly accurate results.

Benefit 3: Trustworthy answers

The built-in search-optimized Qwen LLM greatly improves factuality and reliability. Local knowledge bases further reduce hallucination.

Scenarios

Scenario 1: Intelligent customer service assistant

Integrates enterprise product information to handle user inquiries and issues—improving support efficiency and customer satisfaction.

Scenario 2: Natural language enterprise knowledge base

Integrates internal enterprise knowledge to help employees find information quickly—making it their go-to productivity tool and boosting work efficiency.

Qianxun Search Algorithm atomic capabilities

Overview

Capability 1: Multi-turn query rewriting

Rephrases raw user input to improve model understanding and search recall. Supports iterative rewriting across multiple turns.

Capability 2: Search intent detection

Determines whether a user’s original query requires a search task to answer.

Capability 3: General-purpose ranking model

Ranks data elements, such as candidate search results, using a general-purpose ranking algorithm.
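The three capabilities above are typically chained within a single conversation turn: rewrite the query, decide whether a search is needed, then rank the candidates. The sketch below shows only that control flow. The type aliases, `handle_turn`, and the `toy_*` functions are hypothetical placeholders written for this example; they are not the product's APIs and do not reflect how its models work.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical signatures for the three capabilities.
Rewriter = Callable[[List[str], str], str]                    # (history, query) -> standalone query
IntentDetector = Callable[[str], bool]                        # query -> True if a search is needed
Ranker = Callable[[str, List[str]], List[Tuple[str, float]]]  # (query, docs) -> scored docs


@dataclass
class TurnResult:
    rewritten_query: str
    needs_search: bool
    ranked_docs: List[Tuple[str, float]]


def handle_turn(history: List[str], query: str, candidates: List[str],
                rewrite: Rewriter, needs_search: IntentDetector, rank: Ranker) -> TurnResult:
    """Run one turn: rewrite the query, check search intent, then rank candidates."""
    standalone = rewrite(history, query)
    if not needs_search(standalone):
        return TurnResult(standalone, False, [])
    return TurnResult(standalone, True, rank(standalone, candidates))


def toy_rewrite(history: List[str], query: str) -> str:
    # Placeholder: prepend the previous user turn so the query stands alone.
    return f"{history[-1]} {query}" if history else query


def toy_needs_search(query: str) -> bool:
    # Placeholder: treat bare greetings as chit-chat that needs no search.
    return query.strip().lower() not in {"hi", "hello", "thanks"}


def toy_rank(query: str, docs: List[str]) -> List[Tuple[str, float]]:
    # Placeholder: score each document by the share of query tokens it contains.
    q_tokens = set(query.lower().split())
    scored = [(d, len(q_tokens & set(d.lower().split())) / max(len(q_tokens), 1)) for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = ["vpn setup guide for windows", "vpn setup guide for macos", "expense policy"]
    result = handle_turn(["vpn setup guide"], "for macos", docs,
                         toy_rewrite, toy_needs_search, toy_rank)
    print(result.rewritten_query, result.ranked_docs[0])
```

Passing the capabilities in as callables keeps the turn logic independent of any particular implementation, so the placeholders can be swapped for real service calls.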

Benefits

Industry-leading search algorithms. Fully proprietary multilingual query analysis. Pretrained vector representations from multiple model architectures. Hybrid recall and multi-factor ranking. Multi-channel recall plus fine-grained ranking improves MRR@10 by 28% and Recall by 21.6% versus pure vector search.

Scenarios

Enhances search capability and quality for large language models in broad enterprise search scenarios.

How to call the product

[Figure: how to call the product (image not available)]

Terms

Term

Description

Scenarios

Search scenarios describe situations where search technology is used to find and retrieve information. These include internet search, e-commerce search, social media search, mobile apps, enterprise systems, and smart devices.

Search engine

A text search engine is software that retrieves relevant information from large volumes of text data. It finds documents or records matching a user’s search query or keywords—and returns them ranked by relevance.

Search strategy

A search strategy is a plan tailored to a specific scenario. It includes recall policies, ranking policies, and business logic filters.

Index

An index is a structured, labeled representation of large text datasets. During indexing, the search engine analyzes each document, extracts keywords and other key information, and stores them in an index structure—such as an inverted index, hash table, or B-tree. Indexes let the engine quickly locate documents containing query terms—greatly improving search speed and accuracy. Indexing is a critical step that directly affects query performance and result quality.

Index field

An index field is a specific data field extracted and stored during indexing—so queries can quickly locate related documents. For example, in email search, indexing the “sender” and “recipient” fields helps find specific messages. Field selection depends on the use case and goals—to maximize accuracy and efficiency. Well-designed index fields improve both engine performance and user experience.

Recall

Recall is the process of retrieving documents relevant to a user’s query from a large dataset. Algorithms or rules match keywords, titles, or content—and rank results using relevance, weight, or other signals—to return accurate, fast results.

Ranking

Ranking orders retrieved results by relevance. Algorithms, models, or rules score documents using relevance, weight, user feedback, and other signals. The goal is to surface the most useful results first. Common ranking factors include keyword match strength, document quality, and user preferences—enabling personalized results.

Data source

A data source is the origin of data used to build a private knowledge base for later retrieval and question answering.

Large Language Model (LLM)

A Large Language Model (LLM) is a language model trained on massive text corpora. By learning vast amounts of linguistic knowledge and context, it generates high-quality text and performs semantic understanding. LLMs excel at natural language processing tasks—including text generation, machine translation, and question answering. However, training and inference require significant compute resources—and depend heavily on data quality and diversity. LLMs are a leading research focus in natural language processing today.
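To make the terms index, index field, recall, and ranking concrete, the following minimal sketch builds a small inverted index over email-like records, recalls candidates that match a query, and ranks them with per-field weights. It is a teaching example under stated assumptions, not the product's engine: the `InvertedIndex` class, the whitespace tokenizer, the `subject` field, the sample messages, and the field weights are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Document = Dict[str, str]  # field name -> field text


class InvertedIndex:
    """Maps (field, token) pairs to the IDs of documents that contain them."""

    def __init__(self, index_fields: List[str]) -> None:
        self.index_fields = index_fields
        self.postings: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add(self, doc_id: str, doc: Document) -> None:
        """Indexing: tokenize each index field and record where each token occurs."""
        for field in self.index_fields:
            for token in doc.get(field, "").lower().split():
                self.postings[(field, token)].add(doc_id)

    def recall(self, query: str) -> Set[str]:
        """Recall: return documents matching at least one query token in any index field."""
        hits: Set[str] = set()
        for token in query.lower().split():
            for field in self.index_fields:
                hits |= self.postings.get((field, token), set())
        return hits

    def rank(self, query: str, candidates: Set[str],
             field_weights: Dict[str, float]) -> List[Tuple[str, float]]:
        """Ranking: sum the weight of every field in which a query token matches."""
        tokens = query.lower().split()
        scored = []
        for doc_id in candidates:
            score = sum(
                field_weights.get(field, 1.0)
                for token in tokens
                for field in self.index_fields
                if doc_id in self.postings.get((field, token), set())
            )
            scored.append((doc_id, score))
        # Highest score first; break ties by ID for determinism.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    index = InvertedIndex(index_fields=["sender", "recipient", "subject"])
    index.add("m1", {"sender": "alice", "recipient": "bob", "subject": "quarterly budget review"})
    index.add("m2", {"sender": "bob", "recipient": "carol", "subject": "budget"})
    index.add("m3", {"sender": "carol", "recipient": "alice", "subject": "lunch"})

    candidates = index.recall("alice budget")
    weights = {"sender": 2.0, "recipient": 1.0, "subject": 1.5}
    print(index.rank("alice budget", candidates, weights))
```

A production search strategy would add business logic filters, such as permission checks, between recall and ranking, and would replace the simple weighted sum with learned relevance models.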