Enhance AI retrieval with vector services from AI Search Open Platform

AI Search Open Platform focuses on intelligent search scenarios and provides modular core algorithm services such as document splitting, vectorization, ranking, and large models. The Industry Algorithm Edition offers one-click access to these algorithm services, supports training custom vectorization models based on your business requirements, and lets developers call modular capabilities on demand to rapidly build intelligent search systems.

Workflow

Billing

Model invocation costs

The algorithm services provided by AI Search Open Platform use a pay-as-you-go model and are billed based on service usage. A bill is generated hourly. These hourly bills are consolidated into a single order, and Alibaba Cloud then deducts the charges from your account. You can view your Billing Details in the Billing and Cost Management console.

Important

Vectorization incurs costs during both data import and search query execution. This is because the Industry Algorithm Edition performs text vectorization on your business data and search queries based on the configured index fields.

Service ID	Description	Billing unit	Price for 0–500 units	Price for units exceeding 500
ops-text-embedding-001	Multilingual (40+) text vectorization service. Maximum input text length is 300, and the output vector dimension is 1,536.	CNY/1,000 tokens	0.005	0.0001
ops-text-embedding-002	Multilingual (100+) text vectorization service. Maximum input text length is 8,192, and the output vector dimension is 1,024.		0.005	0.0005
ops-text-embedding-zh-001	Chinese text vectorization service. Maximum input text length is 1,024, and the output vector dimension is 768.		0.005	0.00005
ops-text-embedding-en-001	English text vectorization service. Maximum input text length is 512, and the output vector dimension is 768.		0.005	0.00005
ops-document-split-001	This general-purpose service splits structured data in HTML, Markdown, and TXT formats based on document paragraph structure, text semantics, or specified rules. It also supports extracting code, images, and tables from rich text.		0.005	0.00002
ops-qwen-turbo	Built on the qwen-turbo large language model and fine-tuned with supervised learning to enhance search capabilities and reduce harmful content generation.		Input: 0.0004 Output: 0.0007
qwen-turbo	The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. For more information, see Select a model.		Input: 0.0003 Output: 0.0006
qwen-plus	A balanced model with performance, cost, and speed between Qwen-Max and Qwen-Turbo. Suitable for moderately complex tasks. For more information, see Select a model.		Input: 0.0008 Output: 0.002
qwen-max	The best-performing model in the Qwen series, suitable for complex, multi-step tasks. For more information, see Select a model.		Input: 0.0024 Output: 0.0096
deepseek-r1	A large language model focused on complex reasoning tasks, with outstanding performance in understanding complex instructions and ensuring result accuracy.		Input: 0.004 Output: 0.016
deepseek-v3	A Mixture of Experts (MoE) model that excels in long-text processing, code, mathematics, encyclopedic knowledge, and Chinese language capabilities.		Input: 0.002 Output: 0.008
deepseek-r1-distill-qwen-7b	A model fine-tuned on Qwen-7B using knowledge distillation, with training samples generated by DeepSeek-R1.		Input: 0.0005 Output: 0.001
deepseek-r1-distill-qwen-14b	A model fine-tuned on Qwen-14B using knowledge distillation, with training samples generated by DeepSeek-R1.		Input: 0.001 Output: 0.003
ops-bge-reranker-larger	Provides a document scoring service based on the BGE model. It reranks documents (docs) from high to low based on their relevance to a query and outputs the corresponding scores. Supports both Chinese and English, with a maximum input token length of 512 (Query + doc length).	CNY/doc	0.001	0.00003
ops-text-reranker-001	A proprietary OpenSearch reranker model trained on datasets from multiple industries. It provides a high-quality reranking service that sorts documents (docs) from high to low based on their semantic relevance to a query. Supports both Chinese and English, with a maximum input token length of 512 (Query + doc length).	CNY/doc	0.001	0.00015
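
The two price columns above suggest a tiered rate. The following is a minimal sketch of how such a charge could be estimated, not the official billing logic. It assumes that a "unit" is one billing unit (for example, 1,000 tokens), that the first 500 units are charged at the first rate, and that every unit beyond 500 is charged at the second rate. The function name `tiered_cost` and the `tier_limit` parameter are hypothetical, and the actual bill in the Billing and Cost Management console is authoritative.

```python
from decimal import Decimal


def tiered_cost(units, first_tier_price, overflow_price, tier_limit=500):
    """Estimate a two-tier charge in CNY.

    Assumption (not confirmed by the pricing table): the first `tier_limit`
    units are charged at `first_tier_price`, and the remaining units are
    charged at `overflow_price`.
    """
    units = Decimal(str(units))
    limit = Decimal(str(tier_limit))
    in_tier = min(units, limit)                      # units billed at the first rate
    overflow = max(units - limit, Decimal("0"))      # units billed at the second rate
    return in_tier * Decimal(first_tier_price) + overflow * Decimal(overflow_price)


# Example: 1,200,000 tokens sent to ops-text-embedding-001,
# billed per 1,000 tokens, so 1,200 billing units.
tokens = 1_200_000
units = Decimal(tokens) / 1000
estimated_cny = tiered_cost(units, "0.005", "0.0001")
print(f"Estimated charge: {estimated_cny} CNY")
```

`Decimal` is used instead of `float` so that small per-unit prices do not accumulate rounding error.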

Model training costs

Billing formula: CU price × CUs consumed per instance type × number of purchased instances

The specific billing rules are shown in the table below:

Instance type	CU price (CNY/hour)	CUs consumed per instance	Price per instance (CNY/hour)
gpu.v100.16g.x1	1.07	30.14	32.25
gpu.t4.16g.x1	1.07	16.07	17.195
gpu.a10.24g.x1	1.07	11.01	11.781
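
As a worked example of the billing formula, the sketch below multiplies the CU price by the CUs consumed per instance and the number of purchased instances. It assumes that the 1.07 CNY/hour CU price applies to all three instance types, which is consistent with the per-instance prices in the table. The function and constant names are illustrative only.

```python
# CU price in CNY per hour, taken from the table above.
CU_PRICE_CNY_PER_HOUR = 1.07

# CUs consumed per instance, keyed by instance type (values from the table).
CUS_PER_INSTANCE = {
    "gpu.v100.16g.x1": 30.14,
    "gpu.t4.16g.x1": 16.07,
    "gpu.a10.24g.x1": 11.01,
}


def training_cost_per_hour(instance_type, instance_count):
    """Hourly training cost = CU price x CUs per instance x instance count."""
    return CU_PRICE_CNY_PER_HOUR * CUS_PER_INSTANCE[instance_type] * instance_count


# Two gpu.t4.16g.x1 instances running for three hours.
hourly = training_cost_per_hour("gpu.t4.16g.x1", 2)
print(f"Hourly cost: {hourly:.3f} CNY, three hours: {hourly * 3:.3f} CNY")
```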

Usage example

Purchase an Industry Algorithm Edition instance.

Click One-Click Access to activate the AI Search Open Platform service.

Important

When you use this feature, your business data is transferred to and processed in China (Shanghai), the region where the product is deployed. Depending on your configuration, this may involve cross-border data transfer. You must be aware of and comply with the following:

If cross-border data transfer is involved, by using this feature, you agree to ensure that all such transfers comply with applicable laws. This includes providing adequate data protection, delivering sufficient privacy notices, and obtaining necessary consent from individuals. You also guarantee that your business data does not contain any content whose transfer or disclosure is restricted or prohibited by applicable laws.
You hereby represent and warrant that you will comply with the preceding requirements. You agree to indemnify Alibaba Cloud and/or its affiliates for any losses resulting from your breach of these representations and warranties.

After you connect to the AI Search Open Platform service, you can view the following information related to service calls:

Description

Workspace: A workspace isolates and manages different business data. When you first activate the AI Search Open Platform service, a default workspace is automatically created. You can create more workspaces to isolate different services.

AI Search Open Platform allows a RAM user to use AI services with least privilege through a combination of workspaces and RAM user authorization, which improves data security. For more information, see Workspace Management and RAM User Authorization.

If the current RAM user has permission to use multiple workspaces, click Edit to switch workspaces.

Endpoint: AI Search Open Platform allows you to access services over the public network. Users in the China (Hangzhou), China (Shenzhen), China (Beijing), China (Zhangjiakou), and China (Qingdao) regions can also call AI Search Open Platform services across regions by using a VPC address.

Click Edit to configure whether the Industry Algorithm Edition instance calls services over the public network or through a VPC address. The default is VPC.

API key: An API key provides authentication information for service calls. To authorize a RAM user to use an API key from a specific workspace, you must grant the RAM user the relevant permissions.

If you have multiple API keys or if an API key expires, click Edit to switch to a different API key. The UI notifies you when an API key expires, and your application receives an error message when it attempts to use the search service.

After you activate the AI Search Open Platform service, you can also switch the region to China (Shanghai) to explore more services offered by the AI Search Open Platform.

Configure the application structure. This includes adding data source information for your search service, defining the primary table, configuring relationships between tables, and specifying primary keys and field mapping types.
When you configure the index structure for fields with the TEXT or SHORT_TEXT type, you can set the Analysis Method to use a vector analysis service. This service can be a built-in multi-dimensional text vectorization model from the AI Search Open Platform, a custom model that you deploy, or a built-in vectorization model from an Industry Algorithm Edition dedicated instance. This allows you to directly import raw text data when you import business data, and the service performs the data vectorization. For more information about billing, see Billing Rules.
Configure the data source. In this example, business data is stored in MaxCompute. The Industry Algorithm Edition allows you to specify data import partitions based on the characteristics of MaxCompute data. It supports using regular expressions to import data from the previous day to build the index. For more information on partition conditions, see Configure a MaxCompute data source.
After you complete these configurations, the system starts importing data and building the index. You can monitor the task progress on the Instance Details page.

When the application status is Normal, the index build is complete.
Configure the vector model used for search recall. Go to Search Algorithm Center > Recall Configuration > Query Analyzer Configuration. In the feature selection module, choose text vectorization and set the vector index value to the name of the index you configured in step 4. You can add multiple query analyzers to an application. For more information, see Query Analysis.

Go to Retrieval-Augmented Generation > Intelligent Conversation to configure the large model and its related parameters. Use the large model to summarize the search results.

The key parameters are described as follows:

Parameter	Description
Large model parameters
Model name	You can select a model from the Qwen series or DeepSeek series to summarize search results.
Prompt template	A prompt is an instruction you provide to the large model to clarify your request and guide the model to generate accurate and relevant content. For more information, see Prompt Management.
Model parameters	temperature: Controls the randomness and diversity of the generated content. The value must be in the range of [0, 2). A higher temperature value results in more diverse content, whereas a lower value makes the output more deterministic. top_p: Controls the sampling range of the candidate token set. The value must be in the range of (0, 1.0). A higher top_p value expands the range of candidate tokens, which leads to more diverse content. A lower value narrows the range, which makes the output more predictable. max_tokens: Controls the maximum number of tokens that the model can generate.
Context parameters
Number of input documents	The number of documents sent to the large model in a single request.
Input fields	Specify which fields from the document to input into the large model.
Field summary
Reranker model	Select a reranker model to sort the document splits: ops-bge-reranker-larger: A document scoring service based on the BGE model. It supports both Chinese and English, with a maximum input token length of 512. ops-text-reranker-001: An OpenSearch reranker model fine-tuned on industry-specific datasets. It supports both Chinese and English, with a maximum input token length of 512.
Splitting model	If a search result contains a long document, you can split it. This service splits structured data in HTML, Markdown, and TXT formats based on paragraph formatting, text semantics, or specified rules.
Number of splits to keep	Configure the number of retrieved document splits. The value must be in the range of [1, 30].
Maximum split length	The split length is measured in tokens. The value must be in the range of [100, 3000].
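
The value ranges in the table above can be checked before you save a configuration. The following is a minimal client-side sketch, not part of any OpenSearch SDK. The function name and argument names are hypothetical, and the requirement that max_tokens be a positive integer is an assumption because the table does not state its range.

```python
def validate_conversation_config(temperature, top_p, max_tokens,
                                 splits_to_keep, max_split_length):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    errors = []
    if not (0 <= temperature < 2):
        errors.append("temperature must be in [0, 2)")
    if not (0 < top_p < 1.0):
        errors.append("top_p must be in (0, 1.0)")
    # Assumption: the table gives no range for max_tokens, so only
    # require a positive integer here.
    if not (isinstance(max_tokens, int) and max_tokens > 0):
        errors.append("max_tokens must be a positive integer")
    if not (1 <= splits_to_keep <= 30):
        errors.append("number of splits to keep must be in [1, 30]")
    if not (100 <= max_split_length <= 3000):
        errors.append("maximum split length must be in [100, 3000] tokens")
    return errors


problems = validate_conversation_config(
    temperature=0.7, top_p=0.8, max_tokens=512,
    splits_to_keep=5, max_split_length=500,
)
print(problems or "configuration is within the documented ranges")
```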

Use Feature Extensions > Search Test to test the effectiveness of vector retrieval and large model summarization.

Use the Query Analysis Process to confirm whether the vectorization service from AI Search Open Platform was used for the search query:

Enable Intelligent Conversation, set the query parameters, and select the intelligent conversation configuration from step 8 to test the results:
- fetch_fields: The fields to display. After you enable intelligent conversation, enter the fields configured for the feature, separated by semicolons (;).
- raw_query: The original query. Set this to the raw query string entered by the end user.
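
If you prepare these two values in code before pasting them into Search Test, a small helper can join the field names with semicolons. This is only a sketch: the helper name and the field names `title` and `content` are hypothetical, and it builds plain values rather than calling any OpenSearch API.

```python
def build_conversation_test_params(fields, raw_query):
    """Build the two Search Test values described above.

    fetch_fields: field names joined with semicolons (;).
    raw_query: the query string exactly as the end user typed it.
    """
    return {
        "fetch_fields": ";".join(fields),
        "raw_query": raw_query,
    }


# Hypothetical field names; use the fields configured for intelligent conversation.
params = build_conversation_test_params(["title", "content"], "how do I reset my password")
print(params)
```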

When the test results meet your business requirements, you can refer to the Developer Guide to call the search function from your application by using an SDK.

FAQ

Error 6614: AI platform text vectorization failed

Error 6614 indicates that the service call failed due to a configuration error with AI Search Open Platform. Follow these steps to check and resolve the issue:

Check whether the AI Search Open Platform service is active. If the service is inactive, you must manually activate it and test the search function again.
If the AI Search Open Platform service is active but you still receive error 6614 when calling the search function, take the following steps:
- Ensure the API key is enabled: Switch the region to China (Shanghai) and confirm in the AI Search Open Platform console that the API key is enabled. Alternatively, you can return to the Industry Algorithm Edition console and click Edit to switch to an enabled API key.
- Ensure the current RAM user has permission to use the workspace: In AI Search Open Platform, you can grant RAM users least privilege through a combination of workspaces and RAM user authorization, which enhances data security. You can follow the instructions in Workspace Management and RAM User Authorization to grant the necessary permissions.
- Ensure the endpoint is accessible.
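
For the last check, a basic TCP connection test from the machine or network that calls the service can rule out connectivity problems. The sketch below uses only the Python standard library. The host name in the usage comment is a placeholder, and port 443 is an assumption for an HTTPS endpoint. A successful connection shows only that the endpoint is reachable; it does not verify the API key or RAM permissions.

```python
import socket


def endpoint_reachable(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Usage (replace the placeholder with the endpoint shown in your console):
# print(endpoint_reachable("<your-endpoint-host>"))
```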