Service overview and experience

The Service Marketplace aggregates all services on the Open Platform for AI Search, and you can view service details without logging in. In the Experience Center, you can try out core capabilities, such as document parsing, multimodal embedding, sorting, object detection, text vectorization, and video analysis, to quickly determine whether they meet your business needs.

Service overview

Agent search services

Text and document processing services

Service category

Description

Jina AI Reader

A web content extraction service for LLMs that converts any URL into a clean, LLM-friendly plain text format. It removes ads, navigation, and other distracting information to extract only the core content of a webpage.

document parsing

  • document parsing service 001: This service parses unstructured documents (including text, tables, and images) to extract logical structures such as headings and sections, outputting them in a structured format.

  • document parsing service 002: Parses various unstructured document formats such as PDF and images. It excels at recognizing complex elements like tables, formulas, and charts, and offers fast inference speeds.

text vectorization

  • OpenSearch text vectorization service-001: A text vectorization service for over 40 languages. The maximum input text length is 300 tokens, and the output vector dimension is 1,536.

  • OpenSearch universal text vectorization service-002: A text vectorization service for over 100 languages. The maximum input text length is 8,192 tokens, and the output vector dimension is 1,024.

  • OpenSearch text vectorization service-Chinese-001: A text vectorization service for Chinese. The maximum input text length is 1,024 tokens, and the output vector dimension is 768.

  • OpenSearch text vectorization service-English-001: A text vectorization service for English. The maximum input text length is 512 tokens, and the output vector dimension is 768.

  • GTE text vector-multilingual-Base: A text vectorization service for over 70 languages. The maximum input text length is 8,192 tokens, and the output vector dimension is 768.

  • Qwen3 text vector-0.6B: A Qwen3 series text vectorization service for over 100 languages. The maximum input length is 32k tokens, the output vector dimension is 1,024, and the model has 0.6 billion parameters.

sparse text vectorization

OpenSearch sparse text vectorization service: Converts text into a sparse vector representation. Sparse vectors require less storage and are often used to represent keywords and term frequency signals. They can be combined with a dense vector for hybrid search to improve retrieval quality. This service supports over 100 languages with a maximum input text length of 8,192 tokens.

document chunking

This service splits structured data in HTML, Markdown, and TXT formats based on document paragraphs, text semantics, or specified rules. It also supports extracting code, images, and tables from documents in rich text format.

dimensionality reduction

embedding-dim-reduction: A vector model fine-tuning service. You can use custom training to reduce high-dimensional vectors to lower dimensions. This reduces storage costs and improves cost-effectiveness with minimal impact on retrieval performance.
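
The input limits and output dimensions listed for the text vectorization services can be collected into a small lookup, so that candidate services are filtered by text length and vector dimension. The following Python sketch is illustrative only: the `EmbeddingService` class and the `services_for` function are hypothetical helpers, not part of any platform SDK. The figures are copied from the list above, and "32k" is assumed to mean 32,000 tokens.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EmbeddingService:
    """One text vectorization service, as described in the list above."""
    name: str
    max_tokens: int   # maximum input text length in tokens
    dimension: int    # output vector dimension


# Values copied from the service descriptions; 32k is assumed to be 32,000.
SERVICES = [
    EmbeddingService("OpenSearch text vectorization service-001", 300, 1536),
    EmbeddingService("OpenSearch universal text vectorization service-002", 8192, 1024),
    EmbeddingService("OpenSearch text vectorization service-Chinese-001", 1024, 768),
    EmbeddingService("OpenSearch text vectorization service-English-001", 512, 768),
    EmbeddingService("GTE text vector-multilingual-Base", 8192, 768),
    EmbeddingService("Qwen3 text vector-0.6B", 32000, 1024),
]


def services_for(token_count: int, dimension: Optional[int] = None) -> List[EmbeddingService]:
    """Return services whose input limit covers token_count.

    If dimension is given, only services with that output dimension are kept.
    """
    return [
        s for s in SERVICES
        if s.max_tokens >= token_count
        and (dimension is None or s.dimension == dimension)
    ]


# A 5,000-token text that must be stored in a 768-dimensional index.
print([s.name for s in services_for(5000, dimension=768)])
# ['GTE text vector-multilingual-Base']
```

Language coverage is left out of the lookup on purpose, because the list states it only as "over N languages" rather than as a checkable set.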

Multimodal processing services

Service category

Description

multimodal embedding

  • M2-Encoder-multimodal embedding model: A bilingual (Chinese and English) multimodal service trained on BM-6B, a dataset of 6 billion image-text pairs (3 billion Chinese and 3 billion English). The model supports cross-modal retrieval of images and text, including text-to-image and image-to-text search, as well as image classification tasks.

  • M2-Encoder-Large-multimodal embedding model: This model has the same architecture as M2-Encoder but with its parameter count increased to 1 billion, providing stronger representational capabilities and better performance in multimodal tasks.

  • GME multimodal embedding-Qwen2-VL-2B: A multimodal embedding service trained on Qwen2-VL multimodal large language models (MLLMs). It supports single-modal and combined multimodal inputs, enabling efficient processing of text, images, and combined data types.

  • multimodal embedding-ops-mm-embedding-v1-2b: Developed by the Alibaba Cloud OpenSearch AI team, this service is fine-tuned on Qwen2-VL 2B Instruct. It supports single-modal and combined multimodal inputs, including text, image, and video, and is suitable for cross-modal retrieval and understanding tasks.

  • multimodal embedding-ops-mm-embedding-v1-7b: This service has the same capabilities as the 2B version but is fine-tuned on Qwen2-VL 7B Instruct. With a larger number of parameters, it offers stronger model representational capabilities.

  • E-commerce multimodal embedding-ops-mm-embedding-ecom-001: A custom multimodal embedding model for e-commerce, developed by the Alibaba Cloud OpenSearch AI team. It supports image-to-image and image-to-image-with-text search, making it suitable for cross-modal retrieval and understanding in e-commerce scenarios.

  • Face multimodal embedding-ops-mm-embedding-face-001: A service for face retrieval tasks. It uses an advanced face embedding model that supports single or multiple image inputs, encoding facial information into high-dimensional semantic vectors to enable efficient and accurate face search and comparison.

Multimodal ranking

Provides an image relevance ranking service. In RAG and multimodal search scenarios, it can rerank retrieval results by relevance, improving retrieval accuracy and LLM generation quality.

video summarization

Provides a video summarization service that understands a specified video segment and uses LLM capabilities to generate a video summary, title, and tags.

video splitting

This service understands and analyzes a video, extracts keyframes, and splits the video into corresponding segments.

keyframe extraction

keyframe extraction service 001: Provides video content extraction by capturing keyframes from a video. When combined with multimodal embedding or image parsing services, it enables cross-modal retrieval.

speech recognition

speech recognition service 001: This service provides speech-to-text capabilities, quickly converting speech from video or audio files into structured text. The service supports multiple languages.

object detection

  • object detection service-ops-object-detect-001: Automatically locates and identifies the main target or object in an image or video. It supports single-object and multi-object detection, suitable for applications like intelligent surveillance, autonomous driving, and image retrieval.

  • Face detection service-ops-object-detect-face-001: Automatically locates and recognizes faces in an image. It supports the detection of multiple faces and is suitable for applications like intelligent surveillance and image retrieval.

image content parsing

Image content understanding service: Uses a multimodal large language model to parse, understand, and recognize text in images. The parsed text can be used in image search and question-answering scenarios.

Image text recognition service: Performs Optical Character Recognition (OCR) on images. The recognized text can be used in image search and question-answering scenarios.

Features

The Experience Center offers the following services:

Service group

Service category

Description

Agent search services

Agentic Memory

Provides long-term, short-term, and contextual memory storage and retrieval for Agentic AI and intelligent search services. It supports storage, query, update, and forgetting operations for various search data, including personalized memory, agent memory, search memory, and Skill management. The service uses hybrid search and multi-path recall technology, combining BM25 and vector retrieval to ensure efficient and accurate memory retrieval.

The Agentic Memory service provides storage and management for two types of data: Memory and Skill. Memory stores a user's personal hobbies, interests, and preferences, while Skill stores reusable execution logic and capabilities.

large language model (LLM)

  • Qwen3-235B-A22B: A new generation of the Qwen series of LLMs. Based on extensive training, Qwen3 has made breakthroughs in reasoning, instruction following, agent capabilities, and multilingual support. It supports over 100 languages and dialects, with powerful multilingual understanding, reasoning, and generation capabilities.

  • OpenSearch-Qwen-Turbo: Built on the Qwen-Turbo LLM, this model is fine-tuned with supervised learning to enhance retrieval-augmented generation (RAG) and reduce harmful content.

  • Qwen-Turbo: The fastest and most cost-effective model in the Qwen series, suitable for simple tasks. For more information, see Select a model.

  • Qwen-Plus: A balanced model with reasoning performance, cost, and speed between those of Qwen-Max and Qwen-Turbo. It is suitable for moderately complex tasks. For more information, see Select a model.

  • Qwen-Max: The best-performing model in the Qwen series, suitable for complex, multi-step tasks. For more information, see Select a model.

  • DeepSeek-R1: An LLM that specializes in complex reasoning tasks. It performs well in understanding complex instructions and ensuring result accuracy.

  • DeepSeek-V3: A Mixture of Experts (MoE) model that excels in long text, code, mathematics, encyclopedic knowledge, and Chinese language capabilities.

  • DeepSeek-R1-distill-qwen-7b: A model fine-tuned on Qwen-7B using knowledge distillation with training samples generated by DeepSeek-R1.

  • DeepSeek-R1-distill-qwen-14b: A model fine-tuned on Qwen-14B using knowledge distillation with training samples generated by DeepSeek-R1.

  • DeepSeek-V4-Pro: A flagship MoE large model with 1.6 trillion total parameters and 49 billion activated parameters, natively supporting an ultra-long context of millions of tokens. Trained on massive, high-quality data, it excels in mathematical logic, complex reasoning, professional coding, and deep long-text analysis, making it suitable for high-level research, complex office tasks, and advanced intelligent agent scenarios.

  • DeepSeek-V4-Flash: An efficient and lightweight MoE model with 284 billion total parameters and 13 billion activated parameters, natively supporting an ultra-long context of millions of tokens. It features fast inference, low latency, and low cost. With balanced overall capabilities, it is designed for high-concurrency, lightweight tasks such as daily conversation, content creation, basic RAG, and batch text processing.

internet search

If your private knowledge base provides no answer during a search, you can enable internet search to retrieve additional information. This supplements your private knowledge base and, when combined with an LLM, enables richer responses.

query analysis

A query content analysis service that uses LLMs and NLP capabilities to perform intent recognition, similar query expansion, and natural language to SQL (NL2SQL) conversion on user queries. This improves retrieval and question answering performance in RAG scenarios.

A general query analysis service that uses an LLM for intent understanding and similar query expansion on user input queries.

sorting service

  • BGE reranking model: A document scoring service based on the BGE model. It sorts documents in descending order based on the relevance score between the query and the document content and outputs the corresponding scores. This service supports both Chinese and English, with a maximum input length of 512 tokens (query + document length).

  • OpenSearch self-developed reranking model: Trained on datasets from multiple industries, this model provides a high-quality reranking service. It sorts documents in descending order of semantic relevance between the query and the document. This service supports both Chinese and English, with a maximum input length of 512 tokens (query + document length).

  • Qwen3 sorting service-0.6B: A Qwen3 series document reranking service. It supports over 100 languages, with a maximum input length of 32k tokens (query + document length) and 0.6 billion parameters.

Text and document processing services

Jina AI Reader

A web content extraction service for LLMs that converts any URL into a clean, LLM-friendly plain text format. It removes ads, navigation, and other distracting information to extract only the core content of a webpage.

document parsing

  • document parsing service 001: This service parses unstructured documents (including text, tables, and images) to extract logical structures such as headings and sections, outputting them in a structured format.

  • document parsing service 002: Parses various unstructured document formats such as PDF and images. It excels at recognizing complex elements like tables, formulas, and charts, and offers fast inference speeds.

text vectorization

  • OpenSearch text vectorization service-001: A text vectorization service for over 40 languages. The maximum input text length is 300 tokens, and the output vector dimension is 1,536.

  • OpenSearch universal text vectorization service-002: A text vectorization service for over 100 languages. The maximum input text length is 8,192 tokens, and the output vector dimension is 1,024.

  • OpenSearch text vectorization service-Chinese-001: A text vectorization service for Chinese. The maximum input text length is 1,024 tokens, and the output vector dimension is 768.

  • OpenSearch text vectorization service-English-001: A text vectorization service for English. The maximum input text length is 512 tokens, and the output vector dimension is 768.

  • GTE text vector-multilingual-Base: A text vectorization service for over 70 languages. The maximum input text length is 8,192 tokens, and the output vector dimension is 768.

  • Qwen3 text vector-0.6B: A Qwen3 series text vectorization service for over 100 languages. The maximum input length is 32k tokens, the output vector dimension is 1,024, and the model has 0.6 billion parameters.

sparse text vectorization

This service converts text data into a sparse vector representation. Sparse vectors require less storage and are often used to represent keywords and term frequency signals. They can be combined with dense vectors for hybrid search to improve retrieval quality.

OpenSearch sparse text vectorization service: A text vectorization service for over 100 languages with a maximum input text length of 8,192 tokens.

document chunking

This service splits structured data in HTML, Markdown, and TXT formats based on document paragraphs, text semantics, or specified rules. It also supports extracting code, images, and tables from documents in rich text format.

dimensionality reduction

embedding-dim-reduction: A vector model fine-tuning service. You can use custom training to reduce high-dimensional vectors to lower dimensions, helping improve cost-effectiveness with minimal impact on retrieval performance.

Multimodal processing services

multimodal embedding

  • M2-Encoder-multimodal embedding model: A bilingual (Chinese and English) multimodal service trained on BM-6B, a dataset of 6 billion image-text pairs (3 billion Chinese and 3 billion English). The model supports cross-modal retrieval of images and text, including text-to-image and image-to-text search, as well as image classification tasks.

  • M2-Encoder-Large-multimodal embedding model: A bilingual (Chinese and English) multimodal service. Compared to the M2-Encoder model, it has a larger model size of 1 billion parameters, providing stronger representational capabilities and better performance in multimodal tasks.

  • GME multimodal embedding-Qwen2-VL-2B: A multimodal embedding service trained on Qwen2-VL multimodal large language models (MLLMs). It supports single-modal and combined multimodal inputs, enabling efficient processing of text, images, and combined data types.

  • multimodal embedding-ops-mm-embedding-v1-2b: A multimodal embedding model developed by the Alibaba Cloud OpenSearch AI team. Fine-tuned on Qwen2-VL 2B Instruct, it supports single-modal and combined multimodal inputs, including text, image, and video, encoding them into semantic vectors for cross-modal retrieval and understanding tasks.

  • multimodal embedding-ops-mm-embedding-v1-7b: A multimodal embedding model developed by the Alibaba Cloud OpenSearch AI team. Fine-tuned on Qwen2-VL 7B Instruct, it supports single-modal and combined multimodal inputs, including text, image, and video, encoding them into semantic vectors for cross-modal retrieval and understanding tasks.

  • E-commerce multimodal embedding-ops-mm-embedding-ecom-001: A custom multimodal embedding model for e-commerce, developed by the Alibaba Cloud OpenSearch AI team. It supports image-to-image and image-to-image-with-text search, making it suitable for cross-modal retrieval and understanding in e-commerce scenarios.

  • Face multimodal embedding-ops-mm-embedding-face-001: A service for face retrieval tasks. It uses an advanced face embedding model that supports single or multiple image inputs, encoding facial information into high-dimensional semantic vectors to enable efficient and accurate face search and comparison.

Multimodal ranking

Provides an image relevance ranking service. In RAG and multimodal search scenarios, this ranking service can rerank content to improve retrieval accuracy and LLM generation quality.

video summarization

Provides a video summarization service that understands a specified video segment and uses LLM capabilities to generate a video summary, title, and tags.

video splitting

This service understands and analyzes a video, extracts keyframes, and splits the video into corresponding segments.

keyframe extraction

keyframe extraction service 001: Provides video content extraction by capturing keyframes from a video. When combined with multimodal embedding or image parsing services, it enables cross-modal retrieval.

speech recognition

speech recognition service 001: This service provides speech-to-text capabilities, quickly converting speech from video or audio files into structured text. The service supports multiple languages.

object detection

  • object detection service-ops-object-detect-001: Automatically locates and identifies the main target or object in an image or video. It supports single-object and multi-object detection, suitable for applications like intelligent surveillance, autonomous driving, and image retrieval.

  • Face detection service-ops-object-detect-face-001: Automatically locates and recognizes faces in an image. It supports the detection of multiple faces and is suitable for applications like intelligent surveillance and image retrieval.

image content parsing

Image content understanding service: Uses a multimodal large language model to parse, understand, and recognize text in images. The parsed text can be used in image search and question-answering scenarios.

Image text recognition service: Performs Optical Character Recognition (OCR) on images. The recognized text can be used in image search and question-answering scenarios.
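
The Agentic Memory entry above mentions multi-path recall that combines BM25 and vector retrieval. How the platform merges the two result lists is not described here. The Python sketch below uses reciprocal rank fusion (RRF), a common generic technique, purely to illustrate how two ranked lists can be merged into one. The function name, the memory IDs, and the constant `k = 60` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists into one list using RRF.

    Each list contributes 1 / (k + rank) to an item's score, where rank
    starts at 1 for the best hit. Items are returned by descending score;
    ties are broken by ID so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical results from the two recall paths, best hit first.
bm25_hits = ["m3", "m1", "m7"]      # keyword (BM25) path
vector_hits = ["m1", "m5", "m3"]    # dense vector path

print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
# ['m1', 'm3', 'm5', 'm7']
```

Items found by both paths ("m1" and "m3") rise to the top. The merged list could then be passed to one of the sorting services for reranking.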

Try services

This section shows how to use the Experience Center to try services like document parsing and multimodal embedding, view results, and get the sample code.

Document parsing

  1. Log on to the Open Platform for AI Search console.

  2. In the navigation pane, select Experience Center.

  3. For Service Category, select Document/Image Parsing (document-analyze), and then select a specific service from Experience Services.

  4. Use the system-provided Sample data or upload your own data via Manage data. Supported file formats are TXT, PDF, HTML, DOC, DOCX, PPT, and PPTX. The maximum file size is 20 MB.

    • File: Upload local files. These files are automatically deleted after 7 days. The platform does not store your data long-term.

    • URL: Provide the file URL and its corresponding file type. You can upload multiple URLs, with each URL on a separate line.

      Note

      Document parsing will fail if you select an incorrect data format. Make sure to choose the correct file type for your data.

      Important

      Ensure that you use the web link import feature in compliance with applicable laws and regulations. You must adhere to the management specifications of the target platform and protect the legal rights of rights holders. You are solely responsible for your actions. As a tool provider, the Open Platform for AI Search is not liable for your parsing or downloading behavior.

  5. If you use your own data, select the pre-uploaded file or URL from the drop-down list.

  6. Click Get Results to start parsing the document.

    • Results: Displays the parsing progress and result.

    • Result source code: View the response code. You can use Copy Code or Download File to save the code locally.

    • Sample code: View and download the Sample code for calling this service.
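
Before you upload your own data in step 4, you can check a file against the stated limits locally. The following Python sketch encodes only the constraints given above (the supported formats and the 20 MB maximum). The function name `check_upload` is hypothetical, and 1 MB is assumed to be 1,024 × 1,024 bytes.

```python
from pathlib import Path
from typing import List

# Formats listed in step 4 of the procedure above.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".html", ".doc", ".docx", ".ppt", ".pptx"}

# 20 MB limit; assumes 1 MB = 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 20 * 1024 * 1024


def check_upload(file_name: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    extension = Path(file_name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is larger than 20 MB: {size_bytes} bytes")
    return problems


print(check_upload("report.pdf", 3 * 1024 * 1024))   # []
print(check_upload("sheet.xlsx", 10))
```

For a file on disk, the size can be read with `Path(path).stat().st_size` and passed as `size_bytes`.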

Multimodal embedding

  1. Log on to the Open Platform for AI Search console.

  2. In the navigation pane, select Experience Center.

  3. For Service Category, select Multimodal Vector (multi-modal-embedding). Select a specific service from Experience Services, and then choose Text, Image, or Text + Image.

    Note

    Local images that you upload for vectorization are automatically deleted after 7 days. The platform does not store your data long-term.

  4. Click Get Results to get the multimodal embedding.

    • Results: Displays the embedding result.

    • Result source code: View the response code. You can use Copy Code or Download File to save the code locally.

    • Sample code: View and download the Sample code for calling this service.
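
Once you have embeddings for a text and an image, a common way to judge how closely they match is cosine similarity. The Python sketch below assumes that each embedding has been copied out of the result as a plain list of numbers. The response format is not described in this topic, so the extraction step is omitted, and the vectors shown are made-up placeholders rather than real service output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for a text embedding and two image embeddings.
text_vector = [0.2, 0.1, 0.7]
image_vectors = {
    "image_a": [0.2, 0.1, 0.6],
    "image_b": [0.9, 0.3, 0.0],
}

# Rank the images by similarity to the text, most similar first.
ranked = sorted(
    image_vectors,
    key=lambda name: cosine_similarity(text_vector, image_vectors[name]),
    reverse=True,
)
print(ranked)
```

Compare only vectors produced by the same embedding service. Different services output different dimensions and vector spaces, so their vectors are not comparable.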