Prediction query

How it works

A prediction query uses the built-in embedding model in Vector Search Edition to convert content such as text, images, or videos into a vector, and then retrieves results based on that vector.

If you have existing vectors and want to query them directly in your Vector Search Edition instance, see vector query.

Endpoint

/vector-service/inference-query

The URL above omits request headers, encoding, and other elements.
You must prepend your instance's host address to this path.
For details about each parameter, see the "Request body parameters" section below.

Request protocol

HTTP

Request method

POST

Supported format

JSON
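
The endpoint, protocol, method, and format above can be combined into a single request. The following is a minimal sketch using the JDK's `java.net.http` API (Java 11 or later). The class name, the `host` placeholder, and the `Content-Type: application/json` header are assumptions made for illustration; the `Authorization` value is the one described in the Authentication section.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class InferenceQueryRequestSketch {
    // Path taken from the Endpoint section.
    static final String PATH = "/vector-service/inference-query";

    // host: your instance's host address (assumed here to have no scheme and no trailing slash).
    // authorization: the Base64 value computed as described under Authentication.
    // jsonBody: the request body as a JSON string.
    static HttpRequest build(String host, String authorization, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + PATH))
                .header("Content-Type", "application/json") // assumption: JSON content type
                .header("Authorization", "Basic " + authorization)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody, StandardCharsets.UTF_8))
                .build();
    }
}
```

To send the request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.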

Authentication

Use the following method to compute the authorization value:

Parameter	Type	Description
accessUserName	string	The username. You can find this in the Details > Network Information section of your instance.
accessPassWord	string	The password. You can set or change this in the Details > Network Information section of your instance.

import com.aliyun.darabonba.encode.Encoder;
import com.aliyun.darabonbastring.Client;

public class GenerateAuthorization {
    public static void main(String[] args) throws Exception {
        // Replace with the credentials of your instance.
        String accessUserName = "username";
        String accessPassWord = "password";
        // Join the credentials as "username:password".
        String realmStr = accessUserName + ":" + accessPassWord;
        // Base64-encode the UTF-8 bytes of the joined string.
        String authorization = Encoder.base64EncodeToString(Client.toBytes(realmStr, "UTF-8"));
        System.out.println(authorization);
    }
}

Example of a correctly formatted authorization value:

cm9vdDp******mdhbA==

When making an HTTP request, prefix the value with Basic and provide it in the Authorization header.

Example (in the request header):

Authorization: Basic cm9vdDp******mdhbA==
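
If you prefer not to depend on the Darabonba libraries, the same header can likely be produced with the JDK alone. The sketch below assumes the authorization value is standard padded Base64 of `username:password` in UTF-8, which is consistent with the sample value above. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthSketch {
    // Returns the full header value, including the "Basic " prefix.
    static String authorizationHeader(String accessUserName, String accessPassWord) {
        String realm = accessUserName + ":" + accessPassWord;
        String encoded = Base64.getEncoder()
                .encodeToString(realm.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; replace with those of your instance.
        System.out.println(authorizationHeader("username", "password"));
    }
}
```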

Request body parameters

Parameter	Description	Default	Type	Required
tableName	The name of the table to query.	—	string	Yes
indexName	The name of the index to query.	The first configured index	string	No
content	The query content. Use this for queries that do not involve fusion vector retrieval.	—	string	Yes (for non-fusion vector retrieval)
contents	A list of query content items. Use this parameter for fusion vector retrieval, which supports content from multiple modalities.	—	list[string]	Yes (for fusion vector retrieval)
contentType	The data type of the content. Valid values: `text` (text), `image_encode` (Base64-encoded image), `video_uri` (OSS path of a video), and `video_encode` (Base64-encoded video). For fusion vector retrieval, provide a comma-separated list of types, such as `text,image_encode`, where each type corresponds to an item in the `contents` list.	—	string	No
modal	The modality of the embedding model. Valid values: `text` (for text-to-text or text-to-image retrieval), `image` (for search by image), `video` (for video retrieval, which supports text, image, or video as input), and `fusion` (for fusion vector retrieval, which encodes multiple fields into a single vector for cross-modal retrieval).	—	string	Yes
videoFrameTopK	The number of frames to retrieve for a video query.	100	int	No
namespace	The namespace to query.	""	string	No
topK	The number of results to return.	100	int	No
includeVector	Specifies whether to include the vector in the response.	false	bool	No
outputFields	A list of fields to include in the response.	[]	list[string]	No
order	The sort order for the results. Valid values: ASC for ascending, DESC for descending.	ASC	string	No
searchParams	Algorithm-specific query parameters, passed as a JSON-formatted string. For details, see the QcSearcher and HNSW (Hierarchical Navigable Small World) configuration.	""	string	No
filter	A filter expression to apply to the search.	""	string	No
scoreThreshold	Filters results based on their score. When using Euclidean distance, returns results with a score less than `scoreThreshold`. When using inner product, returns results with a score greater than `scoreThreshold`.	No filtering by default	float	No
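
The direction of the `scoreThreshold` comparison depends on the distance metric. The following sketch restates the rule from the table as a predicate. The `Metric` enum and the method name are hypothetical, and the strict comparisons follow the wording "less than" and "greater than"; how the engine treats a score exactly equal to the threshold is not stated here.

```java
final class ScoreThresholdSketch {
    enum Metric { EUCLIDEAN, INNER_PRODUCT }

    // Returns true if a result with this score would be kept.
    static boolean passes(Metric metric, double score, double scoreThreshold) {
        if (metric == Metric.EUCLIDEAN) {
            // Smaller distance means a closer match.
            return score < scoreThreshold;
        }
        // Larger inner product means a closer match.
        return score > scoreThreshold;
    }
}
```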

Response parameters

Field	Description	Type
result	A list of matching items.	list[Item]
totalCount	The number of items in the result list.	int
totalTime	The engine processing time, in milliseconds (ms).	float
errorCode	The error code. This field appears only when an error occurs.	int
errorMsg	The error message. This field appears only when an error occurs.	string

Item object definition

Field	Description	Type
score	The distance score.	float
fields	A map of field names and their corresponding values.	map<string, FieldType>
vector	The vector value.	list[float]
id	The primary key value. The type matches the defined field type.	FieldType
namespace	The namespace of the vector. This field is returned only if a namespace is set.	string

The API response may include additional fields, such as __source__ and coveredPercent, for internal debugging purposes. These fields do not affect business logic and can be safely ignored.

Examples

Text-to-text retrieval

Request body:

{
  "tableName": "gist",
  "indexName": "test",
  "content": "hello",
  "modal": "text",
  "topK": 3,
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
  "includeVector": true
}
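
Note that `searchParams` is a JSON document embedded in a JSON string, so its quotes must be escaped. The following is a minimal sketch of building this request body by hand. The class and method names are hypothetical, and the escaping handles only backslashes and double quotes; a JSON library would be the more robust choice for arbitrary input.

```java
final class RequestBodySketch {
    // Escapes backslashes and double quotes so a value can sit inside a JSON string.
    // Control characters are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a text-to-text request body like the example above.
    static String textQuery(String tableName, String indexName, String content,
                            int topK, String searchParamsJson) {
        return "{"
                + "\"tableName\":\"" + escape(tableName) + "\","
                + "\"indexName\":\"" + escape(indexName) + "\","
                + "\"content\":\"" + escape(content) + "\","
                + "\"modal\":\"text\","
                + "\"topK\":" + topK + ","
                + "\"searchParams\":\"" + escape(searchParamsJson) + "\""
                + "}";
    }
}
```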

Response:

{
  "result":[
    {
      "id": 1,
      "score":1.0508723258972169,
      "vector": [0.1, 0.2, 0.3]
    },
    {
      "id": 2,
      "score":1.0329746007919312,
      "vector": [0.2, 0.2, 0.3]
    },
    {
      "id": 3,
      "score":0.980593204498291,
      "vector": [0.3, 0.2, 0.3]
    }
  ],
  "totalCount":3,
  "totalTime":2.943
}

Image retrieval

Text-to-image retrieval:

Request body:

{
  "tableName": "gist",
  "indexName": "test",
  "content": "Bicycle",
  "modal": "text",
  "topK": 3,
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
  "includeVector": true
}

Response:

{
  "result":[
    {
      "id": 1,
      "score":1.0508723258972169,
      "vector": [0.1, 0.2, 0.3]
    },
    {
      "id": 2,
      "score":1.0329746007919312,
      "vector": [0.2, 0.2, 0.3]
    },
    {
      "id": 3,
      "score":0.980593204498291,
      "vector": [0.3, 0.2, 0.3]
    }
  ],
  "totalCount":3,
  "totalTime":2.943
}

Search by image:

Request body:

{
  "tableName": "gist",
  "indexName": "test",
  "content": "base64-encoded image data",
  "modal": "image",
  "topK": 3,
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
  "includeVector": true
}

Response:

{
    "totalCount": 3,
    "result": [
        {
            "id": 5,
            "score": 1.103209137916565
        },
        {
            "id": 3,
            "score": 1.1278988122940064
        },
        {
            "id": 2,
            "score": 1.1326735019683838
        }
    ],
    "totalTime": 242.615
}

Subject identification

Request body:

Without the range parameter:

{
 "tableName": "gist",
 "indexName": "test",
 "content": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQ",
 "modal": "image",
 "searchParams": "{\"crop\": true}",
 "topK": 3,
 "includeVector": true
}

Note: "crop":true enables subject identification. If the range parameter is not provided, the model automatically detects the subject.

With the range parameter:

{
 "tableName": "gist",
 "indexName": "test",
 "content": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQ",
 "modal": "image",
 "searchParams": "{\"crop\": true, \"range\": \"100,100,60,70\"}",
 "topK": 3,
 "includeVector": true
}

"crop":true, "range":"100,100,60,70" enables subject identification within a specified region of the image. The four numbers in range represent the (x, y) coordinates of the top-left corner of the region, its width, and its height.

Response:
```
{
  "result":[
    {
      "id": 1,
      "score":1.0508723258972169,
      "vector": [0.1, 0.2, 0.3]
    }
  ],
  "__meta__": {
    "__range__": "100,100,60,70;"
  },
  "totalCount":1,
  "totalTime":2.943
}
```
- When you perform subject identification with modal=image, the response includes the __range__ field.
- The __range__ field indicates the detected region of the subject in x,y,width,height format.
- If the model identifies multiple subjects, their regions are listed in the __range__ field, sorted by score in descending order. The query returns results for the first (highest-scoring) subject by default.
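
A client that wants to draw or log the detected regions needs to split the `__range__` string. The sketch below assumes that regions are separated by `;` (as the trailing `;` in the sample suggests) and that all four values are integers; neither is confirmed here. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeSketch {
    // One detected region: top-left corner (x, y), width, and height.
    static final class Region {
        final int x, y, width, height;

        Region(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Parses a value such as "100,100,60,70;" into a list of regions,
    // preserving the order in which they appear.
    static List<Region> parse(String range) {
        List<Region> regions = new ArrayList<>();
        for (String part : range.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty segments, e.g. around a stray ';'
            }
            String[] n = trimmed.split(",");
            if (n.length != 4) {
                throw new IllegalArgumentException("expected x,y,width,height: " + trimmed);
            }
            regions.add(new Region(
                    Integer.parseInt(n[0].trim()), Integer.parseInt(n[1].trim()),
                    Integer.parseInt(n[2].trim()), Integer.parseInt(n[3].trim())));
        }
        return regions;
    }
}
```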

Text-to-video retrieval

Request body:

{
  "tableName": "video",
  "content": "hello",
  "modal": "video",
  "topK": 3,
  "videoFrameTopK":100,
  "contentType":"text",
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
}

Response:

{
  "result":[
    {
      "videoId": 1,
      "videoUri": "oss://...",
      "fields" : {
        "tag" : "demo"
      },
      "clips": [{
          "queryStartTime": 5,
          "startTime": 5,
          "duration": 5,
          "queryStartFrameIndex": 150,
          "queryEndFrameIndex": 300,
          "startFrameIndex": 150,
          "endFrameIndex": 300,
          "sim": 0.8
       }]
    }
  ],
  "totalCount":1,
  "totalTime":2.943
}

Video-to-video retrieval

Supported video formats include MP4, AVI, MKV, MOV, FLV, and WebM.

Request body:

Using an OSS URI:

{
  "tableName": "video",
  "content": "oss://...",
  "modal": "video",
  "topK": 3,
  "videoFrameTopK":100,
  "contentType":"video_uri",
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
}

Using Base64-encoded video data:

{
  "tableName": "video",
  "content": "data:video/mp4;base64,AAAAIGZ0eXBtcDQyAAABAGlxxxxxxx",
  "modal": "video",
  "topK": 3,
  "videoFrameTopK":100,
  "contentType":"video_encode",
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
}

The format is data:video/{format};base64,{base64_video}, where:

- video/{format}: The format of the video. For example, use video/mp4 for an MP4 file.
- base64_video: The Base64-encoded video data.

Response:

{
  "result":[
    {
      "videoId": 1,
      "videoUri": "oss://...",
      "fields" : {
        "tag" : "demo"
      },      
      "clips": [{
          "queryStartTime": 5,
          "startTime": 5,
          "duration": 5,
          "queryStartFrameIndex": 150,
          "queryEndFrameIndex": 300,
          "startFrameIndex": 150,
          "endFrameIndex": 300,
          "sim": 0.8
       }]
    }
  ],
  "totalCount":1,
  "totalTime":2.943
}

Image-to-video retrieval

Supported image formats include PNG, JPEG, and JPG.

Request body:
```
{
  "tableName": "video",
  "content": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAxxxxxx",
  "modal": "video",
  "topK": 3,
  "videoFrameTopK":100,
  "contentType":"image_encode", 
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
}
```
The image is provided as Base64 data. Pass the encoded data to the content parameter in the format data:image/{format};base64,{base64_image}, where:
- image/{format}: The format of the image. For example, use image/jpeg for a JPG file.
- base64_image: The Base64-encoded image data.
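
The `data:` format used for Base64-encoded images and videos can be assembled from raw file bytes. The following is a minimal sketch; the class and method names are hypothetical, and it assumes standard padded Base64 encoding.

```java
import java.util.Base64;

final class DataUriSketch {
    // kind is "image" or "video"; format is the subtype, for example "jpeg" or "mp4".
    static String toDataUri(String kind, String format, byte[] data) {
        return "data:" + kind + "/" + format + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }
}
```

For example, the bytes could be read with `Files.readAllBytes(Path.of("photo.jpg"))`, where `photo.jpg` is a placeholder file name, and the result passed as the `content` value.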

Response:

{
  "result":[
    {
      "videoId": 1,
      "videoUri": "oss://...",
      "fields" : {
        "tag" : "demo"
      },      
      "clips": [{
          "queryStartTime": 5,
          "startTime": 5,
          "duration": 5,
          "queryStartFrameIndex": 150,
          "queryEndFrameIndex": 300,
          "startFrameIndex": 150,
          "endFrameIndex": 300,
          "sim": 0.8
       }]
    }
  ],
  "totalCount":1,
  "totalTime":2.943
}

Fusion vector retrieval

Fusion vector retrieval encodes content from multiple modalities, such as text and images, into a single fusion vector for cross-modal retrieval. Before using this feature, you must configure the fusion vector fields in your table settings. For more information, see Configure fusion vectors.

Fusion vector retrieval differs from other prediction queries in the following ways:

- The modal parameter is set to fusion.
- The contents parameter (a list) is used instead of the content parameter to pass content from multiple modalities.
- The contentType parameter contains a comma-separated list of types corresponding to the items in the contents list.

Request body:

{
  "tableName": "gist",
  "indexName": "test",
  "contents": ["hello", "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAxxxxxx"],
  "modal": "fusion",
  "contentType": "text,image_encode",
  "topK": 3,
  "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
  "includeVector": true
}

The first element in contents is the text "hello", and the second element is Base64-encoded image data. In contentType, text and image_encode correspond to the types of the two elements in contents, respectively.
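
Because each entry in `contentType` must line up with an item in `contents`, it can help to validate the pairing on the client before sending the request. The sketch below is one way to do this; the class and method names are hypothetical, and the set of valid types is taken from the `contentType` row in the request parameter table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FusionParamsSketch {
    // Valid values listed for contentType in the request parameter table.
    static final Set<String> VALID_TYPES = new HashSet<>(
            Arrays.asList("text", "image_encode", "video_uri", "video_encode"));

    // Returns the comma-separated contentType value for the given contents,
    // after checking that there is exactly one known type per content item.
    static String contentType(List<String> contents, List<String> types) {
        if (contents.size() != types.size()) {
            throw new IllegalArgumentException(
                    "contents has " + contents.size() + " items but " + types.size() + " types");
        }
        for (String type : types) {
            if (!VALID_TYPES.contains(type)) {
                throw new IllegalArgumentException("unknown content type: " + type);
            }
        }
        return String.join(",", types);
    }
}
```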

Response:

{
  "result":[
    {
      "id": 1,
      "score":1.0508723258972169,
      "vector": [0.1, 0.2, 0.3]
    },
    {
      "id": 2,
      "score":1.0329746007919312,
      "vector": [0.2, 0.2, 0.3]
    },
    {
      "id": 3,
      "score":0.980593204498291,
      "vector": [0.3, 0.2, 0.3]
    }
  ],
  "totalCount":3,
  "totalTime":2.943
}