Prediction query


How it works

A prediction query uses the built-in embedding model in Vector Search Edition to convert content such as text, images, or videos into a vector, and then retrieves results based on that vector.

If you have existing vectors and want to query them directly in your Vector Search Edition instance, see vector query.

Endpoint

/vector-service/inference-query

  • The URL above omits request headers, encoding, and other elements.

  • You must prepend your instance's host address to this path.

  • For details about each parameter, see the "Request body parameters" section below.
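As a minimal sketch of the second note above, the full request URL can be formed by joining the instance host address with the documented path. The host string is a placeholder, the class and method names are illustrative and not part of any SDK, and the http scheme is taken from the Request protocol section.

```java
import java.net.URI;

// Illustrative helper; the class and method names are not part of any SDK.
final class EndpointUrl {
    // Path documented in the Endpoint section.
    static final String INFERENCE_QUERY_PATH = "/vector-service/inference-query";

    // Joins a caller-supplied host address with the prediction query path.
    static URI inferenceQueryUrl(String host) {
        // Drop a trailing slash so the joined URL has exactly one separator.
        String trimmed = host.endsWith("/") ? host.substring(0, host.length() - 1) : host;
        return URI.create("http://" + trimmed + INFERENCE_QUERY_PATH);
    }

    public static void main(String[] args) {
        // "your-instance-host" is a placeholder, not a real address.
        System.out.println(inferenceQueryUrl("your-instance-host"));
    }
}
```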

Request protocol

HTTP

Request method

POST

Supported format

JSON

Authentication

Use the following method to compute the authorization value:

| Parameter | Type | Description |
| --- | --- | --- |
| accessUserName | string | The username. You can find it in the Details > Network Information section of your instance. |
| accessPassWord | string | The password. You can set or change it in the Details > Network Information section of your instance. |

import com.aliyun.darabonba.encode.Encoder;
import com.aliyun.darabonbastring.Client;

public class GenerateAuthorization {
    public static void main(String[] args) throws Exception {
        // Replace these with the username and password of your instance.
        String accessUserName = "username";
        String accessPassWord = "password";
        // The value to encode has the form username:password.
        String realmStr = accessUserName + ":" + accessPassWord;
        // Base64-encode the UTF-8 bytes of that string.
        String authorization = Encoder.base64EncodeToString(Client.toBytes(realmStr, "UTF-8"));
        System.out.println(authorization);
    }
}

Example of a correctly formatted authorization value:

cm9vdDp******mdhbA==

When making an HTTP request, prefix the value with Basic and provide it in the Authorization header.

Example (in the request header):

Authorization: Basic cm9vdDp******mdhbA==
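The same header can also be produced with the JDK alone. The sketch below swaps the Darabonba helpers for java.util.Base64, which should yield the same value for the same username:password input, and attaches the result to a POST request that is built but not sent. The class and method names are hypothetical, java.net.http requires Java 11 or later, and the Content-Type header is an assumption based on the JSON format stated above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of any official SDK.
final class BasicAuthRequest {
    // Base64-encodes "username:password" using UTF-8 bytes.
    static String authorizationValue(String userName, String password) {
        String realm = userName + ":" + password;
        return Base64.getEncoder().encodeToString(realm.getBytes(StandardCharsets.UTF_8));
    }

    // Builds (but does not send) a POST request that carries the Authorization header.
    static HttpRequest buildRequest(URI url, String userName, String password, String jsonBody) {
        return HttpRequest.newBuilder(url)
                .header("Authorization", "Basic " + authorizationValue(userName, password))
                // Assumption: the service expects a JSON content type.
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder host and credentials.
        URI url = URI.create("http://your-instance-host/vector-service/inference-query");
        HttpRequest request = buildRequest(url, "username", "password", "{}");
        System.out.println(request.headers().firstValue("Authorization").orElse(""));
    }
}
```

Sending the request would be a matter of passing it to an HttpClient; that step is left out here so the sketch runs without network access.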

Request body parameters

| Parameter | Description | Default | Type | Required |
| --- | --- | --- | --- | --- |
| tableName | The name of the table to query. | | string | Yes |
| indexName | The name of the index to query. | The first configured index | string | No |
| content | The query content. Use this parameter for queries that do not use fusion vector retrieval. | | string | Yes (for non-fusion vector retrieval) |
| contents | A list of query content items. Use this parameter for fusion vector retrieval, which supports content from multiple modalities. | | list[string] | Yes (for fusion vector retrieval) |
| contentType | The data type of the content. Valid values: text (plain text), image_encode (Base64-encoded image), video_uri (OSS path of a video), and video_encode (Base64-encoded video). For fusion vector retrieval, provide a comma-separated list of types, such as text,image_encode, where each type corresponds to the item at the same position in the contents list. | | string | No |
| modal | The modality of the embedding model. Valid values: text (for text-to-text or text-to-image retrieval), image (for search by image), video (for video retrieval, which supports text, image, or video as input), and fusion (for fusion vector retrieval, which encodes multiple fields into a single vector for cross-modal retrieval). | | string | Yes |
| videoFrameTopK | The number of frames to retrieve for a video query. | 100 | int | No |
| namespace | The namespace to query. | "" | string | No |
| topK | The number of results to return. | 100 | int | No |
| includeVector | Specifies whether to include the vector in the response. | false | bool | No |
| outputFields | A list of fields to include in the response. | [] | list[string] | No |
| order | The sort order of the results. Valid values: ASC (ascending) and DESC (descending). | ASC | string | No |
| searchParams | Algorithm-specific query parameters, passed as a JSON-encoded string as in the examples below. | "" | string | No |
| filter | A filter expression to apply to the search. | "" | string | No |
| scoreThreshold | Filters results by score. With Euclidean distance, only results with a score less than scoreThreshold are returned. With inner product, only results with a score greater than scoreThreshold are returned. | No filtering by default | float | No |
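The scoreThreshold rule can be restated as a small predicate. This is only a client-side illustration of the documented behavior, not the engine's implementation; the Metric enum and the method name are hypothetical.

```java
// Hypothetical illustration of the documented scoreThreshold rule.
final class ScoreThresholdRule {
    // The two distance types named in the scoreThreshold description.
    enum Metric { EUCLIDEAN, INNER_PRODUCT }

    // Returns true if a result with the given score would be kept.
    static boolean passes(Metric metric, double score, double scoreThreshold) {
        if (metric == Metric.EUCLIDEAN) {
            // Smaller distances are better, so keep scores below the threshold.
            return score < scoreThreshold;
        }
        // Larger inner products are better, so keep scores above the threshold.
        return score > scoreThreshold;
    }

    public static void main(String[] args) {
        System.out.println(passes(Metric.EUCLIDEAN, 0.5, 1.0));
        System.out.println(passes(Metric.INNER_PRODUCT, 0.5, 1.0));
    }
}
```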

Response parameters

| Field | Description | Type |
| --- | --- | --- |
| result | A list of matching items. | list[Item] |
| totalCount | The number of items in the result list. | int |
| totalTime | The engine processing time, in milliseconds (ms). | float |
| errorCode | The error code. This field appears only when an error occurs. | int |
| errorMsg | The error message. This field appears only when an error occurs. | string |

  • Item object definition

| Field | Description | Type |
| --- | --- | --- |
| score | The distance score. | float |
| fields | A map of field names to their values. | map<string, FieldType> |
| vector | The vector value. | list[float] |
| id | The primary key value. Its type matches the type defined for the primary key field. | FieldType |
| namespace | The namespace of the vector. This field is returned only if a namespace is set. | string |

The API response may include additional fields, such as __source__ and coveredPercent, for internal debugging purposes. These fields do not affect business logic and can be safely ignored.

Examples

Text-to-text retrieval

  • Request body:

    {
      "tableName": "gist",
      "indexName": "test",
      "content": "hello",
      "modal": "text",
      "topK": 3,
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
      "includeVector": true
    }
  • Response:

    {
      "result":[
        {
          "id": 1,
          "score":1.0508723258972169,
          "vector": [0.1, 0.2, 0.3]
        },
        {
          "id": 2,
          "score":1.0329746007919312,
          "vector": [0.2, 0.2, 0.3]
        },
        {
          "id": 3,
          "score":0.980593204498291,
          "vector": [0.3, 0.2, 0.3]
        }
      ],
      "totalCount":3,
      "totalTime":2.943
    }
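In the request above, searchParams is itself a JSON document carried inside a JSON string, which is why its inner quotes are escaped. The sketch below assembles the same body by hand to make that nesting explicit. The class and method names are hypothetical, and in practice a JSON library would normally handle the escaping.

```java
// Hypothetical helper that builds the text-to-text request body by hand.
final class TextQueryBody {
    // Wraps a value in quotes and escapes backslashes and double quotes.
    // Sufficient for the simple values used here; control characters are not handled.
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // searchParamsJson is a JSON document that is embedded as a string value.
    static String build(String tableName, String indexName, String content,
                        int topK, String searchParamsJson) {
        return "{"
                + "\"tableName\":" + jsonString(tableName) + ","
                + "\"indexName\":" + jsonString(indexName) + ","
                + "\"content\":" + jsonString(content) + ","
                + "\"modal\":\"text\","
                + "\"topK\":" + topK + ","
                + "\"searchParams\":" + jsonString(searchParamsJson) + ","
                + "\"includeVector\":true"
                + "}";
    }

    public static void main(String[] args) {
        // Same values as the example request above.
        System.out.println(build("gist", "test", "hello", 3,
                "{\"qc.searcher.scan_ratio\":0.01}"));
    }
}
```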

Image retrieval

Text-to-image retrieval:

  • Request body:

    {
      "tableName": "gist",
      "indexName": "test",
      "content": "Bicycle",
      "modal": "text",
      "topK": 3,
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
      "includeVector": true
    }
  • Response:

    {
      "result":[
        {
          "id": 1,
          "score":1.0508723258972169,
          "vector": [0.1, 0.2, 0.3]
        },
        {
          "id": 2,
          "score":1.0329746007919312,
          "vector": [0.2, 0.2, 0.3]
        },
        {
          "id": 3,
          "score":0.980593204498291,
          "vector": [0.3, 0.2, 0.3]
        }
      ],
      "totalCount":3,
      "totalTime":2.943
    }

Search by image:

  • Request body:

    {
      "tableName": "gist",
      "indexName": "test",
      "content": "base64-encoded image data",
      "modal": "image",
      "topK": 3,
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
      "includeVector": true
    }
  • Response:

    {
      "totalCount": 3,
      "result": [
        {
          "id": 5,
          "score": 1.103209137916565
        },
        {
          "id": 3,
          "score": 1.1278988122940064
        },
        {
          "id": 2,
          "score": 1.1326735019683838
        }
      ],
      "totalTime": 242.615
    }

Subject identification

  • Request body:

    Without the range parameter:

    {
      "tableName": "gist",
      "indexName": "test",
      "content": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQ",
      "modal": "image",
      "searchParams": "{\"crop\": true}",
      "topK": 3,
      "includeVector": true
    }

    Note: "crop":true enables subject identification. If the range parameter is not provided, the model automatically detects the subject.

    With the range parameter:

    {
      "tableName": "gist",
      "indexName": "test",
      "content": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQ",
      "modal": "image",
      "searchParams": "{\"crop\": true, \"range\": \"100,100,60,70\"}",
      "topK": 3,
      "includeVector": true
    }

    Note: "crop":true combined with "range":"100,100,60,70" enables subject identification within a specified region of the image. The four numbers in range are, in order, the x and y coordinates of the region's top-left corner, its width, and its height.

  • Response:

    {
      "result":[
        {
          "id": 1,
          "score":1.0508723258972169,
          "vector": [0.1, 0.2, 0.3]
        }
      ],
      "__meta__": {
        "__range__": "100,100,60,70;"
      },
      "totalCount":1,
      "totalTime":2.943
    }
    • When you perform subject identification with modal=image, the response includes the __range__ field.

    • The __range__ field indicates the detected region of the subject in x,y,width,height format.

    • If the model identifies multiple subjects, their regions are listed in the __range__ field, sorted by score in descending order. The query returns results for the first (highest-scoring) subject by default.
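A client that wants to use the detected regions could split the __range__ string into rectangles. The sketch below assumes that regions are separated by semicolons, which the trailing ";" in the example suggests but the text does not state, and that the four values are integers. The RangeParser and Region names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the __range__ string, e.g. "100,100,60,70;".
final class RangeParser {
    // One detected region in x,y,width,height form.
    static final class Region {
        final int x;
        final int y;
        final int width;
        final int height;

        Region(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Assumption: regions are separated by ';' and each holds four integers.
    static List<Region> parse(String range) {
        List<Region> regions = new ArrayList<>();
        for (String part : range.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // Skip empty segments such as the one after a trailing ';'.
            }
            String[] n = trimmed.split(",");
            if (n.length != 4) {
                throw new IllegalArgumentException("Expected x,y,width,height but got: " + trimmed);
            }
            regions.add(new Region(
                    Integer.parseInt(n[0].trim()),
                    Integer.parseInt(n[1].trim()),
                    Integer.parseInt(n[2].trim()),
                    Integer.parseInt(n[3].trim())));
        }
        return regions;
    }

    public static void main(String[] args) {
        // The first region corresponds to the highest-scoring subject.
        Region first = parse("100,100,60,70;").get(0);
        System.out.println(first.x + " " + first.y + " " + first.width + " " + first.height);
    }
}
```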

Text-to-video retrieval

  • Request body:

    {
      "tableName": "video",
      "content": "hello",
      "modal": "video",
      "topK": 3,
      "videoFrameTopK":100,
      "contentType":"text",
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
    }
  • Response:

    {
      "result":[
        {
          "videoId": 1,
          "videoUri": "oss://...",
          "fields" : {
            "tag" : "demo"
          },
          "clips": [{
              "queryStartTime": 5,
              "startTime": 5,
              "duration": 5,
              "queryStartFrameIndex": 150,
              "queryEndFrameIndex": 300,
              "startFrameIndex": 150,
              "endFrameIndex": 300,
              "sim": 0.8
           }]
        }
      ],
      "totalCount":1,
      "totalTime":2.943
    }

Video-to-video retrieval

Supported video formats include MP4, AVI, MKV, MOV, FLV, and WebM.

  • Request body:

    Using an OSS URI:

    {
      "tableName": "video",
      "content": "oss://...",
      "modal": "video",
      "topK": 3,
      "videoFrameTopK":100,
      "contentType":"video_uri",
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
    }

    Using Base64-encoded video data:

    {
      "tableName": "video",
      "content": "data:video/mp4;base64,AAAAIGZ0eXBtcDQyAAABAGlxxxxxxx",
      "modal": "video",
      "topK": 3,
      "videoFrameTopK":100,
      "contentType":"video_encode",
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
    }

    The format is data:video/{format};base64,{base64_video}, where:

    • video/{format}: The format of the video. For example, use video/mp4 for an MP4 file.

    • base64_video: The Base64-encoded video data.

  • Response:

    {
      "result":[
        {
          "videoId": 1,
          "videoUri": "oss://...",
          "fields" : {
            "tag" : "demo"
          },      
          "clips": [{
              "queryStartTime": 5,
              "startTime": 5,
              "duration": 5,
              "queryStartFrameIndex": 150,
              "queryEndFrameIndex": 300,
              "startFrameIndex": 150,
              "endFrameIndex": 300,
              "sim": 0.8
           }]
        }
      ],
      "totalCount":1,
      "totalTime":2.943
    }
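The data:video/{format};base64,{base64_video} value used with video_encode can be assembled from raw bytes as sketched below. The class and method names are hypothetical, and the bytes in main are a stand-in for real file content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds a data URI for the content parameter.
final class DataUri {
    // mimeType is for example "video/mp4"; data holds the raw file bytes.
    static String encode(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real call would pass the bytes of a video file.
        byte[] placeholder = "not a real video".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode("video/mp4", placeholder));
    }
}
```

In practice the bytes would typically come from java.nio.file.Files.readAllBytes on the video file.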

Image-to-video retrieval

Supported image formats include PNG, JPEG, and JPG.

  • Request body:

    {
      "tableName": "video",
      "content": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAxxxxxx",
      "modal": "video",
      "topK": 3,
      "videoFrameTopK":100,
      "contentType":"image_encode", 
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}"
    }

    The image is provided as Base64 data. Pass the encoded data to the content parameter in the format data:image/{format};base64,{base64_image}, where:

    • image/{format}: The format of the image. For example, use image/jpeg for a JPG file.

    • base64_image: The Base64-encoded image data.

  • Response:

    {
      "result":[
        {
          "videoId": 1,
          "videoUri": "oss://...",
          "fields" : {
            "tag" : "demo"
          },      
          "clips": [{
              "queryStartTime": 5,
              "startTime": 5,
              "duration": 5,
              "queryStartFrameIndex": 150,
              "queryEndFrameIndex": 300,
              "startFrameIndex": 150,
              "endFrameIndex": 300,
              "sim": 0.8
           }]
        }
      ],
      "totalCount":1,
      "totalTime":2.943
    }

Fusion vector retrieval

Fusion vector retrieval encodes content from multiple modalities, such as text and images, into a single fusion vector for cross-modal retrieval. Before using this feature, you must configure the fusion vector fields in your table settings. For more information, see Configure fusion vectors.

Fusion vector retrieval differs from other prediction queries in the following ways:

  • The modal parameter is set to fusion.

  • The contents parameter (a list) is used instead of the content parameter to pass content from multiple modalities.

  • The contentType parameter contains a comma-separated list of types corresponding to the items in the contents list.

  • Request body:

    {
      "tableName": "gist",
      "indexName": "test",
      "contents": ["hello", "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAxxxxxx"],
      "modal": "fusion",
      "contentType": "text,image_encode",
      "topK": 3,
      "searchParams":"{\"qc.searcher.scan_ratio\":0.01}",
      "includeVector": true
    }

    The first element in contents is the text "hello", and the second element is Base64-encoded image data. In contentType, text and image_encode correspond to the types of the two elements in contents, respectively.

  • Response:

    {
      "result":[
        {
          "id": 1,
          "score":1.0508723258972169,
          "vector": [0.1, 0.2, 0.3]
        },
        {
          "id": 2,
          "score":1.0329746007919312,
          "vector": [0.2, 0.2, 0.3]
        },
        {
          "id": 3,
          "score":0.980593204498291,
          "vector": [0.3, 0.2, 0.3]
        }
      ],
      "totalCount":3,
      "totalTime":2.943
    }
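Because each entry in contentType must line up with the item at the same position in contents, a client may want to check that pairing before sending a fusion request. The sketch below is one way to do so. The class and method names are hypothetical, and the set of accepted types is copied from the contentType description; whether every type is accepted for fusion retrieval is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side check for fusion vector retrieval input.
final class FusionInput {
    // Values listed in the contentType parameter description.
    static final List<String> VALID_TYPES =
            Arrays.asList("text", "image_encode", "video_uri", "video_encode");

    // Validates the pairing and returns the comma-separated contentType value.
    static String contentType(List<String> contents, List<String> types) {
        if (contents.size() != types.size()) {
            throw new IllegalArgumentException(
                    "contents has " + contents.size() + " items but " + types.size() + " types were given");
        }
        for (String type : types) {
            if (!VALID_TYPES.contains(type)) {
                throw new IllegalArgumentException("Unknown content type: " + type);
            }
        }
        return String.join(",", types);
    }

    public static void main(String[] args) {
        // The image value is a placeholder, not real image data.
        List<String> contents = Arrays.asList("hello", "data:image/jpeg;base64,<base64_image>");
        List<String> types = Arrays.asList("text", "image_encode");
        System.out.println(contentType(contents, types));
    }
}
```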