Model management

The Lindorm AI engine provides a set of RESTful APIs to deploy, manage, and view your models. This topic describes how to use these APIs to perform model management tasks.

Deploy a model

Use the following API to deploy a pre-trained AI model that analyzes and processes data in your database. You can deploy an open-source model supported by the platform, or bring your own model (BYOM) by uploading a model file.

Note

Before you deploy a BYOM model, you must upload it to Lindorm DFS. For instructions, see Upload a model file.

API

POST v1/ai/models/create

Request parameters

Parameter

Description

model_name

A custom name for the model. The name can contain only uppercase letters, lowercase letters, and underscores (_).

task

The task type of the model. Valid values:

  • FEATURE_EXTRACTION: An embedding model that converts data such as text and images into vectors. This task is suitable for scenarios that require feature extraction, such as data retrieval and clustering.

  • QUESTION_ANSWERING: An LLM Q&A model that is typically used for scenarios such as knowledge-based Q&A and question retrieval.

  • SEMANTIC_SIMILARITY: A rerank model that calculates the semantic similarity between a query and a set of chunks and performs reranking to improve the accuracy of the results.

    Note

    The query and chunks parameters are used for model inference. For more information, see SEMANTIC_SIMILARITY.

model_path

  • The repository address of an open-source model on Hugging Face or ModelScope. Different models are supported for different task values. Supported models include:

    • FEATURE_EXTRACTION

      • huggingface://BAAI/bge-large-zh-v1.5

      • huggingface://moka-ai/m3e-base

      • huggingface://thenlper/gte-large-zh

      • huggingface://BAAI/bge-m3

      • modelscope://jinaai/jina-embeddings-v2-base-zh

      • huggingface://maidalun1020/bce-embedding-base_v1

      • huggingface://BAAI/bge-visualized

        Important

        Currently, this model is supported only by Lindorm instances. For information about how to create a Lindorm instance, see Create a Lindorm instance.

    • QUESTION_ANSWERING

      • huggingface://THUDM/chatglm2-6b-int4

      • modelscope://qwen/Qwen-7B-Chat-Int4

      • modelscope://qwen/Qwen-14B-Chat-Int4

    • SEMANTIC_SIMILARITY

      • huggingface://BAAI/bge-reranker-large

      • huggingface://BAAI/bge-reranker-base

      • huggingface://maidalun1020/bce-reranker-base_v1

      • huggingface://BAAI/bge-reranker-v2-m3

  • The path of a BYOM model in Lindorm DFS (LDFS). Example: ldfs://models/user_model_path.zip.

algorithm

The model algorithm.

  • The following algorithms are supported for open-source models:

    Important

    When you deploy an open-source model, this parameter is required and must match the specified model_path.

    • FEATURE_EXTRACTION

      • BGE_LARGE_ZH

      • M3E_BASE

      • GTE_LARGE_ZH

      • BGE_M3

      • JINA_V2_BASE_ZH

      • BCE_EMBEDDING_BASE_V1

      • BGE_VISUALIZED_M3

    • QUESTION_ANSWERING

      • CHATGLM2_6B_INT4

      • QWEN_7B_CHAT_INT4

      • QWEN_14B_CHAT_INT4

    • SEMANTIC_SIMILARITY

      • BGE_RERANKER_LARGE

      • BGE_RERANKER_BASE

      • BCE_RERANKER_BASE_V1

      • BGE_RERANKER_V2_M3

  • You do not need to specify this parameter when you deploy a BYOM model.

settings

A JSON-formatted string of custom parameters. The supported custom parameters depend on the task value. For more information, see Custom parameters (settings).
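The parameter rules above can be sketched as a small request-body builder. This is a minimal illustration, not part of the Lindorm API: the function name `build_create_payload` and the `KNOWN_MODELS` table are hypothetical, and the pairing of each model_path with an algorithm is inferred from the matching names in the two lists rather than stated by the documentation.

```python
# Subset of the documented model_path values, paired with the task and the
# algorithm of the same name. The pairing is inferred from matching names in
# the two lists above and is an assumption, not a documented mapping.
KNOWN_MODELS = {
    "huggingface://BAAI/bge-large-zh-v1.5": ("FEATURE_EXTRACTION", "BGE_LARGE_ZH"),
    "huggingface://BAAI/bge-m3": ("FEATURE_EXTRACTION", "BGE_M3"),
    "huggingface://THUDM/chatglm2-6b-int4": ("QUESTION_ANSWERING", "CHATGLM2_6B_INT4"),
    "modelscope://qwen/Qwen-7B-Chat-Int4": ("QUESTION_ANSWERING", "QWEN_7B_CHAT_INT4"),
    "huggingface://BAAI/bge-reranker-large": ("SEMANTIC_SIMILARITY", "BGE_RERANKER_LARGE"),
}


def build_create_payload(model_name, model_path, task, algorithm=None, settings=None):
    """Return the JSON body for POST v1/ai/models/create as a dict."""
    payload = {"model_name": model_name, "model_path": model_path, "task": task}
    is_byom = model_path.startswith("ldfs://")
    if not is_byom:
        # Open-source models require an algorithm that matches model_path.
        if algorithm is None:
            raise ValueError("algorithm is required for open-source models")
        expected = KNOWN_MODELS.get(model_path)
        if expected is not None and expected != (task, algorithm):
            raise ValueError(f"{model_path} expects task/algorithm {expected}")
        payload["algorithm"] = algorithm
    # For BYOM models the algorithm is not needed, so it is left out.
    if settings is not None:
        payload["settings"] = settings
    return payload
```

Checking the pairing on the client side only catches obvious mismatches early; the server remains the authority on which combinations are valid.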

Custom parameters (settings)

Feature extraction

| Parameter | Type | Description |
| --- | --- | --- |
| quantization | STRING | Specifies whether to apply model quantization. This is disabled by default. To enable fp16 quantization, set this parameter to 'fp16'. |
| instance_count | INT | The number of model instances. Default: 1. A higher instance count can improve model inference performance but also increases GPU memory consumption. |
| max_batch_size | INT | The maximum batch size for the model. Default: 1024. The value must be in the range [1,1024]. |

Question answering

| Parameter | Type | Description |
| --- | --- | --- |
| stream_mode | STRING | Specifies whether the LLM Q&A model provides streaming output. Valid values: on (enables streaming output) and off (disables streaming output; this is the default value). |

Semantic similarity

| Parameter | Type | Description |
| --- | --- | --- |
| quantization | STRING | Specifies whether to apply model quantization. This is disabled by default. To enable fp16 quantization, set this parameter to 'fp16'. |
| instance_count | INT | The number of model instances. Default: 1. A higher instance count can improve model inference performance but also increases GPU memory consumption. |
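The per-task settings described above could be checked before a request is sent. The following is a rough sketch under stated assumptions: `validate_settings` and `ALLOWED_SETTINGS` are hypothetical names, 'fp16' is treated as the only accepted quantization value because it is the only one documented, and the lower bound of 1 for instance_count is an assumption.

```python
# Setting names per task, taken from the parameter descriptions above.
ALLOWED_SETTINGS = {
    "FEATURE_EXTRACTION": {"quantization", "instance_count", "max_batch_size"},
    "QUESTION_ANSWERING": {"stream_mode"},
    "SEMANTIC_SIMILARITY": {"quantization", "instance_count"},
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(task, settings):
    """Return a copy of settings, or raise ValueError if a documented rule is broken."""
    allowed = ALLOWED_SETTINGS.get(task)
    if allowed is None:
        raise ValueError(f"unknown task: {task}")
    unknown = set(settings) - allowed
    if unknown:
        raise ValueError(f"settings not supported for {task}: {sorted(unknown)}")
    if "quantization" in settings and settings["quantization"] != "fp16":
        # Assumption: 'fp16' is the only documented value.
        raise ValueError("quantization must be 'fp16' when set")
    if "instance_count" in settings:
        count = settings["instance_count"]
        if not _is_int(count) or count < 1:  # lower bound is an assumption
            raise ValueError("instance_count must be a positive integer")
    if "max_batch_size" in settings:
        size = settings["max_batch_size"]
        if not _is_int(size) or not 1 <= size <= 1024:
            raise ValueError("max_batch_size must be in the range [1,1024]")
    if "stream_mode" in settings and settings["stream_mode"] not in ("on", "off"):
        raise ValueError("stream_mode must be 'on' or 'off'")
    return dict(settings)
```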

Examples

Example 1: Feature extraction

Open-source model

Request:

POST v1/ai/models/create HTTP/1.1
Content-Type: application/json

{
    "model_name": "bge_m3_model",
    "model_path": "huggingface://BAAI/bge-m3",
    "task": "FEATURE_EXTRACTION",
    "algorithm": "BGE_M3"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}

BYOM model

Request:

POST v1/ai/models/create HTTP/1.1
Content-Type: application/json

{
    "model_name": "byom_model",
    "model_path": "ldfs://models/my_model_1.zip",
    "task": "FEATURE_EXTRACTION"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}

Example 2: Question answering

Note

BYOM models do not currently support the question answering task.

The following request deploys the ChatGLM2 model with streaming output enabled:

POST v1/ai/models/create HTTP/1.1
Content-Type: application/json

{
    "model_name": "qa_model",
    "model_path": "huggingface://THUDM/chatglm2-6b-int4",
    "task": "QUESTION_ANSWERING",
    "algorithm": "CHATGLM2_6B_INT4",
    "settings": {"stream_mode": "on"}
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}

Example 3: Semantic similarity

Open-source model

Request:

POST v1/ai/models/create HTTP/1.1
Content-Type: application/json

{
    "model_name": "bge_rerank_model",
    "model_path": "huggingface://BAAI/bge-reranker-large",
    "task": "SEMANTIC_SIMILARITY",
    "algorithm": "BGE_RERANKER_LARGE"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}

BYOM model

Request:

POST v1/ai/models/create HTTP/1.1
Content-Type: application/json

{
    "model_name": "byom_rerank_model",
    "model_path": "ldfs://models/my_model_2.zip",
    "task": "SEMANTIC_SIMILARITY"
}

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}
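The deployment requests above can be issued from Python with the standard library alone. The sketch below only builds the request and interprets the response envelope. `BASE_URL` is a placeholder, since the endpoint address is not given in this topic, and authentication is omitted for the same reason. Treating `code` equal to 0 together with `success` equal to true as the success condition is inferred from the example responses.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9002"  # hypothetical address; replace with your endpoint


def make_create_request(payload, base_url=BASE_URL):
    """Build (but do not send) the POST v1/ai/models/create request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/ai/models/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_envelope(body):
    """Parse a response body and return its data field, raising on failure."""
    doc = json.loads(body)
    # Assumption: code 0 with success true marks a successful call.
    if doc.get("code") != 0 or not doc.get("success"):
        raise RuntimeError(f"request failed: code={doc.get('code')} msg={doc.get('msg')}")
    return doc.get("data")


# Sending the request would look like this (not executed here):
#   with urllib.request.urlopen(make_create_request(payload)) as resp:
#       check_envelope(resp.read())
```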

List models

Use the following API to list all your models.

API

GET v1/ai/models/list

Response parameters

| Parameter | Description |
| --- | --- |
| models | A list of model details, returned as an array. |
| models.name | The name of the model. |
| models.status | The status of the model. Valid values: READY (the model is deployed and ready for use), INIT (the model is initializing), PREPARE (the model is preparing for deployment), FAILED (the model deployment has failed). |
| models.sql_function | The SQL function for invoking the model. |
| models.created_time | The time when the model was created. |
| models.update_time | The time when the model was last updated. |

Example

Request:

GET v1/ai/models/list HTTP/1.1

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:19:26 GMT
Content-type: application/json
Content-length: 2099

{
  "code": 0,
  "msg": "SUCCESS",
  "data": {
    "models": [{
      "name": "bge_m3_model",
      "status": "READY",
      "created_time": "...",
      "updated_time": "...",
      ...
    }, {
      "name": "bge_model",
      "status": "READY",
      ...
    }]
  },
  "success": true
}
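A list response like the one above can be summarized by status. The helper below is a small sketch: `models_by_status` is a hypothetical name, and it relies only on the `data.models[].name` and `data.models[].status` fields shown in the example.

```python
import json


def models_by_status(body):
    """Group model names from a v1/ai/models/list response by their status."""
    doc = json.loads(body)
    grouped = {}
    for model in doc["data"]["models"]:
        grouped.setdefault(model.get("status"), []).append(model.get("name"))
    return grouped
```

For example, `models_by_status(body).get("READY", [])` gives the names of the models that are ready for use.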

View model details

After deploying a model, use the following API to view its details.

API

GET v1/ai/models/${MODEL_NAME}/status

Response parameters

| Parameter | Description |
| --- | --- |
| name | The name of the model. |
| status | The status of the model. Valid values: READY (the model is deployed and ready for use), INIT (the model is initializing), PREPARE (the model is preparing for deployment), FAILED (the model deployment has failed). |
| sql_function | The SQL function for invoking the model. |
| task_type | The task type of the model. |
| algorithm | The model's algorithm. |
| query | The associated SQL query. (Optional) |
| preprocessors | The associated preprocessing operations. (Optional) |
| settings | The model's configuration settings. |
| metrics | The metrics related to the model. (Optional) |
| error | The error message from the model deployment process. (Optional) |
| progress | The deployment progress of the model. (Optional) |
| created_time | The time when the model was created. |
| update_time | The time when the model was last updated. |

Example

Request:

GET v1/ai/models/bge_m3_model/status HTTP/1.1

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 419

{
  "code": 0,
  "msg": "SUCCESS",
  "data": {
    "name": "bge_m3_model",
    "status": "READY",
    "task_type":"FEATURE_EXTRACTION",
    "algorithm":"BGE_M3",
    "settings": "...",
    "error":"...",
    "progress": "...",
    ...
  },
  "success": true
}
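Because deployment passes through the INIT and PREPARE states before it reaches READY or FAILED, a client may want to poll the status endpoint. The loop below is a minimal sketch of that idea. `wait_until_deployed` is a hypothetical helper, `fetch_status` is any callable you supply that returns the `data` object of a status response, and the timeout and interval defaults are arbitrary choices.

```python
import time


def wait_until_deployed(fetch_status, timeout_s=600.0, interval_s=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the model is READY; raise if it is FAILED or the timeout passes."""
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        status = data.get("status")
        if status == "READY":
            return data
        if status == "FAILED":
            # The error field is optional, so it may be missing here.
            raise RuntimeError(f"deployment failed: {data.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"model still in status {status} after {timeout_s}s")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise in tests without real waiting.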

Delete a model

Use the following API to delete a specific model.

API

POST v1/ai/models/${MODEL_NAME}/drop

Example

Request:

POST v1/ai/models/bge_m3_model/drop HTTP/1.1

Response:

HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17

{
  "code": 0,
  "msg": "SUCCESS",
  "data": null,
  "success": true
}
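The status and drop endpoints both embed the model name in the URL path. The sketch below builds those URLs and the drop request. `model_endpoint` and `make_drop_request` are hypothetical helpers, the base URL is a placeholder, and percent-encoding the name is a defensive choice rather than a documented requirement.

```python
import urllib.parse
import urllib.request


def model_endpoint(base_url, model_name, action):
    """Return the URL for v1/ai/models/${MODEL_NAME}/status or .../drop."""
    if action not in ("status", "drop"):
        raise ValueError(f"unsupported action: {action}")
    # Percent-encode the name so it cannot alter the path structure.
    name = urllib.parse.quote(model_name, safe="")
    return f"{base_url.rstrip('/')}/v1/ai/models/{name}/{action}"


def make_drop_request(base_url, model_name):
    """Build (but do not send) the POST request that deletes a model."""
    return urllib.request.Request(
        model_endpoint(base_url, model_name, "drop"), method="POST"
    )
```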

Upload a model file

Use the following API to upload a compressed model file to LDFS so that you can then deploy it as a BYOM model.

API

POST v1/ai/models/upload

Request settings

  • In the HTTP request header, add the x-ld-filename key. Set its value to the name of the model file. This filename is part of the model_path you specify when deploying the BYOM model.

  • Upload the model file in binary format.

Response parameters

| Parameter | Description |
| --- | --- |
| model_path | The LDFS path of the uploaded model file. |

Example

Request:

POST /v1/ai/models/upload HTTP/1.1
x-ld-filename: m3e_finetuned.zip
Content-Type: application/zip
Content-Length: 22

"<file contents here>"

Response:

HTTP/1.1 200 OK
Date: Sat, 11 May 2024 02:51:51 GMT
Content-type: application/json
Content-length: 149

{
    "code": 0,
    "msg": "SUCCESS",
    "data": {
        "model_path": "ldfs://models/m3e_finetuned.zip"
    },
    "success": true,
    "request_id": "cf4c9ed8-1185-4650-9687-d09826b839f4"
}
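The upload flow above could be scripted as follows. This is a sketch under assumptions: the base URL is a placeholder, authentication is omitted because this topic does not describe it, and `make_upload_request` and `uploaded_model_path` are hypothetical helper names. The returned model_path is the value to pass as model_path when deploying the BYOM model.

```python
import json
import urllib.request


def make_upload_request(base_url, filename, content):
    """Build (but do not send) the POST v1/ai/models/upload request.

    filename goes into the x-ld-filename header; content is the raw bytes
    of the compressed model file.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/ai/models/upload",
        data=content,
        headers={"x-ld-filename": filename, "Content-Type": "application/zip"},
        method="POST",
    )


def uploaded_model_path(body):
    """Extract data.model_path from an upload response body."""
    doc = json.loads(body)
    if doc.get("code") != 0:  # assumption: code 0 marks success
        raise RuntimeError(f"upload failed: {doc.get('msg')}")
    return doc["data"]["model_path"]
```

This version holds the whole file in memory, which is fine for a small archive; a large model file may call for a streaming upload instead.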