If a deployed model's configuration does not meet your current business needs, you can adjust the parameters online using a RESTful API operation. This method dynamically updates the model configuration file without affecting the original model or the model service. This topic describes how to use this API operation to adjust some parameter settings.
Prerequisites
A model has been created or imported, and its status is READY. For more information, see View model details.
API operations
POST v1/ai/models/${MODEL_NAME}/update_config
Request parameters
Parameter | Type | Description |
instance_count | INT | The number of model instances to update. Increasing the number of model instances improves inference performance but also increases GPU memory usage. |
max_batch_size | INT | The maximum batch size to support. The default value is … Important: Only embedding models support updating the max_batch_size value that was set during deployment. |
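As a rough illustration of how the endpoint and parameters above fit together, the following Python sketch builds the request without sending it. The base URL `http://localhost:8080` and the helper name `build_update_config_request` are hypothetical and not part of the API. The sketch also assumes that a parameter you do not want to change can be omitted from the body, and it serializes values as strings only to mirror the example request in this topic; adjust both if your service behaves differently.

```python
import json
import urllib.request


def build_update_config_request(base_url, model_name,
                                instance_count=None, max_batch_size=None):
    """Build (but do not send) a POST request for update_config.

    base_url is a hypothetical service address; replace it with your own.
    """
    body = {}
    if instance_count is not None:
        # Values are sent as strings to mirror the example request.
        body["instance_count"] = str(int(instance_count))
    if max_batch_size is not None:
        body["max_batch_size"] = str(int(max_batch_size))
    if not body:
        raise ValueError("specify at least one parameter to update")

    url = f"{base_url.rstrip('/')}/v1/ai/models/{model_name}/update_config"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; the model name and values are taken from the example.
req = build_update_config_request(
    "http://localhost:8080", "bge_m3_model",
    instance_count=4, max_batch_size=1024,
)
```

To actually send the request, pass `req` to `urllib.request.urlopen`, adding whatever authentication your deployment requires.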
Example
Request:
POST v1/ai/models/bge_m3_model/update_config HTTP/1.1
Content-Type: application/json
{
"instance_count": "4",
"max_batch_size": "1024"
}
Response:
HTTP/1.1 200 OK
Date: Tue, 28 Nov 2023 03:18:55 GMT
Content-type: application/json
Content-length: 17
{
"code": 0,
"msg": "SUCCESS",
"data": null,
"success": true
}
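A client will usually want to confirm that the update was accepted. The sketch below checks a response body against the fields shown in the example response. The helper name `update_succeeded` is hypothetical, and the failure case (a non-zero `code` or `success` set to false) is an assumption, because this topic documents only the success response.

```python
import json


def update_succeeded(response_body: str) -> bool:
    """Return True when the body reports success like the example response."""
    payload = json.loads(response_body)
    # Assumption: a failed update returns a non-zero code or success=false.
    return payload.get("code") == 0 and payload.get("success") is True


# Body copied from the example response.
sample = '{"code": 0, "msg": "SUCCESS", "data": null, "success": true}'
print(update_succeeded(sample))
```

Checking both `code` and `success` is a conservative choice; if the service guarantees that the two fields always agree, testing one of them is enough.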