In EAS, you can define and deploy an online inference service with a JSON configuration file.
Quick start
1. Prepare a JSON configuration file
To deploy a service, you need a JSON file that defines the required configurations. For first-time users, we recommend navigating to Custom Model Deployment > Custom Deployment to configure parameters. The system automatically generates the JSON configuration, which you can use as a template.
The following code is a sample service.json file. For a complete list of parameters, see Appendix: JSON parameters.
{
  "metadata": {
    "name": "demo",
    "instance": 1,
    "workspace_id": "your-workspace-id"
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c7a.large"
        }
      ]
    }
  },
  "containers": [
    {
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
      "script": "python app.py",
      "port": 8000
    }
  ]
}
2. Deploy the service with JSON
- Log on to the PAI console. Select a region at the top of the page. Then, select the desired workspace and click Elastic Algorithm Service (EAS).
- On the Inference Service tab, click Deploy Service. Then, in the Custom Model Deployment section, select JSON Deployment.
- Paste your JSON configuration and click Deploy. A service status of Running indicates a successful deployment.
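The configuration can also be generated from a script, which may help when several services share the same layout. The following Python sketch builds the same structure as the sample service.json and serializes it. The function name build_service_config and its arguments are illustrative and are not part of EAS.

```python
import json


def build_service_config(name, workspace_id, instance_type, image, script, port, instance=1):
    """Return a dict with the same layout as the sample service.json."""
    return {
        "metadata": {
            "name": name,
            "instance": instance,
            "workspace_id": workspace_id,
        },
        "cloud": {"computing": {"instances": [{"type": instance_type}]}},
        "containers": [{"image": image, "script": script, "port": port}],
    }


config = build_service_config(
    name="demo",
    workspace_id="your-workspace-id",
    instance_type="ecs.c7a.large",
    image="eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/python-inference:py39-ubuntu2004",
    script="python app.py",
    port=8000,
)

# Serialize to text that can be pasted into the JSON Deployment editor.
service_json = json.dumps(config, indent=2)
print(service_json)
```

Generating the text with json.dumps avoids hand-editing mistakes such as trailing commas, which would make the pasted configuration invalid JSON.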
Appendix: JSON parameters
Parameter | Required | Description |
metadata | Yes | The service's metadata. For more information, see metadata parameters. |
cloud | No | The compute resource and VPC configurations. For more information, see cloud parameters. |
containers | No | The image configuration. For more information, see containers parameters. |
dockerAuth | No | This parameter is required to access a private repository that requires authentication. The value is the Base64-encoded string of |
networking | No | The service invocation configuration. For more information, see networking parameters. |
storage | No | Mounts data from storage services such as OSS or NAS into the container. For configuration details, see storage mount. |
token | No | The access token for service authentication. If not specified, the system automatically generates one. |
aimaster | No | Enables computing power check and fault tolerance for multi-node distributed inference services. |
model_path | Yes | Required when deploying a service with a processor. The model_path and processor_path parameters specify the input data source locations for the model and the processor, respectively. The following formats are supported:
|
oss_endpoint | No | The OSS endpoint, for example, oss-cn-beijing.aliyuncs.com. For other valid values, see Regions and endpoints. Note: By default, you do not need to specify this parameter. The service uses the internal OSS endpoint of the current region to download model files or processor files. You must specify this parameter when you access OSS across regions. For example, if you deploy a service in the Hangzhou region and specify an OSS address in the Beijing region for model_path, you must use this parameter to specify the public OSS endpoint of the Beijing region. |
model_entry | No | The model's entry file, which can be any file within the model package. If unspecified, it defaults to the filename from model_path. The path to this entry file is passed to the initialize() function of the processor. |
model_config | No | The configuration for the model, which can be any text. This value is passed as the second argument to the processor's initialize() function. |
processor | No |
|
processor_path | No | The path to the processor package. For supported path formats, see the description of the model_path parameter. |
processor_entry | No | The entry file of the processor, such as libprocessor.so or app.py. This file must implement the This parameter is required if processor_type is set to cpp or python. |
processor_mainclass | No | The main class of the processor in the JAR package. For example, com.aliyun.TestProcessor. This parameter is required if processor_type is set to java. |
processor_type | No | The implementation language of the processor. The valid values are as follows:
|
warm_up_data_path | No | The path to the request file used for model warm-up. For more information about this feature, see model warm-up. |
runtime.enable_crash_block | No | Specifies whether an instance that crashes due to a processor code exception automatically restarts. Valid values:
|
autoscaler | No | The configuration for horizontal auto scaling. For detailed parameter descriptions, see horizontal auto scaling. |
labels | No | The labels to apply to the service. Use the |
unit.size | No | The number of machines per instance in a distributed inference configuration. The default value is 2. |
sinker | No | Persists all service requests and responses to MaxCompute or Log Service (SLS). For detailed parameter descriptions, see sinker parameters. |
confidential | No | Configures Trustee to ensure that information such as data, models, and code remains encrypted during service deployment and invocation. This enables a secure and encrypted inference service. The format is as follows: Note The secure encryption environment primarily protects mounted storage files. Ensure that you have mounted these files before enabling this feature. For more information, see secure and encrypted inference service.
|
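The oss_endpoint rule described in the table (omit it within one region, set the public endpoint when the bucket is in another region) can be sketched as a small helper. This is a minimal sketch: the function name add_model_source is hypothetical, and the endpoint pattern oss-<region>.aliyuncs.com is an assumption generalized from the oss-cn-beijing.aliyuncs.com example, so check Regions and endpoints for the actual value.

```python
def add_model_source(config, model_path, service_region, bucket_region):
    """Return a copy of config with model_path set, adding oss_endpoint
    only when the bucket is in a different region than the service."""
    updated = dict(config)
    updated["model_path"] = model_path
    if service_region != bucket_region:
        # Assumed public endpoint pattern, based on the example in the table.
        updated["oss_endpoint"] = f"oss-{bucket_region}.aliyuncs.com"
    return updated


# Service in Hangzhou, model stored in Beijing: the endpoint is required.
cross_region = add_model_source(
    {"processor": "tensorflow_cpu_1.12"},
    "oss://examplebucket/exampledir/",
    service_region="cn-hangzhou",
    bucket_region="cn-beijing",
)

# Same region: the default internal endpoint is used, so nothing is added.
same_region = add_model_source(
    {"processor": "tensorflow_cpu_1.12"},
    "oss://examplebucket/exampledir/",
    service_region="cn-beijing",
    bucket_region="cn-beijing",
)
print(cross_region)
print(same_region)
```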
Metadata parameters
Advanced parameters
Cloud parameters
Parameter | Sub-parameter | Required | Description |
computing | instances | No | Specifies a list of instance types to use when deploying the service in a public resource group. If a bid for a spot instance fails or an instance type is out of stock, the system creates the service by using the next instance type in the list. |
computing | disable_spot_protection_period | No | Specifies whether to disable the protection period for a spot instance. This parameter applies only to spot instances. Valid values: |
networking | vpc_id | No | The ID of the VPC. |
networking | vswitch_id | No | The ID of the vSwitch. |
networking | security_group_id | No | The ID of the security group. |
networking | destination_cidrs | No | If the CIDR block of the configured vSwitch conflicts with the EAS management CIDR blocks (10.224.0.0/16 or 10.240.0.0/12), you must explicitly set this parameter to the CIDR block of your vSwitch. Replace |
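Whether destination_cidrs is needed depends on whether the vSwitch CIDR block overlaps one of the two EAS management CIDR blocks named in the table. The following sketch checks this with Python's standard ipaddress module. The function name conflicts_with_eas is hypothetical, and the sample vSwitch CIDR blocks are made up for illustration.

```python
import ipaddress

# EAS management CIDR blocks, as listed in the destination_cidrs description.
EAS_MANAGEMENT_CIDRS = [
    ipaddress.ip_network("10.224.0.0/16"),
    ipaddress.ip_network("10.240.0.0/12"),
]


def conflicts_with_eas(vswitch_cidr):
    """Return True if the vSwitch CIDR overlaps an EAS management CIDR."""
    network = ipaddress.ip_network(vswitch_cidr)
    return any(network.overlaps(managed) for managed in EAS_MANAGEMENT_CIDRS)


for cidr in ["192.168.0.0/24", "10.224.1.0/24", "10.250.0.0/16"]:
    if conflicts_with_eas(cidr):
        print(f"{cidr}: conflict, set destination_cidrs")
    else:
        print(f"{cidr}: no conflict")
```

Note that 10.240.0.0/12 covers 10.240.0.0 through 10.255.255.255, so a vSwitch in 10.250.0.0/16 conflicts even though its prefix looks different at first glance.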
Example:
{
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.c8i.2xlarge",
          "spot_price_limit": 1
        },
        {
          "type": "ecs.c8i.xlarge",
          "capacity": "20%"
        }
      ],
      "disable_spot_protection_period": false
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9*****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b*****"
    }
  }
}
Container parameters
To deploy a service using a custom image, see Custom Images.
Parameter | Sub-parameter | Required | Description |
image | - | Yes | The image address for the model service. Required when you deploy using an image. |
env | name | No | The name of the environment variable. |
env | value | No | The value of the environment variable. |
command | - | You must specify either command or script. | The entry point command for the image. This parameter supports only a single command. For complex scripts, such as |
script | - | You must specify either command or script. | The entry point script for the image. You can specify complex scripts with multiple lines. Separate commands with |
port | - | No | The container port. Important |
prepare | pythonRequirements | No | A list of Python requirements to install before the instance starts. The image must have the python and pip commands available in the system PATH. For example: |
prepare | pythonRequirementsPath | No | The path to a requirements.txt file for installing Python packages before the instance starts. The image must have the python and pip commands available in the system PATH. This file can be included in the image or mounted from external storage. For example: |
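Two of the rules in this table lend themselves to a quick local check before deployment: image is required, and either command or script must be present. The sketch below also fills prepare.pythonRequirements from the text of a requirements.txt file. The helper names parse_requirements and check_container are hypothetical, and the check covers only the rules stated in the table. Whether command and script may be set together is not stated, so the sketch does not test for that.

```python
def parse_requirements(text):
    """Turn requirements.txt content into a list, skipping blanks and comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


def check_container(container):
    """Return a list of problems found in one entry of "containers"."""
    problems = []
    if not container.get("image"):
        problems.append("image is required")
    if not container.get("command") and not container.get("script"):
        problems.append("specify either command or script")
    return problems


container = {
    "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
    "command": "python app.py",
    "port": 8000,
    "prepare": {
        "pythonRequirements": parse_requirements(
            "# pinned versions\nnumpy==1.16.4\n\nabsl-py==0.11.0\n"
        )
    },
}
print(check_container(container))
```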
Networking parameters
Parameter | Required | Description |
gateway | No | Specifies the dedicated gateway for the EAS service. |
gateway_policy | No |
Example configuration: |
Sinker parameters
Parameter | Sub-parameter | Required | Description |
type | - | No | Specifies the destination storage service. Supported values: |
config | maxcompute.project | No | The MaxCompute project name. |
config | maxcompute.table | No | The MaxCompute table name. |
config | sls.project | No | The Log Service (SLS) project name. |
config | sls.logstore | No | The Logstore name. |
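The dotted names in the table (for example, maxcompute.project) correspond to nested objects under config. The following sketch shows that nesting by building the sinker object for either destination. The helper name build_sinker is hypothetical, and the two type values are taken from the example configurations in this section rather than from an authoritative list.

```python
# Fields expected under config.<type>, per the table above.
SINK_FIELDS = {
    "maxcompute": ("project", "table"),
    "sls": ("project", "logstore"),
}


def build_sinker(sink_type, **fields):
    """Build the "sinker" object, nesting the fields under config.<type>."""
    if sink_type not in SINK_FIELDS:
        raise ValueError(f"unsupported sink type: {sink_type}")
    missing = [key for key in SINK_FIELDS[sink_type] if key not in fields]
    if missing:
        raise ValueError(f"missing fields for {sink_type}: {missing}")
    return {
        "type": sink_type,
        "config": {sink_type: {key: fields[key] for key in SINK_FIELDS[sink_type]}},
    }


sinker = build_sinker("sls", project="k8s-log-****", logstore="d****")
print(sinker)
```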
Example configurations:
Sink to MaxCompute
"sinker": {
  "type": "maxcompute",
  "config": {
    "maxcompute": {
      "project": "cl****",
      "table": "te****"
    }
  }
}
Sink to SLS
"sinker": {
  "type": "sls",
  "config": {
    "sls": {
      "project": "k8s-log-****",
      "logstore": "d****"
    }
  }
}
JSON configuration example
The following is a sample JSON configuration:
{
  "token": "****M5Mjk0NDZhM2EwYzUzOGE0OGMx****",
  "processor": "tensorflow_cpu_1.12",
  "model_path": "oss://examplebucket/exampledir/",
  "oss_endpoint": "oss-cn-beijing.aliyuncs.com",
  "model_entry": "",
  "model_config": "",
  "processor_path": "",
  "processor_entry": "",
  "processor_mainclass": "",
  "processor_type": "",
  "warm_up_data_path": "",
  "runtime": {
    "enable_crash_block": false
  },
  "unit": {
    "size": 2
  },
  "sinker": {
    "type": "maxcompute",
    "config": {
      "maxcompute": {
        "project": "cl****",
        "table": "te****"
      }
    }
  },
  "cloud": {
    "computing": {
      "instances": [
        {
          "capacity": 800,
          "type": "dedicated_resource"
        },
        {
          "capacity": 200,
          "type": "ecs.c7.4xlarge",
          "spot_price_limit": 3.6
        }
      ],
      "disable_spot_protection_period": true
    },
    "networking": {
      "vpc_id": "vpc-bp1oll7xawovg9t8****",
      "vswitch_id": "vsw-bp1jjgkw51nsca1e****",
      "security_group_id": "sg-bp1ej061cnyfn0b****"
    }
  },
  "autoscaler": {
    "min": 2,
    "max": 5,
    "strategies": {
      "qps": 10
    }
  },
  "storage": [
    {
      "mount_path": "/data_oss",
      "oss": {
        "endpoint": "oss-cn-shanghai-internal.aliyuncs.com",
        "path": "oss://bucket/path/"
      }
    }
  ],
  "confidential": {
    "trustee_endpoint": "xx",
    "decryption_key": "xx"
  },
  "metadata": {
    "name": "test_eascmd",
    "resource": "eas-r-9lkbl2jvdm0puv****",
    "instance": 1,
    "workspace_id": "1405**",
    "gpu": 0,
    "cpu": 1,
    "memory": 2000,
    "gpu_memory": 10,
    "gpu_core_percentage": 10,
    "qos": "",
    "cuda": "11.2",
    "enable_grpc": false,
    "enable_webservice": false,
    "rdma": 1,
    "rpc": {
      "batching": false,
      "keepalive": 5000,
      "io_threads": 4,
      "max_batch_size": 16,
      "max_batch_timeout": 50,
      "max_queue_size": 64,
      "worker_threads": 5,
      "rate_limit": 0,
      "enable_sigterm": false
    },
    "rolling_strategy": {
      "max_surge": 1,
      "max_unavailable": 1
    },
    "eas.termination_grace_period": 30,
    "scheduling": {
      "spread": {
        "policy": "host"
      }
    },
    "resource_rebalancing": false,
    "shm_size": 100
  },
  "features": {
    "eas.aliyun.com/extra-ephemeral-storage": "100Gi",
    "eas.aliyun.com/gpu-driver-version": "tesla=550.127.08"
  },
  "networking": {
    "gateway": "gw-m2vkzbpixm7mo****"
  },
  "containers": [
    {
      "image": "registry-vpc.cn-shanghai.aliyuncs.com/xxx/yyy:zzz",
      "prepare": {
        "pythonRequirements": [
          "numpy==1.16.4",
          "absl-py==0.11.0"
        ]
      },
      "command": "python app.py",
      "port": 8000
    }
  ],
  "dockerAuth": "dGVzdGNhbzoxM*******"
}