Use LLM batch inference by deploying an LLM Intelligent Router - Platform for AI (PAI) - Alibaba Cloud

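The LLM Intelligent Router in EAS supports OpenAI-compatible batch inference: you submit a large set of inference requests and they are processed in the background. This suits scenarios that do not need real-time responses, such as offline evaluation, data labeling, and content generation. Compared with online inference, batch inference can significantly reduce cost and improve resource utilization.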

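Features

Asynchronous processing, decoupled calls: After submitting a large number of requests, you do not have to wait synchronously for the results. You query the task status asynchronously, which reduces waiting pressure and complexity on the client side.
Peak shaving, lower cost: Large inference workloads can run during off-peak hours or when resources are idle, which maximizes utilization of GPUs and other compute resources and significantly lowers the cost per inference request.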

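Service deployment

Core parameters

To use LLM batch inference, configure an OSS storage path and grant the corresponding access permissions when you deploy the LLM Intelligent Router service. These settings can currently be made only through JSON parameters:

Parameter	Required	Description
`llm_gateway.batch_oss_path`	Yes	The OSS path that stores the input files and output results of batch inference tasks. Format: must start with `oss://` and contain the bucket name and an optional prefix, for example `oss://my-batch-bucket/batchllm/`. Best practice: use a dedicated bucket or prefix for batch inference tasks to simplify permission management and lifecycle rule configuration.
`options.enable_ram_role`	Yes	Must be set to `true`. Authorizes EAS to access the OSS bucket.
`llm_gateway.batch_oss_endpoint`	No	The OSS endpoint. Defaults to the internal endpoint of the current region; usually no configuration is needed.
`llm_gateway.batch_options`	No	Additional options for batch inference. `--batch-parallel`: concurrency for processing shards; default `8`. `--batch-lines-per-shard`: maximum number of request lines per shard; default `1000`. `--batch-request-timeout`: timeout of a single inference request, in Go duration format such as `"3m"` (3 minutes) or `"10s"` (10 seconds); default `3m`. `--batch-request-retry-times`: number of retries after a single inference request fails; default `3`. Example: `--batch-parallel=10,--batch-lines-per-shard=500,--batch-request-timeout=5m`.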
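
As an illustration of the parameter formats above, the following shell sketch checks that a path has the `oss://<bucket>[/<prefix>]` shape and assembles a comma-separated `batch_options` value like the one in the example. The function names `is_valid_oss_path` and `build_batch_options` are hypothetical helpers, not part of EAS, and the check is purely syntactic: it does not verify that the bucket exists or is accessible.

```shell
# Hypothetical helper: succeeds when $1 looks like oss://<bucket>[/<prefix>].
is_valid_oss_path() {
  case "$1" in
    oss://?*) ;;          # must start with oss:// and have something after it
    *) return 1 ;;
  esac
  bucket=${1#oss://}      # strip the scheme
  bucket=${bucket%%/*}    # keep everything before the first slash
  [ -n "$bucket" ]        # the bucket name must not be empty
}

# Hypothetical helper: joins three options in the comma-separated form used
# by the example above. $1=parallel, $2=lines per shard, $3=request timeout.
build_batch_options() {
  printf '%s\n' "--batch-parallel=$1,--batch-lines-per-shard=$2,--batch-request-timeout=$3"
}

if is_valid_oss_path "oss://my-batch-bucket/batchllm/"; then
  build_batch_options 10 500 5m
fi
```

Running a check like this before pasting values into the deployment JSON may catch a missing `oss://` scheme or an empty bucket name early.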

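Deployment methods

Batch inference supports two deployment methods:

Separate deployment: deploy a standalone LLM Intelligent Router service first, then deploy the LLM service and associate it with the router.
Integrated deployment: package the LLM Intelligent Router and the inference service into a single service.

Separate deployment

Deploy the LLM Intelligent Router service. Under Custom Model Deployment > JSON Deployment, enter the JSON configuration and click Deploy. A sample configuration follows.

Note: Modify llm_gateway.batch_oss_path, metadata.workspace_id (workspace ID), metadata.group (group name), and metadata.name (service name) to match your environment.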

{
    "llm_gateway": {
        "batch_oss_path": "oss://your-bucket/path/to/prefix"
    },
    "llm_scheduler": {
        "cpu": 2,
        "memory": 4000,
        "policy": "prefix-cache"
    },
    "metadata": {
        "cpu": 4,
        "gpu": 0,
        "group": "group_llm_gateway",
        "instance": 2,
        "memory": 8000,
        "name": "llm_gateway",
        "type": "LLMGatewayService",
        "workspace_id": "217**3"
    },
    "options": {
        "enable_ram_role": true  
    }
}

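Deploy the LLM service. Follow the LLM deployment documentation to deploy a Qwen3-8B service.
Important
The Features > LLM Intelligent Router option on the deployment page cannot currently associate an LLM Intelligent Router that supports batch inference. After configuring the other parameters, click Edit in the Service Configuration section and add the metadata.group parameter to the JSON. Set it to the group name configured when deploying the LLM Intelligent Router, such as group_llm_gateway in the example.

Integrated deployment

For integrated deployment, add the two parameters options.enable_ram_role and llm_gateway.batch_oss_path to the LLM Intelligent Router section of the JSON file. The // comments in the example are annotations only; standard JSON does not allow comments, so remove them if your tooling requires strict JSON.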

{
  "metadata": {
    "group": "feitest",
    "name": "feitest",
    "workspace_id": "217123"
  },
  "members": [
    {
      "llm_gateway": {
        "batch_oss_path": "oss://your-bucket/path/to/prefix", // required
        "infer_backend": "vllm"
      },
      "llm_scheduler": {
        "cpu": 2,
        "memory": 4000,
        "policy": "prefix-cache"
      },
      "metadata": {
        "cpu": 4,
        "gpu": 0,
        "group": "group_llm_gateway",
        "instance": 2,
        "memory": 8000,
        "name": "llm_gateway",
        "type": "LLMGatewayService",
        "workspace_id": "217123"
      },
      "options": {
        "enable_ram_role": true // required
      }
    },
   { 
      // inference member
    }
  ]
}

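Obtain access credentials

Batch inference requests must be sent to the LLM Intelligent Router service. Obtain the public endpoint and token as follows.

On the Elastic Algorithm Service (EAS) page, find the deployed LLM Intelligent Router service.
Click the service name to open the Overview page, then click View Call Information in the Basic Information section.
On the Call Information page, copy the public endpoint and token under Service-specific Traffic Entry.

We recommend storing them in environment variables for later calls. Example: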

export YOUR_GATEWAY_URL="https://*********3.cn-hangzhou.pai-eas.aliyuncs.com/api/predict/group_****y.ll****_gateway"
export YOUR_TOKEN="NzY4NWZ*************ZWU5Nw=="

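Usage flow

The complete batch inference flow has four steps:

Upload the input file: upload the requests to OSS in JSONL format.
Create a batch task: call the API to create the batch task.
Task execution: the system runs the inference task asynchronously, and you can query the task status at any time.
Get the results: after the task completes, call the API or download the output file with the inference results directly from OSS.

1. Upload the input file

Create a file named input.jsonl. Each line is a standalone JSON object that represents one inference request. Example: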

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "Qwen3-8B", "messages": [{"role": "user", "content": "Hello world!"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "Qwen3-8B", "messages": [{"role": "user", "content": "Tell me a joke."}]}}

Call the file upload API to upload the file to the OSS path configured when you deployed the LLM Intelligent Router.

curl -s "$YOUR_GATEWAY_URL/v1/files" \
  -H "Authorization: Bearer $YOUR_TOKEN" \
  -F purpose="batch" \
  -F file="@input.jsonl"

Sample response:

{
  "bytes": 564813,
  "created_at": 1765868482,
  "filename": "input.jsonl",
  "id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "object": "file",
  "purpose": "batch"
}

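With batch_oss_path set to oss://my-batch-bucket/batchllm/, the uploaded file is stored at oss://my-batch-bucket/batchllm/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b.

2. Create a batch task with the file ID

Replace <input_file_id> with the id value returned in step 1.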

curl -s "$YOUR_GATEWAY_URL/v1/batches" \
  -H "Authorization: Bearer $YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "<input_file_id>",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'

Sample response:

{
  "id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "model": "",
  "errors": null,
  "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "completion_window": "24h",
  "status": "pending",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1765868672,
  "in_progress_at": null,
  "expires_at": 1765955072,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "usage": null,
  "metadata": null
}

3. Query the task status

Replace {batch_id} in the path with the id value returned in step 2.

curl -s "$YOUR_GATEWAY_URL/v1/batches/{batch_id}" \
  -H "Authorization: Bearer $YOUR_TOKEN" \
  -H "Content-Type: application/json"

Sample response:

{
  "id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "model": "",
  "errors": null,
  "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "completion_window": "24h",
  "status": "completed",
  "output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
  "error_file_id": null,
  "created_at": 1765868672,
  "in_progress_at": 1765868674,
  "expires_at": 1765955072,
  "finalizing_at": 1765868740,
  "completed_at": 1765868741,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 2160,
    "completed": 2160,
    "failed": 0
  },
  "usage": null,
  "metadata": {}
}

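4. Get the task results

When the task status is completed, retrieve the results through the file content API and write them to output.jsonl.

Replace {output_file_id} in the path with the output_file_id value from the status response.

curl -s "$YOUR_GATEWAY_URL/v1/files/{output_file_id}/content" \
  -H "Authorization: Bearer $YOUR_TOKEN" > output.jsonl

Sample content of output.jsonl follows. Use custom_id to match each result to its input request.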

{"id":"batch_xxx","custom_id":"request-1","response":{"status_code":200,"request_id":"req_id_1","body":{"id":"chatcmpl-xxx","object":"chat.completion","model":"Qwen3-8B", "choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"\u003cthink\u003e\nOkay, the user said \"Hello world!\" xxxxxxx What would you like to do?","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":11,"total_tokens":231,"completion_tokens":220,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}}}
{"id":"batch_xxx","custom_id":"request-2","response":{"status_code":200,"request_id":"req_id_2","body":{"id":"chatcmpl-yyy","object":"chat.completion","model":"Qwen3-8B", "choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"\u003cthink\u003e\nOkay, xxxxxxx Let me know if you want another!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":13,"total_tokens":595,"completion_tokens":582,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}}}
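
To make the matching by custom_id concrete, here is a minimal shell sketch that prints one `custom_id status_code` pair per output line. It assumes each line contains a `custom_id` followed by a `status_code`, as in the sample above. `summarize_output` is a hypothetical helper name, and pattern matching with sed is only a stand-in for a real JSON parser.

```shell
# Hypothetical helper: print "<custom_id> <status_code>" for every line of
# an output file that contains both fields. Lines without them are skipped.
summarize_output() {
  sed -n 's/.*"custom_id": *"\([^"]*\)".*"status_code": *\([0-9][0-9]*\).*/\1 \2/p' "$1"
}

# Usage: list the requests whose HTTP status is not 200.
# summarize_output output.jsonl | awk '$2 != 200 { print $1 }'
```

Because the output order is not guaranteed to follow the input order, a lookup keyed on custom_id like this is safer than relying on line positions.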

Run a batch task with a script

Save the following script as run_batch.sh and replace <YOUR_GATEWAY_URL> and <YOUR_TOKEN> in it.

#!/bin/bash

# Replace with your service endpoint and token
GATEWAY_URL="<YOUR_GATEWAY_URL>"
TOKEN="<YOUR_TOKEN>"

# 1. Upload the input file
echo "Uploading input file..."
UPLOAD_RESPONSE=$(curl -s "${GATEWAY_URL}/v1/files" \
  -H "Authorization: Bearer ${TOKEN}" \
  -F purpose="batch" \
  -F file="@input.jsonl")

# Extract the value of the "id" field from the JSON response
INPUT_FILE_ID=$(echo "${UPLOAD_RESPONSE}" | grep -o '"id": *"[^"]*"' | cut -d'"' -f4)
if [ -z "$INPUT_FILE_ID" ]; then
  echo "Failed to upload file. Response: $UPLOAD_RESPONSE"
  exit 1
fi
echo "Input file uploaded. File ID: ${INPUT_FILE_ID}"

# 2. Create the batch task
echo "Creating batch job..."
CREATE_RESPONSE=$(curl -s -X POST "${GATEWAY_URL}/v1/batches" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d "{
    \"input_file_id\": \"${INPUT_FILE_ID}\",
    \"endpoint\": \"/v1/chat/completions\",
    \"completion_window\": \"24h\"
  }")

BATCH_ID=$(echo "${CREATE_RESPONSE}" | grep -o '"id": *"[^"]*"' | cut -d'"' -f4)
if [ -z "$BATCH_ID" ]; then
  echo "Failed to create batch job. Response: $CREATE_RESPONSE"
  exit 1
fi
echo "Batch job created. Batch ID: ${BATCH_ID}"

# 3. Poll the task status until it reaches a final state
echo "Polling batch status..."
while true; do
  STATUS_RESPONSE=$(curl -s "${GATEWAY_URL}/v1/batches/${BATCH_ID}" \
    -H "Authorization: Bearer ${TOKEN}")

  STATUS=$(echo "${STATUS_RESPONSE}" | grep -o '"status": *"[^"]*"' | cut -d'"' -f4)
  echo "Current status: ${STATUS}"

  if [[ "$STATUS" == "completed" ]]; then
    OUTPUT_FILE_ID=$(echo "${STATUS_RESPONSE}" | grep -o '"output_file_id": *"[^"]*"' | cut -d'"' -f4)
    echo "Batch job completed. Output file ID: ${OUTPUT_FILE_ID}"
    break
  elif [[ "$STATUS" == "failed" || "$STATUS" == "expired" || "$STATUS" == "cancelled" ]]; then
    echo "Batch job ended with status: ${STATUS}. Error details: ${STATUS_RESPONSE}"
    exit 1
  fi
  sleep 10
done

# 4. Download and print the result file
echo "Downloading result file..."
curl -s "${GATEWAY_URL}/v1/files/${OUTPUT_FILE_ID}/content" \
  -H "Authorization: Bearer ${TOKEN}" > output.jsonl

echo "Result downloaded to output.jsonl. Content:"
cat output.jsonl

Run the script:

bash run_batch.sh

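Batch task status

A batch task goes through several states. The full list of states and their meaning is shown below.

Status	Stage	Description	Operations
`pending`	Preparation	Pending: the task has been created and is waiting to be scheduled.	Can be cancelled
`validating`	Preparation	Validating: the input file format and parameters are being validated. If validation fails, the task enters the `failed` state.	Can be cancelled
`in_progress`	Processing	In progress: the task is running and sending requests to the backend inference service.	Can be cancelled
`finalizing`	Finalization	Finalizing: all request shards have been processed and the task is waiting for results to be aggregated.	None
`finalize`		Finalize: results are being aggregated and the final output file is being generated.	None
`completed`		Completed: the task finished successfully and the output file is available for download.	None
`failed`		Validation failed: the task failed validation in the `validating` stage and processing never started.	None
`cancelling`		Cancelling: a cancel request was received and the system is stopping in-flight requests.	None
`cancelled`		Cancelled: the task was cancelled successfully.	None
`expired`		Expired: the task did not complete within the time specified by `completion_window` and was terminated by the system.	None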
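
The table can be condensed into two small checks that a polling client might use. This is a sketch in shell: `is_terminal_status` and `can_cancel` are hypothetical helper names, and the classification simply mirrors the table above.

```shell
# Succeeds when the status is final, i.e. no further transitions are expected.
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled|expired) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the table marks the status as cancellable.
can_cancel() {
  case "$1" in
    pending|validating|in_progress) return 0 ;;
    *) return 1 ;;
  esac
}

for s in pending in_progress finalizing completed; do
  if is_terminal_status "$s"; then
    echo "$s: terminal"
  else
    echo "$s: keep polling"
  fi
done
```

A polling loop would typically stop as soon as `is_terminal_status` succeeds and only then decide whether an output file is expected.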

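API reference

The API is largely consistent with the OpenAI Batch API. The supported APIs are listed below.

Batch API

Batch object

Parameter	Type	Description
id	string	Unique identifier of the batch task.
object	string	Object type. Currently always `batch`.
endpoint	string	The API endpoint that the batch task calls (for example, /v1/chat/completions). Currently supported: /v1/responses, /v1/chat/completions, /v1/embeddings, /v1/completions.
model	string	The model used by the batch requests. Currently empty.
errors	object	Error details when the task fails (present only when the status is `failed`).
errors.data	array	Error entries produced during validation.
errors.data[].code	string	Error code.
errors.data[].line	int	Not supported yet; always 0.
errors.data[].message	string	Error message.
errors.data[].param	string	Not supported yet; always empty.
input_file_id	string	ID of the file that contains the input requests (JSONL format).
completion_window	string	Time window within which the task should complete (for example, 24h).
status	string	Current status of the batch task (for example, validating, in_progress, completed).
output_file_id	string	ID of the file that contains the processed responses (present only when the task completes successfully).
error_file_id	string	ID of the file that contains the failed requests.
created_at	integer	Time the task was created (Unix timestamp, in seconds).
in_progress_at	integer	Time the task started processing (Unix timestamp, in seconds).
expires_at	integer	Expiration time of the task. A task that has not completed by this time enters the expired state (Unix timestamp, in seconds).
finalizing_at	integer	Time the task entered the finalizing stage (Unix timestamp, in seconds).
completed_at	integer	Time the task completed successfully (Unix timestamp, in seconds).
failed_at	integer	Time the task failed (Unix timestamp, in seconds).
expired_at	integer	Time the task expired (Unix timestamp, in seconds).
cancelling_at	integer	Time the task entered the cancelling stage (Unix timestamp, in seconds).
cancelled_at	integer	Time the task was actually cancelled (Unix timestamp, in seconds).
request_counts	object	Request statistics for the task (total, completed, failed).
request_counts.total	integer	Total number of requests.
request_counts.completed	integer	Number of successful requests.
request_counts.failed	integer	Number of failed requests.
usage	object	Usage consumed by the task (for example, token counts). Not supported yet.
metadata	map	Optional user-provided key-value metadata.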

Create a batch task: POST /v1/batches

Sample request

curl -s "<YOUR_GATEWAY_URL>/v1/batches" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'

Sample response

{
  "id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "model": "",
  "errors": null,
  "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "completion_window": "24h",
  "status": "pending",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1765868672,
  "in_progress_at": null,
  "expires_at": 1765955072,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  },
  "usage": null,
  "metadata": null
}

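Request parameters

Parameter	Type	Required	Description
input_file_id	string	Yes	ID of an uploaded file. The file must be valid JSONL, contain at most 50,000 requests, and be no larger than 200 MB.
endpoint	string	Yes	The API endpoint that the batch requests call.
completion_window	string	No	Time window for the batch task to complete. Only 24h is currently supported.
metadata	map	No	Metadata of the batch task. At most 16 key-value pairs; keys can be up to 16 characters and values up to 512 characters.
output_expires_after	object	No	Expiration policy for the output and error files of the batch. Not supported yet; files are never expired or cleaned up automatically.

Response parameters

Returns the created Batch object.

Query a batch: GET /v1/batches/{batch_id}

Sample request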

curl -s "<YOUR_GATEWAY_URL>/v1/batches/batch_5f968571-b0b6-413f-a2a8-69bf750112af" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -H "Content-Type: application/json"

Sample response

{
  "id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "model": "",
  "errors": null,
  "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "completion_window": "24h",
  "status": "completed",
  "output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
  "error_file_id": null,
  "created_at": 1765868672,
  "in_progress_at": 1765868674,
  "expires_at": 1765955072,
  "finalizing_at": 1765868740,
  "completed_at": 1765868741,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": null,
  "cancelled_at": null,
  "request_counts": {
    "total": 2160,
    "completed": 2160,
    "failed": 0
  },
  "usage": null,
  "metadata": {}
}

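Request parameters

Parameter	Type	Required	Description
batch_id	string	Yes	ID of the batch task to query.

Response parameters

Returns the queried Batch object.

Cancel a batch task: POST /v1/batches/{batch_id}/cancel

Only tasks in the validating or in_progress state can be cancelled.

Sample request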

curl -s "<YOUR_GATEWAY_URL>/v1/batches/batch_98d4d6e3-c7ec-4aa9-969e-fb8531059523/cancel" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -X POST

Sample response

{
  "id": "batch_93559b00-67bf-4615-895e-16fd30196ecb",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "model": "",
  "errors": null,
  "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "completion_window": "24h",
  "status": "cancelling",
  "output_file_id": null,
  "error_file_id": null,
  "created_at": 1765870619,
  "in_progress_at": 1765870620,
  "expires_at": 1765957019,
  "finalizing_at": null,
  "completed_at": null,
  "failed_at": null,
  "expired_at": null,
  "cancelling_at": 1765870629,
  "cancelled_at": null,
  "request_counts": {
    "total": 2160,
    "completed": 0,
    "failed": 0
  },
  "usage": null,
  "metadata": {}
}

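Request parameters

Parameter	Type	Required	Description
batch_id	string	Yes	ID of the batch task to cancel.

Response parameters

Returns the cancelled Batch object.

List batches: GET /v1/batches

Sample request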

curl -s "<YOUR_GATEWAY_URL>/v1/batches" \
  -H "Authorization: Bearer <YOUR_TOKEN>"

Sample response

{
  "data": [
    {
      "id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
      "object": "batch",
      "endpoint": "/v1/chat/completions",
      "model": "",
      "errors": null,
      "input_file_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
      "completion_window": "24h",
      "status": "completed",
      "output_file_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
      "error_file_id": null,
      "created_at": 1765868672,
      "in_progress_at": 1765868674,
      "expires_at": 1765955072,
      "finalizing_at": 1765868740,
      "completed_at": 1765868741,
      "failed_at": null,
      "expired_at": null,
      "cancelling_at": null,
      "cancelled_at": null,
      "request_counts": {
        "total": 2160,
        "completed": 2160,
        "failed": 0
      },
      "usage": null,
      "metadata": {}
    }
  ],
  "first_id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "has_more": false,
  "last_id": "batch_5f968571-b0b6-413f-a2a8-69bf750112af",
  "object": "list"
}

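Request parameters

Parameter	Type	Required	Description
limit	integer	No	Maximum number of results to return.
after	string	No	Pagination cursor: the last batch_id from the previous request.

Response parameters

Parameter	Type	Description
object	string	Always "list".
data	array	List of Batch objects, in reverse order of creation time (newest first).
first_id	string	The first batch_id in this response.
last_id	string	The last batch_id in this response.
has_more	bool	Whether more batches exist after last_id.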
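
The `after`, `last_id`, and `has_more` fields together describe cursor-style pagination. A minimal shell sketch of walking all pages follows. It assumes that `limit` and `after` are passed as URL query parameters (as in the OpenAI Batch API, which this page says the service largely follows) and that `GATEWAY_URL` and `TOKEN` are set in the environment; `fetch_page` and `list_all_batch_ids` are hypothetical helper names.

```shell
# Hypothetical helper: fetch one page of batches. $1 is the "after" cursor
# (empty for the first page). Query-parameter passing is an assumption.
fetch_page() {
  url="${GATEWAY_URL}/v1/batches?limit=${PAGE_LIMIT:-20}"
  if [ -n "$1" ]; then
    url="${url}&after=$1"
  fi
  curl -s "$url" -H "Authorization: Bearer ${TOKEN}"
}

# Hypothetical helper: print the id of every batch, one per line, by
# following last_id until has_more is no longer true.
list_all_batch_ids() {
  after=""
  while :; do
    page=$(fetch_page "$after")
    # "id" keys only; first_id, last_id and input_file_id do not match.
    printf '%s\n' "$page" | grep -o '"id": *"[^"]*"' | cut -d'"' -f4
    last=$(printf '%s\n' "$page" | grep -o '"last_id": *"[^"]*"' | cut -d'"' -f4)
    case "$page" in
      *'"has_more": true'*|*'"has_more":true'*) ;;
      *) break ;;
    esac
    # Stop if the cursor is missing or did not advance, to avoid looping forever.
    if [ -z "$last" ] || [ "$last" = "$after" ]; then
      break
    fi
    after=$last
  done
}

# Usage (requires network access to the gateway):
# list_all_batch_ids
```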

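File API

File object

Parameter	Type	Description
id	string	Unique identifier of the file.
object	string	Object type. Always "file".
bytes	integer	Size of the file, in bytes.
created_at	integer	Time the file was created (Unix timestamp, in seconds).
expires_at	integer	Time the file expires (Unix timestamp, in seconds).
filename	string	File name specified by the user at upload.
purpose	string	Purpose of the file. "batch" for uploaded input files; generated output files are listed with "batch_output".

Input file format (JSONL)

The input file must be in .jsonl format, with one JSON object per line.

Parameter	Type	Required	Description
custom_id	string	Yes	Custom request ID. Must be unique for every request.
method	string	Yes	HTTP method used to call the inference service. Usually POST.
url	string	Yes	Endpoint of the inference service to call. It must currently match the endpoint specified when creating the batch task.
body	object	Yes	Request body for the inference service. It is forwarded to the inference service unchanged.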
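
A quick local check before uploading can catch the most common input problems. The sketch below, in shell, verifies the limits stated for input_file_id (at most 50,000 requests and 200 MB) and that every line carries a unique custom_id. `check_batch_input` is a hypothetical helper name; interpreting 200 MB as 200 × 1024 × 1024 bytes is an assumption, and the text matching is a rough stand-in for real JSON validation.

```shell
# Hypothetical helper: succeeds when the file passes basic pre-upload checks.
# $1=file, $2=max request lines (default 50000), $3=max bytes (default 200 MiB).
check_batch_input() {
  file=$1
  max_lines=${2:-50000}
  max_bytes=${3:-209715200}
  if [ ! -f "$file" ]; then
    echo "not a file: $file" >&2
    return 1
  fi

  lines=$(grep -c '' "$file" || true)       # counts a last line without newline too
  bytes=$(($(wc -c < "$file")))             # arithmetic strips padding from wc
  if [ "$lines" -gt "$max_lines" ]; then
    echo "too many requests: $lines" >&2
    return 1
  fi
  if [ "$bytes" -gt "$max_bytes" ]; then
    echo "file too large: $bytes bytes" >&2
    return 1
  fi

  missing=$(grep -vc '"custom_id"' "$file" || true)
  if [ "$missing" -gt 0 ]; then
    echo "$missing line(s) have no custom_id" >&2
    return 1
  fi

  dups=$(grep -o '"custom_id": *"[^"]*"' "$file" | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate custom_id values:" >&2
    echo "$dups" >&2
    return 1
  fi
  return 0
}

# Usage:
# check_batch_input input.jsonl && echo "input.jsonl looks uploadable"
```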

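Output file format (JSONL)

The output file is also in .jsonl format, with one JSON object per line. The line order is not guaranteed to match the input file.

Note: Use custom_id to match each line of the output file to the original request in the input file.

Parameter	Type	Description
id	string	The batch_id.
custom_id	string	The custom request ID.
response	object	The response to the request.
response.status_code	int	HTTP status code returned by the inference service.
response.request_id	string	ID of the inference request.
response.body	object	Body returned by the inference service.
error	object	Error information; present only when an error occurred. Possible causes: the inference service is unavailable, the request format is invalid, or an internal LLM Gateway error occurred.
error.code	string	Error code.
error.message	string	Error message.

Upload a file: POST /v1/files

Purpose: upload a JSONL file of batch requests for later use when creating a batch task.

Sample request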

curl "<YOUR_GATEWAY_URL>/v1/files" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -F purpose="batch" \
  -F file="@input.jsonl"

Sample response

{
  "id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "object": "file",
  "bytes": 564813,
  "created_at": 1765868482,
  "filename": "input.jsonl",
  "purpose": "batch"
}

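Request parameters

Parameter	Type	Required	Description
file	File	Yes	The file to upload (multipart/form-data).
purpose	string	Yes	Purpose of the file. Must be "batch" (multipart/form-data).

Response parameters

Returns the created file object.

List files: GET /v1/files

Purpose: list the metadata of all uploaded files.

Sample request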

curl "<YOUR_GATEWAY_URL>/v1/files" \
  -H "Authorization: Bearer <YOUR_TOKEN>"

Sample response

{
  "data": [
    {
      "id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
      "object": "file",
      "bytes": 1740241,
      "created_at": 1765868741,
      "expires_at": 0,
      "filename": "output.jsonl",
      "purpose": "batch_output"
    },
    {
      "id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
      "object": "file",
      "bytes": 564813,
      "created_at": 1765868482,
      "expires_at": 0,
      "filename": "input.jsonl",
      "purpose": "batch"
    }
  ],
  "first_id": "batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d",
  "has_more": false,
  "last_id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "object": "list"
}

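Request parameters

None

Response parameters

Parameter	Type	Description
object	string	Always "list".
data	array	List of file objects, in reverse order of creation time (newest first).
first_id	string	The first file_id in this response.
last_id	string	The last file_id in this response.
has_more	bool	Whether more files exist after last_id.

Retrieve a file: GET /v1/files/{file_id}

Purpose: retrieve the metadata of the specified file. To download the file content, such as the results of a completed task, use GET /v1/files/{file_id}/content.

Sample request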

curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b" \
  -H "Authorization: Bearer <YOUR_TOKEN>"

Sample response

{
  "id": "batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b",
  "object": "file",
  "bytes": 564813,
  "created_at": 1765868482,
  "expires_at": 0,
  "filename": "input.jsonl",
  "purpose": "batch"
}

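Request parameters

Parameter	Type	Required	Description
file_id	string	Yes	ID of the file to query.

Response parameters

Returns the queried file object.

Delete a file: DELETE /v1/files/{file_id}

Purpose: delete the metadata record of a file.

Warning: Deleting a file removes metadata only. Calling this API does not delete the physical file stored in your OSS bucket; it only deletes the file record in the EAS LLM Gateway. Unless you clean up manually or through lifecycle rules, the file stays in OSS permanently and continues to incur storage fees.

Sample request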

curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_input_11fb297e-653d-47cf-bb6a-a80209dc562b" \
  -H "Authorization: Bearer <YOUR_TOKEN>" \
  -X DELETE

Sample response

{
  "deleted": true,
  "id": "batch_output_a31a8f26-3abe-4522-9e3f-5c845fa56af7",
  "object": "file"
}

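Request parameters

Parameter	Type	Required	Description
file_id	string	Yes	ID of the file to delete.

Response parameters

Parameter	Type	Description
id	string	File ID.
object	string	Object type. Currently always "file".
deleted	bool	Whether the file record was deleted; true on success.

Get file content: GET /v1/files/{file_id}/content

Sample request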

curl -s "<YOUR_GATEWAY_URL>/v1/files/batch_output_20bafd19-6d73-4b61-a770-ebcb377d286d/content" \
  -H "Authorization: Bearer <YOUR_TOKEN>"

Sample response

{"id":"batch_5f968571-b0b6-413f-a2a8-69bf750112af","custom_id":"request-1","response":{"status_code":200,"request_id":"282f82b5-577a-44f3-9bf7-a17522ac7d1c","body":{"id":"chatcmpl-282f82b5-577a-44f3-9bf7-a17522ac7d1c","object":"chat.completion","created":1765868675,"model":"Qwen3-VL-2B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":22,"total_tokens":32,"completion_tokens":10,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}}}
...

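Request parameters

Parameter	Type	Required	Description
file_id	string	Yes	ID of the file whose content is retrieved.

Response parameters

Returns the file content.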

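Best practices

Performance tuning

Adjust the batch_options parameter at deployment time to tune task processing performance.

--batch-parallel: concurrency. A suitable value depends on the number of service instances (instance) and the concurrency each instance can handle. For a first attempt, start from instance count * 2 and adjust according to the CPU/GPU load of the backend inference service.
--batch-lines-per-shard: lines per shard. This parameter determines the shard count, which is the total number of input lines divided by the lines per shard, rounded up. More shards help balance load across concurrent workers, but they fragment intermediate files and increase the number of OSS API calls. Choose the value based on the total number of input lines so that the final shard count is an integer multiple of the concurrency, which keeps all workers busy. Recommended range: 500 to 2000.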
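
The shard arithmetic described above can be sketched in shell as follows. `shard_count` and `is_balanced` are hypothetical helper names, and the 8,000-line input with a concurrency of 8 is an illustrative assumption, not a figure from the service.

```shell
# Number of shards = ceil(total_lines / lines_per_shard), using integer math.
# $1=total input lines, $2=lines per shard.
shard_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Succeeds when the shard count is a whole multiple of the concurrency,
# so that every round of parallel workers is fully occupied.
# $1=total input lines, $2=lines per shard, $3=concurrency.
is_balanced() {
  n_shards=$(shard_count "$1" "$2")
  [ $(( $n_shards % $3 )) -eq 0 ]
}

total=8000      # illustrative input size
parallel=8      # the default --batch-parallel
for per_shard in 2000 1000 500; do
  shards=$(shard_count "$total" "$per_shard")
  if is_balanced "$total" "$per_shard" "$parallel"; then
    note="balanced"
  else
    note="uneven"
  fi
  echo "lines_per_shard=$per_shard -> $shards shards ($note for parallel=$parallel)"
done
```

Comparing a few candidate values this way makes it easier to pick a shard size inside the recommended range that also divides evenly across workers.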

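Cost management

Configure OSS lifecycle rules: this is the most important cost control measure. Configure lifecycle rules on the OSS bucket or prefix that stores batch task files so that old input, output, and error files are deleted regularly and do not incur unnecessary storage fees.
Use idle resources: run compute-intensive batch tasks during off-peak hours (such as at night) to make full use of idle GPU resources and maximize cost efficiency.

Task management

Split very large tasks: for tasks with millions of requests, split them into several smaller batch tasks. This isolates failures, simplifies management, and makes retries more efficient.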

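FAQ

Q1: What should I do if a task stays in the pending or validating state for a long time?

Check that the EAS service instances are healthy and that resources (CPU, memory, GPU) are sufficient.
Check that the input file was uploaded to the corresponding OSS path.
Check that the RAM role of the service is configured correctly and that the role has the oss:GetObject and oss:ListObjects permissions on batch_oss_path.

Q2: The task status changed to failed. How do I troubleshoot?

A task usually enters the failed state because of an error in the validating stage.

Call GET /v1/batches/{batch_id} and inspect the errors field in the response for the failure reason.
Common causes: the input file is not valid JSONL, a JSON object in the file lacks a required field (such as custom_id), or the url field does not match the endpoint specified when creating the task.

Q3: The task is completed but some requests failed. How do I handle this?

Download the result file identified by output_file_id.
Scan the result file and select the lines that contain an error field; these are the failed requests.
Using custom_id and the error information, collect the failed requests into a new input file and create a new batch task to retry them.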
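
The retry procedure can be sketched in shell as follows. It assumes that a failed line carries a top-level `"error": {...}` object next to its `custom_id`, as suggested by the output format table; the exact shape of error lines is not shown on this page, so treat the pattern as an assumption. `build_retry_file` is a hypothetical helper name.

```shell
# Hypothetical helper: copy the input lines whose requests failed into a new
# JSONL file. $1=output (result) file, $2=original input file, $3=retry file.
build_retry_file() {
  ids=$(mktemp)
  # Keep result lines that carry an error object and extract their custom_id.
  grep '"error": *{' "$1" \
    | sed -n 's/.*"custom_id": *"\([^"]*\)".*/\1/p' > "$ids"
  # Print every input line whose custom_id is in the failed set.
  awk -v idfile="$ids" '
    BEGIN { while ((getline id < idfile) > 0) failed[id] = 1 }
    match($0, /"custom_id": *"[^"]*"/) {
      key = substr($0, RSTART, RLENGTH)
      sub(/^"custom_id": *"/, "", key)
      sub(/"$/, "", key)
      if (key in failed) print
    }' "$2" > "$3"
  rm -f "$ids"
}

# Usage:
# build_retry_file output.jsonl input.jsonl retry.jsonl
```

The resulting retry.jsonl can then be uploaded and submitted like any other input file. Comparing whole custom_id values, rather than substrings, keeps request-2 from also selecting request-20.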

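Q4: How do I tell whether the problem is an OSS permission issue?

If the task fails in the validating stage with an error related to reading the file, the oss:GetObject permission may be missing.
If the task stays in the finalizing or finalize stage for a long time and eventually times out as expired, the oss:PutObject permission for writing result files to OSS may be missing.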
