LLM standard interface - Intelligent Media Services (IMS) - Alibaba Cloud Help Center

Connect your custom large language models (LLMs) to real-time workflows by using a standard protocol.

Standard interface for self-developed LLMs (OpenAI specification)

If your LLM service exposes an interface that follows the OpenAI specification, you can integrate it into a workflow by using the OpenAI-compatible configuration. Only streaming requests are supported.

In the LLM node, select Self-developed access (OpenAI specification) and configure the following parameters:

Name	Type	Required	Description	Example value
ModelId	String	Yes	The model name. Corresponds to the OpenAI 'model' field.	abc
API-KEY	String	Yes	The API key for authentication. Corresponds to the OpenAI 'api_key' field.	AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI
Target model HTTPS address	String	Yes	The base URL of the target service. Corresponds to the OpenAI 'base_url' field. Note: In line with the OpenAI specification, Alibaba Cloud automatically appends /chat/completions to base_url, so your service must accept POST requests at that path. For example, if base_url is http://www.abc.com/v1, requests are sent to http://www.abc.com/v1/chat/completions.	http://www.abc.com
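The three parameters map one-to-one onto the fields of an OpenAI-style client. The sketch below shows the request URL and headers that result from a given configuration. It assumes the key is sent as an `Authorization: Bearer <key>` header (the usual OpenAI convention, which the sample server later in this topic also parses) and that a trailing slash on base_url is dropped before the path is appended; neither detail is stated above. `LLMNodeConfig` and `resolve_request_target` are hypothetical names, not part of any Alibaba Cloud SDK.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LLMNodeConfig:
    """Hypothetical holder for the values entered in the LLM node."""
    model_id: str  # ModelId -> OpenAI "model"
    api_key: str   # API-KEY -> OpenAI "api_key"
    base_url: str  # Target model HTTPS address -> OpenAI "base_url"


def resolve_request_target(config: LLMNodeConfig) -> Tuple[str, Dict[str, str]]:
    """Return the URL and headers an OpenAI-style client would use.

    Assumptions: a trailing slash on base_url is ignored, and the key
    is sent as a Bearer token.
    """
    url = config.base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {config.api_key}",
        "Content-Type": "application/json",
    }
    return url, headers


if __name__ == "__main__":
    cfg = LLMNodeConfig("abc", "YOURAPIKEY", "https://www.abc.com/v1")
    url, _ = resolve_request_target(cfg)
    print(url)  # https://www.abc.com/v1/chat/completions
```

With the sample server later in this topic, which serves /v1/chat/completions on port 8083, base_url would therefore be the server's public address followed by /v1.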

At runtime, the workflow assembles data in the OpenAI format and sends a POST request to your configured model endpoint. The input parameters are as follows:

Name	Type	Required	Description	Example value
messages	Array	Yes	The conversation history. Up to 20 records are kept, and entries earlier in the array are older messages. Note: The system automatically combines the user's current input with the conversation history and sends the result to the LLM.	[{"role": "user", "content": "What is the weather like today?"}, {"role": "assistant", "content": "The weather is sunny today."}, {"role": "user", "content": "What about the weather tomorrow?"}]
model	String	Yes	The model name.	abc
stream	Boolean	Yes	Whether to stream the response. Only streaming is supported.	true
extendData	Object	Yes	Additional information about the call. Its fields are listed in the following rows.	{"instanceId": "68e00b6640e*****3e943332fee7", "channelId": "123", "sentenceId": "3", "userData": "{\"aaaa\":\"bbbb\"}"}
extendData.instanceId	String	Yes	The instance ID.	68e00b6640e*****3e943332fee7
extendData.channelId	String	Yes	The channel ID.	123
extendData.sentenceId	Int	Yes	The ID of the question-and-answer turn. Note: All responses that the agent returns for the same user question carry the same ID.	3
extendData.callerNumber	String	No	The calling number in a phone call.	13800000001
extendData.calleeNumber	String	No	The called number in a phone call.	13800000002
extendData.userData	String	No	The business data that was passed in the UserData field when the instance was started.	{"aaaa":"bbbb"}
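On the receiving side, these fields arrive inside the JSON request body. The following sketch, assuming the nesting shown in the extendData example above, pulls them out of an already-parsed body and also picks the newest user message from messages. `CallContext` and `parse_extend_data` are hypothetical names. Because userData is documented as a String, it is decoded with json.loads only when it holds valid JSON, and sentenceId is converted with int() since the example shows it as "3" while the table lists it as Int.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CallContext:
    """Hypothetical container for the fields carried in extendData."""
    instance_id: str
    channel_id: str
    sentence_id: int
    caller_number: Optional[str] = None
    callee_number: Optional[str] = None
    user_data: Any = None
    latest_question: Optional[str] = None


def parse_extend_data(body: dict) -> CallContext:
    """Extract extendData and the newest user message from a request body."""
    extend = body.get("extendData") or {}

    raw_user_data = extend.get("userData")
    try:
        # userData is a string; decode it if it contains JSON.
        user_data = json.loads(raw_user_data) if raw_user_data else None
    except (TypeError, json.JSONDecodeError):
        user_data = raw_user_data  # keep the raw value if it is not JSON text

    # Later entries are newer, so search the array from the end.
    latest = next(
        (m.get("content") for m in reversed(body.get("messages", []))
         if m.get("role") == "user"),
        None,
    )

    return CallContext(
        instance_id=extend["instanceId"],
        channel_id=extend["channelId"],
        sentence_id=int(extend["sentenceId"]),  # accepts "3" as well as 3
        caller_number=extend.get("callerNumber"),
        callee_number=extend.get("calleeNumber"),
        user_data=user_data,
        latest_question=latest,
    )
```

In the sample server below, a helper like this could be called on the parsed request body before the model is invoked.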

Sample server for a custom LLM (OpenAI specification)

Python

import json
import time
from loguru import logger
from flask import Flask, request, jsonify, Response

app = Flask(__name__)

API_KEY = "YOURAPIKEY"

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    # Check the API key, sent as "Authorization: Bearer <key>"
    auth_header = request.headers.get('Authorization', '')
    scheme, _, token = auth_header.partition(' ')
    if scheme.lower() != 'bearer' or token != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401

    data = request.json
    logger.info(f"data is {data}")
    # Optional query parameters; this sample only logs them
    task_id = request.args.get('task_id')
    room_id = request.args.get('room_id')
    # Log all request headers for debugging
    for header, value in request.headers.items():
        logger.info(f"{header}: {value}")

    # Print query parameters
    logger.info("\nQuery Parameters:")
    for key, value in request.args.items():
        logger.info(f"{key}: {value}")

    logger.info(f"task_id: {task_id}, room_id: {room_id}")
    # The workflow only uses streaming; the non-streaming branch is kept for local testing
    stream = data.get('stream', False)

    if stream:
        return Response(generate_stream_response(data), content_type='text/event-stream')
    else:
        return jsonify(generate_response(data))

def generate_response(data):
    response = "This is a mock AI assistant response. In a real application, call a real AI model here."
    # Mock usage figures: these are character counts, not real token counts.
    # A message's content can be null (for example, on tool-call messages), so treat it as "".
    prompt_chars = sum(len(m.get('content') or '') for m in data['messages'])

    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": data['model'],
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": response
            },
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_chars,
            "completion_tokens": len(response),
            "total_tokens": prompt_chars + len(response)
        }
    }

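def generate_stream_response(data):
    response = "This is a mock AI assistant streaming response. In a real application, call a real AI model here."
    words = list(response)  # one chunk per character to simulate streaming
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        delta = {"content": word}
        if is_last:
            # Example tool call asking the workflow to run the "hangup" function.
            # It is sent once, on the final chunk; remove it if you do not need it.
            delta["tool_calls"] = [
                {
                    "index": 0,
                    "id": "call_abc123",
                    "type": "function",
                    "function": {
                        "name": "hangup",
                        "arguments": "{}"  # JSON-encoded arguments (none here)
                    }
                }
            ]
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": data['model'],
            "choices": [{
                "index": 0,
                "delta": delta,
                "finish_reason": "stop" if is_last else None
            }]
        }
        logger.info(chunk)
        # Each server-sent event is a "data:" line followed by a blank line
        yield f"data: {json.dumps(chunk)}\n\n"
        time.sleep(0.1)  # Simulate processing time

    # Terminate the stream as OpenAI-compatible clients expect
    yield "data: [DONE]\n\n"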

if __name__ == '__main__':
    logger.info(f"Server is running with API_KEY: {API_KEY}")
    app.run(port=8083, debug=True)
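To check what a streaming response from this server looks like to a consumer, you can parse it the way an OpenAI-compatible client does: read each `data:` line, stop at `[DONE]`, and join the delta.content values. The sketch below works on an iterable of already-decoded text lines, so it can be fed from any HTTP library (for example, the `iter_lines(decode_unicode=True)` method of a streamed `requests` response). `collect_stream` is a hypothetical helper, and it deliberately ignores the rest of the server-sent events format (event names, comments, multi-line data fields).

```python
import json
from typing import Iterable, List, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, List[str], Optional[str]]:
    """Reassemble an OpenAI-style chat.completion.chunk stream (hypothetical helper).

    Returns the concatenated text, the names of any tool calls seen,
    and the last non-null finish_reason.
    """
    text_parts: List[str] = []
    tool_names: List[str] = []
    finish_reason: Optional[str] = None

    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            for call in delta.get("tool_calls") or []:
                name = call.get("function", {}).get("name")
                if name and name not in tool_names:
                    tool_names.append(name)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]

    return "".join(text_parts), tool_names, finish_reason


if __name__ == "__main__":
    sample = [
        'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}]}',
        "",
        'data: {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": "stop"}]}',
        "",
        "data: [DONE]",
    ]
    print(collect_stream(sample))  # ('Hi!', [], 'stop')
```

Working on plain lines rather than on a live connection keeps the parsing logic easy to test offline before you point it at your deployed endpoint.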