Connect your custom large language models (LLMs) to real-time workflows by using a standard protocol.
Standard interface for self-developed LLMs (OpenAI specification)
If your LLM interface follows the OpenAI standard, you can integrate your custom LLM service into a workflow by using the OpenAI configuration. Only streaming requests are supported.
- In the LLM node, select Self-developed access (OpenAI specification) and configure the following parameters:
| Name | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| ModelId | String | Yes | The model name. Corresponds to the OpenAI 'model' field. | abc |
| API-KEY | String | Yes | The API key for authentication. Corresponds to the OpenAI 'api_key' field. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| Target model HTTPS address | String | Yes | The target service URL. Corresponds to the OpenAI 'base_url' field. Note: In compliance with the OpenAI specification, Alibaba Cloud automatically appends /chat/completions to the base_url. Make sure your path matches this format. | http://www.abc.com |
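To make the note on base_url concrete, the following minimal sketch shows how the final request URL and headers might be derived from the configured values. The helper name `build_request_target`, the trailing-slash handling, and the `Bearer` authorization scheme are assumptions for illustration (the scheme follows the usual OpenAI client convention); the platform's actual joining logic is not described here.

```python
def build_request_target(base_url: str, api_key: str):
    """Return the (url, headers) pair an OpenAI-style client would use.

    Assumption: a trailing slash on base_url is dropped before the
    fixed suffix /chat/completions is appended.
    """
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed Bearer scheme
        "Content-Type": "application/json",
    }
    return url, headers


url, headers = build_request_target("http://www.abc.com/v1/", "YOURAPIKEY")
print(url)  # http://www.abc.com/v1/chat/completions
```

Under this convention, a service that exposes its endpoint at /v1/chat/completions would be configured with a base URL ending in /v1.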
- At runtime, the workflow assembles data in the OpenAI format and sends a POST request to your configured model endpoint. The input parameters are as follows:
| Name | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| messages | Array | Yes | The conversation history. Up to 20 records are kept. Earlier entries in the array represent older messages. Note: The system automatically combines the user's current input with the conversation history and sends it to the LLM. | [{'role': 'user', 'content': 'What is the weather like today?'},{'role': 'assistant', 'content': 'The weather is sunny today.'},{'role': 'user', 'content': 'What about the weather tomorrow?'}] |
| model | String | Yes | The model name. | abc |
| stream | Boolean | Yes | Whether to stream the response. Only streaming is supported. | True |
| extendData | Object | Yes | Additional information. | {'instanceId':'68e00b6640e*****3e943332fee7','channelId':'123','sentenceId':'3','userData':'{"aaaa":"bbbb"}'} |
| extendData.instanceId | String | Yes | The instance ID. | 68e00b6640e*****3e943332fee7 |
| extendData.channelId | String | Yes | The channel ID. | 123 |
| extendData.sentenceId | Int | Yes | The Q&A pair ID. Note: For the same user question, the agent's response uses the same ID. | 3 |
| | String | No | The calling number in a phone call. | 13800000001 |
| | String | No | The called number in a phone call. | 13800000002 |
| extendData.userData | String | No | The business data passed in the UserData field when starting the instance. | {"aaaa":"bbbb"} |
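As an illustration of the request described above, the sketch below assembles a body with the same four top-level fields and shows how a server might read the business data in extendData. The helper names `build_request_body` and `parse_user_data` are hypothetical. Applying the 20-record limit after the current input is appended, and treating `userData` as a JSON-encoded string, are assumptions based on the table and its example values.

```python
import json

MAX_HISTORY = 20  # limit stated for the messages array


def build_request_body(history, user_input, model, extend_data):
    """Assemble an OpenAI-style streaming request body (illustrative only)."""
    messages = history + [{"role": "user", "content": user_input}]
    return {
        "messages": messages[-MAX_HISTORY:],  # keep the most recent records
        "model": model,
        "stream": True,  # only streaming is supported
        "extendData": extend_data,
    }


def parse_user_data(extend_data):
    """Decode extendData.userData, assumed to be a JSON string; {} if absent."""
    raw = extend_data.get("userData")
    if not raw:
        return {}
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return {"raw": raw}  # not valid JSON: hand back the original value


body = build_request_body(
    history=[
        {"role": "user", "content": "What is the weather like today?"},
        {"role": "assistant", "content": "The weather is sunny today."},
    ],
    user_input="What about the weather tomorrow?",
    model="abc",
    extend_data={
        "instanceId": "68e00b6640e*****3e943332fee7",
        "channelId": "123",
        "sentenceId": "3",
        "userData": '{"aaaa":"bbbb"}',
    },
)
print(json.dumps(body, indent=2))
print(parse_user_data(body["extendData"]))
```

A server can use the same `parse_user_data` helper on the incoming request to recover the values that were passed when the instance was started.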
Custom LLM (OpenAI specification) server
Python
import json
import time

from flask import Flask, Response, jsonify, request
from loguru import logger

app = Flask(__name__)

API_KEY = "YOURAPIKEY"


@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
    # Check the API key. The header is expected as "Authorization: Bearer <key>".
    auth_header = request.headers.get('Authorization', '')
    parts = auth_header.split()
    if len(parts) < 2 or parts[1] != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401

    data = request.json
    logger.info(f"data is {data}")

    task_id = request.args.get('task_id')
    room_id = request.args.get('room_id')

    # Print request headers
    for header, value in request.headers.items():
        logger.info(f"{header}: {value}")

    # Print query parameters
    logger.info("\nQuery Parameters:")
    for key, value in request.args.items():
        logger.info(f"{key}: {value}")
    logger.info(f"task_id: {task_id}, room_id: {room_id}")

    stream = data.get('stream', False)
    if stream:
        return Response(generate_stream_response(data), content_type='text/event-stream')
    else:
        return jsonify(generate_response(data))


def generate_response(data):
    response = "This is a mock AI assistant response. In a real application, call a real AI model here."
    # Mock usage figures: character counts stand in for token counts.
    prompt_length = sum(len(m['content']) for m in data['messages'])
    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": data['model'],
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": response
            },
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_length,
            "completion_tokens": len(response),
            "total_tokens": prompt_length + len(response)
        }
    }


def generate_stream_response(data):
    response = "This is a mock AI assistant streaming response. In a real application, call a real AI model here."
    # Stream the mock reply one character at a time.
    chars = list(response)
    for i, char in enumerate(chars):
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": data['model'],
            "choices": [{
                "index": 0,
                "delta": {
                    "content": char,
                    # Sample tool call; "arguments" is a JSON-encoded string.
                    "tool_calls": [
                        {
                            "id": "call_abc123",
                            "type": "function",
                            "function": {
                                "name": "hangup",
                                "arguments": "{}"
                            }
                        }
                    ]
                },
                # Only the last chunk carries finish_reason "stop".
                "finish_reason": None if i < len(chars) - 1 else "stop"
            }]
        }
        logger.info(chunk)
        yield f"data: {json.dumps(chunk)}\n\n"
        time.sleep(0.1)  # Simulate processing time
    # End-of-stream marker
    yield "data: [DONE]\n\n"


if __name__ == '__main__':
    logger.info(f"Server is running with API_KEY: {API_KEY}")
    app.run(port=8083, debug=True)
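
To check the streaming output of a server like the one above, a client has to split the text/event-stream body into `data:` frames and stop at `[DONE]`. The following is a minimal sketch of that parsing using only the standard library. The function name `iter_stream_content` is hypothetical, and it assumes one JSON chunk per `data:` line, as the sample server emits.

```python
import json


def iter_stream_content(lines):
    """Yield the delta content of each chunk from an iterable of SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_content(sample)))  # Hi!
```

In practice the lines would come from the HTTP response body, read incrementally and decoded as UTF-8, rather than from an in-memory list.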