LLM standard interface

Connect your custom large language models (LLMs) to real-time workflows by using a standard protocol.

Standard interface for self-developed LLMs (OpenAI specification)

If your LLM interface follows the OpenAI standard, you can integrate your custom LLM service into a workflow by using the OpenAI configuration. Only streaming requests are supported.

  1. In the LLM node, select Self-developed access (OpenAI specification) and configure the following parameters:

| Name | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| ModelId | String | Yes | The model name. Corresponds to the OpenAI 'model' field. | abc |
| API-KEY | String | Yes | The API key for authentication. Corresponds to the OpenAI 'api_key' field. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| Target model HTTPS address | String | Yes | The target service URL. Corresponds to the OpenAI 'base_url' field. Note: In compliance with the OpenAI specification, Alibaba Cloud automatically appends /chat/completions to the base_url. Make sure your path matches this format. | http://www.abc.com |
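The note about /chat/completions can be illustrated with a minimal sketch. The helper name `build_chat_endpoint` is hypothetical, and trimming a trailing slash before joining is an assumption made here, not documented platform behavior.

```python
def build_chat_endpoint(base_url: str) -> str:
    """Return the URL that results from appending /chat/completions to base_url."""
    # Assumption: a trailing slash on base_url is trimmed before joining.
    return base_url.rstrip("/") + "/chat/completions"


# The example value from the parameter table.
print(build_chat_endpoint("http://www.abc.com"))
# A base_url with a version prefix, for a service that serves /v1/chat/completions.
print(build_chat_endpoint("http://localhost:8083/v1"))
```

If your service exposes the route under a prefix such as /v1, the prefix belongs in the configured address, because only /chat/completions is appended.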

  2. At runtime, the workflow assembles data in the OpenAI format and sends a POST request to your configured model endpoint. The input parameters are as follows:

| Name | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| messages | Array | Yes | The conversation history. Up to 20 records are kept. Earlier entries in the array represent older messages. Note: The system automatically combines the user's current input with the conversation history and sends it to the LLM. | [{'role': 'user', 'content': 'What is the weather like today?'},{'role': 'assistant', 'content': 'The weather is sunny today.'},{'role': 'user', 'content': 'What about the weather tomorrow?'}] |
| model | String | Yes | The model name. | abc |
| stream | Boolean | Yes | Whether to stream the response. Only streaming is supported. | True |
| extendData | Object | Yes | Additional information. | {'instanceId':'68e00b6640e*****3e943332fee7','channelId':'123','sentenceId':'3','userData':'{"aaaa":"bbbb"}'} |
| • instanceId | String | Yes | The instance ID. | 68e00b6640e*****3e943332fee7 |
| • channelId | String | Yes | The channel ID. | 123 |
| • sentenceId | Int | Yes | The Q&A pair ID. Note: For the same user question, the agent's response uses the same ID. | 3 |
| • callerNumber | String | No | The calling number in a phone call. | 13800000001 |
| • calleeNumber | String | No | The called number in a phone call. | 13800000002 |
| • userData | String | No | The business data passed in the UserData field when starting the instance. | {"aaaa":"bbbb"} |
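On the receiving side, a service will usually want to pull the extendData fields out of the request body. The following is a minimal sketch built from the example values listed above. The helper names `read_extend_data` and `latest_user_message` are hypothetical, and decoding userData as JSON is an assumption based on the example value.

```python
import json
from typing import Any, Dict, Optional

# Sample body assembled from the example values in the parameter list.
SAMPLE_BODY: Dict[str, Any] = {
    "messages": [
        {"role": "user", "content": "What is the weather like today?"},
        {"role": "assistant", "content": "The weather is sunny today."},
        {"role": "user", "content": "What about the weather tomorrow?"},
    ],
    "model": "abc",
    "stream": True,
    "extendData": {
        "instanceId": "68e00b6640e*****3e943332fee7",
        "channelId": "123",
        "sentenceId": "3",
        "userData": '{"aaaa":"bbbb"}',
    },
}


def read_extend_data(body: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the extendData fields, decoding userData when it holds JSON."""
    extend = body.get("extendData") or {}
    user_data = extend.get("userData")
    try:
        decoded = json.loads(user_data) if user_data else None
    except (TypeError, ValueError):
        decoded = user_data  # Keep the raw value if it is not valid JSON.
    return {
        "instance_id": extend.get("instanceId"),
        "channel_id": extend.get("channelId"),
        # sentenceId is typed Int, while the sample shows it quoted,
        # so this sketch accepts both forms.
        "sentence_id": int(extend["sentenceId"]) if "sentenceId" in extend else None,
        "caller_number": extend.get("callerNumber"),  # Optional field
        "callee_number": extend.get("calleeNumber"),  # Optional field
        "user_data": decoded,
    }


def latest_user_message(body: Dict[str, Any]) -> Optional[str]:
    """Return the newest user message; later array entries are newer."""
    for message in reversed(body.get("messages", [])):
        if message.get("role") == "user":
            return message.get("content")
    return None


info = read_extend_data(SAMPLE_BODY)
print(info["sentence_id"], info["user_data"])
print(latest_user_message(SAMPLE_BODY))
```

Because callerNumber, calleeNumber, and userData are optional, the sketch reads them with `get` so that a missing field yields `None` instead of an error.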

Custom LLM (OpenAI specification) server

Python

import json
import time
from loguru import logger
from flask import Flask, request, jsonify, Response

app = Flask(__name__)

API_KEY = "YOURAPIKEY"

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completion():
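    # Check the API key. The header is expected to look like "Bearer <key>";
    # a missing or malformed header is rejected instead of raising IndexError.
    auth_parts = (request.headers.get('Authorization') or '').split()
    if len(auth_parts) < 2 or auth_parts[1] != API_KEY:
        return jsonify({"error": "Unauthorized"}), 401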

    data = request.json
    logger.info(f"data is {data}")
    task_id = request.args.get('task_id')
    room_id = request.args.get('room_id')
    # Log the request headers. Note that this includes the Authorization value.
    for header, value in request.headers.items():
        logger.info(f"{header}: {value}")

    # Log the query parameters
    logger.info("\nQuery Parameters:")
    for key, value in request.args.items():
        logger.info(f"{key}: {value}")

    logger.info(f"task_id: {task_id}, room_id: {room_id}")
    stream = data.get('stream', False)

    if stream:
        return Response(generate_stream_response(data), content_type='text/event-stream')
    else:
        return jsonify(generate_response(data))

def generate_response(data):
    response = "This is a mock AI assistant response. In a real application, call a real AI model here."

    return {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": data['model'],
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": response
            },
            "finish_reason": "stop"
        }],
        "usage": {
            # Character counts stand in for token counts in this mock.
            "prompt_tokens": sum(len(m['content']) for m in data['messages']),
            "completion_tokens": len(response),
            "total_tokens": sum(len(m['content']) for m in data['messages']) + len(response)
        }
    }

def generate_stream_response(data):
    response = "This is a mock AI assistant streaming response. In a real application, call a real AI model here."
    # Stream the reply one character at a time.
    chars = list(response)
    for i, char in enumerate(chars):
        chunk = {
            "id": "chatcmpl-123",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": data['model'],
            "choices": [{
                "index": 0,
                "delta": {
                    "content": char,
                    # Mock tool call naming a "hangup" function with empty JSON arguments.
                    "tool_calls": [
                        {
                            "id": "call_abc123",
                            "type": "function",
                            "function": {
                                "name": "hangup",
                                "arguments": "{}"
                            }
                        }
                    ]
                },
                # Only the last chunk carries a finish_reason.
                "finish_reason": None if i < len(chars) - 1 else "stop"
            }]
        }
        logger.info(chunk)
        # Each chunk is sent as one server-sent event.
        yield f"data: {json.dumps(chunk)}\n\n"
        time.sleep(0.1)  # Simulate processing time

    yield "data: [DONE]\n\n"

if __name__ == '__main__':
    # Avoid writing the API key to the log.
    logger.info("Server is starting on port 8083")
    app.run(port=8083, debug=True)
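To check what a streaming response from a server like the one above contains, the `data:` lines can be decoded back into text. The following is a minimal sketch that works on already-received lines. The function name `iter_stream_content` and the sample lines are hypothetical and are not taken from real platform traffic.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the delta content of each chunk until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separators and any non-data lines.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hand-written lines in the same shape the sample server emits.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "!"}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
    "",
]
print("".join(iter_stream_content(sample_lines)))
```

The same function could be fed the lines of a live HTTP response body, as long as the lines are decoded to text first.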