Deploy an application flow


After you develop an application flow, you can deploy it as an Elastic Algorithm Service (EAS) service. EAS provides features such as auto scaling and comprehensive service monitoring. These features help your application adapt to changing business needs, improve system stability and performance, and meet the demands of a production environment.

Prerequisites

You have created and debugged an application flow. For more information, see Develop an application flow.

Deploy an application flow

Go to LangStudio and select a workspace. On the Application Flow tab, click the application flow that you debugged, and then click Deploy in the upper-right corner. The runtime must be started before you can deploy the application flow. The key parameters are described below.

Resource deployment

  • Resource type: Select a public resource group or a dedicated resource group that you created.

  • Instances: The number of service instances. In a production environment, configure multiple instances to reduce the risk of a single point of failure.

  • Deployment resources: If the application flow is used only to orchestrate business logic, select CPU resources that match the complexity of the flow. CPU resources are usually more cost-effective than GPU resources. After deployment, you are charged EAS resource fees. For more information about billing, see Billing of Elastic Algorithm Service (EAS).

• Virtual Private Cloud (VPC): The application flow is deployed as an EAS service. Select a VPC so that clients can access the EAS service after it is deployed. EAS services cannot access the public network by default. If the service needs public network access, configure a VPC that has public network access. For more information, see Access public or private resources from EAS.

  Note: If the application flow includes a vector database connection, such as Milvus, make sure that the configured VPC is the same as the VPC of the vector database instance, or that the two networks are connected.

Chat history

  • Enable chat history: Applies only to chat-based application flows. When enabled, the service stores the history of multi-turn conversations and passes it along with each turn. To continue a conversation across requests, pass the session ID in the request header. For more information, see Appendix: Chat history.

  • Chat history storage: Local storage or external storage, such as ApsaraDB RDS for MySQL. For production deployments, use external storage. For more information, see Appendix: Chat history.

    Important: Local storage supports only single-instance deployment. When you use local storage, do not deploy multiple instances or scale out from one instance to multiple instances. Otherwise, the chat history feature may not work correctly.

• Tracing Analysis: When enabled, you can view trace details after the service is deployed to evaluate the performance of the application flow.

• Roles and permissions: Select a role as needed if the application flow uses a Faiss vector database or the Alibaba Cloud IQS-Standard Search component. Knowledge base management requires a Faiss or Milvus vector database, and the IQS Web Search Chat Assistant template uses the IQS-Standard Search component.

For more information about parameter settings, see Custom deployment.


Call the service

Online debugging

After the service is deployed, you are redirected to the PAI-EAS console. On the Online Debugging tab, configure and send a request. The key in the request body must match the value of the "Conversation Input" parameter in the Start node of the application flow. This topic uses the default field, question.


API calls

  1. On the Overview tab, obtain the service endpoint and token.


  2. Send an API request.

    You can call the service in basic mode or complete mode. The two modes differ as follows.

    Request path
    • Basic mode: <Endpoint>/
    • Complete mode: <Endpoint>/run

    Description
    • Basic mode: Returns the output of the application flow directly.
    • Complete mode: Returns a detailed structure that includes the node status, error messages, and output messages of the application flow.

    Scenarios
    • Basic mode: You need only the final output of the application flow and do not care about the internal processing or status. Suitable for simple queries or operations that need results quickly.
    • Complete mode: You need to understand the execution process of the application flow in detail, including the status of each node and any error messages. Suitable for debugging, monitoring, or analyzing the execution of the application flow.

    Advantages
    • Basic mode: Simple to use. You do not need to parse complex structures.
    • Complete mode: Provides comprehensive information about the execution process and helps you troubleshoot and optimize the performance of the application flow.

    Basic mode

    cURL command

    Application flow services deployed in EAS support streaming and non-streaming calls using cURL commands. The following are request and response examples:

    Streaming request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -H "Accept: text/event-stream" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Non-streaming request:

    curl -X POST \
         -H "Authorization: Bearer <your_token>" \
         -H "Content-Type: application/json" \
         -d '{"question": "Where is the capital of France?"}' \
         "<your_endpoint>"

    Streaming response:

    event: Message
    data: {"answer": ""}

    event: Message
    data: {"answer": "The"}

    event: Message
    data: {"answer": " capital"}

    event: Message
    data: {"answer": " of"}

    event: Message
    data: {"answer": " France"}

    event: Message
    data: {"answer": " is"}

    event: Message
    data: {"answer": " Paris"}

    event: Message
    data: {"answer": "."}

    event: Message
    data: {"answer": ""}

    Non-streaming response:

    {"answer":"The capital of France is Paris."}

    The request parameters are described below.

    • -H "Authorization: Bearer <your_token>": The authentication header. Replace <your_token> with the token that you obtained in Step 1.

    • -H "Accept: text/event-stream": Requests a Server-Sent Events (SSE) stream as the response. Streaming is supported only when an LLM node is the output node of the application flow, that is, the node that directly feeds the End node.

    • -d '{"question": "Where is the capital of France?"}': The request body, a JSON object that contains the question as a key-value pair. The key must match the "Conversation Input" parameter in the Start node of the application flow. This topic uses the default field, question.

    • "<your_endpoint>": The destination URL of the request. Replace <your_endpoint> with the endpoint that you obtained in Step 1.

    Python script

    The following examples use the requests library to send streaming and non-streaming POST requests to the application flow service. If the library is not installed, run pip install requests.

    Streaming request:

    import requests

    # Replace the placeholders with the endpoint and token that you obtained in Step 1.
    # The endpoint already includes the http:// scheme.
    url = "<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}

    # Set the request headers, including your token.
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "text/event-stream",
        "Content-Type": "application/json"
    }

    if __name__ == '__main__':
        with requests.post(url, json=data, headers=headers, stream=True) as r:
            r.raise_for_status()
            # Print each non-empty SSE line, such as "event: Message" or "data: {...}".
            for line in r.iter_lines():
                if line:
                    print(line.decode("utf-8"))

    Non-streaming request:

    import requests

    # Replace the placeholders with the endpoint and token that you obtained in Step 1.
    url = "<your-endpoint-here>"
    token = "<your-token-here>"
    data = {"question": "Where is the capital of France?"}

    # Set the request headers, including your token.
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }

    response = requests.post(url, json=data, headers=headers)

    if response.status_code == 200:
        print("Request successful. Response:")
        print(response.text)
    else:
        print(f"Request failed. Status code: {response.status_code}")
        print(response.text)

    For sample responses, see the cURL examples.

    The request parameters are described below.

    • url: The destination URL of the request. Replace <your-endpoint-here> with the endpoint that you obtained in Step 1.

    • token: The service token, which is sent in the Authorization header. Replace <your-token-here> with the token that you obtained in Step 1.

    • data: The request body, a JSON object that contains the question as a key-value pair. The key must match the "Conversation Input" parameter in the Start node of the application flow. This topic uses the default field, question.

    • "Accept": "text/event-stream": Requests an SSE stream as the response. Streaming is supported only when an LLM node is the output node of the application flow, that is, the node that directly feeds the End node.
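
    The streaming script prints each raw SSE line. If you want the complete answer as a single string, the following minimal sketch joins the chunks. It assumes the event:/data: layout of the sample streaming response and the default output field answer. The function name collect_answer is hypothetical.

```python
import json
from typing import Iterable


def collect_answer(lines: Iterable[str], key: str = "answer") -> str:
    """Join the chunks of a basic-mode SSE stream into the full output.

    Assumes the layout of the sample response: "event: Message" lines, each
    followed by a "data: {...}" line whose JSON object holds one chunk under `key`.
    """
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separator lines
        payload = json.loads(line[len("data:"):])
        parts.append(payload.get(key, ""))
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        "event: Message",
        'data: {"answer": "The"}',
        "",
        "event: Message",
        'data: {"answer": " capital"}',
        "",
        "event: Message",
        'data: {"answer": " of France is Paris."}',
    ]
    print(collect_answer(sample))  # The capital of France is Paris.
```

    To combine it with the streaming script, you could pass (raw.decode("utf-8") for raw in r.iter_lines()) to collect_answer instead of printing each line.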

    Complete mode

    In complete mode, the service uses Server-Sent Events (SSE) to return the status of each node, error messages, and output messages while the application flow runs. You can also use response_config to customize the content of node_run_infos in the events. This section uses online debugging as an example: append /run to the endpoint and edit the request body as needed.


    The request body contains the following fields.

    • inputs (Mapping[str, Any], default: None): The input data dictionary for the flow. The keys must match the input field names defined in the flow. If the flow has no inputs, omit this field.

    • stream (bool, default: True): Controls the response format. Valid values:

      • True: The service responds with an SSE stream, and the Content-Type response header is text/event-stream. The stream is data-only, which means that each message consists of a data: line. It contains the RunStarted, NodeUpdated, RunOutput, and RunTerminated events, which are described in the following sections.

      • False: The service responds with a single JSON object, and the Content-Type response header is application/json. For an example, see the response in Online debugging.

    • response_config (Dict[str, Any], no default): Controls which node details are included in the streaming response when stream is True. It contains the following fields:

      ∟ include_node_description (bool, default: False): Specifies whether to include node descriptions in the SSE event stream.

      ∟ include_node_display_name (bool, default: False): Specifies whether to include node display names in the SSE event stream.

      ∟ include_node_output (bool, default: False): Specifies whether to include node outputs in the SSE event stream.

      ∟ exclude_nodes (List[str], default: []): A list of node names to exclude from the SSE event stream.
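
    The following sketch assembles a complete-mode request from the fields above. It appends /run to the endpoint and builds the JSON body with inputs, stream, and response_config. The helper name build_run_request is hypothetical. Sending response_config only when stream is True is a design choice based on the field description, not a documented requirement.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_run_request(
    endpoint: str,
    inputs: Dict[str, Any],
    stream: bool = True,
    include_node_output: bool = False,
    exclude_nodes: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Return the URL and JSON body for a complete-mode call (hypothetical helper)."""
    url = endpoint.rstrip("/") + "/run"  # complete mode appends /run to the endpoint
    body: Dict[str, Any] = {"inputs": inputs, "stream": stream}
    if stream:
        # response_config only affects the SSE stream, so it is sent only when stream is True.
        body["response_config"] = {
            "include_node_description": False,
            "include_node_display_name": False,
            "include_node_output": include_node_output,
            "exclude_nodes": list(exclude_nodes or []),
        }
    return url, body


if __name__ == "__main__":
    url, body = build_run_request(
        "<your-endpoint-here>",
        {"question": "What is the capital of France?"},
        include_node_output=True,
    )
    print(url)  # <your-endpoint-here>/run
    print(json.dumps(body, indent=2))
```

    Send body as JSON in a POST request to url with the same Authorization header that basic mode uses.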

    The returned data is divided into different events: RunStarted, NodeUpdated, RunOutput, and RunTerminated.
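
    Because the stream is data-only, each event arrives as a single data: line that holds a JSON object with an event field. The following minimal sketch assumes that layout, turns the lines into dictionaries, and stops after RunTerminated. The name iter_run_events is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_run_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse complete-mode SSE lines into event dicts.

    Assumes data-only SSE: each event is one "data: {...}" line, and events are
    separated by blank lines. Stops after RunTerminated, which ends the run.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and any other SSE fields are ignored
        event = json.loads(line[len("data:"):])
        yield event
        if event.get("event") == "RunTerminated":
            return


if __name__ == "__main__":
    sample = [
        'data: {"event": "RunStarted", "run_id": "r1"}',
        "",
        'data: {"event": "RunTerminated", "run_id": "r1"}',
    ]
    for event in iter_run_events(sample):
        print(event["event"], event["run_id"])
```

    With the requests library, you could pass (raw.decode("utf-8") for raw in r.iter_lines()) from a streaming POST to <Endpoint>/run and then dispatch on event["event"].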

    RunStarted event

    • Definition: The RunStarted event marks the start of a flow execution. It is typically the first event sent in the SSE stream for a run.

    • Payload example:

      data: {"event": "RunStarted", "run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "timestamp": "2025-06-12T08:15:07.223611Z", "flow_run_info": {"run_id": "fb745e15-3b3b-4a10-9e0d-0bea08d47411", "status": "Running", "error": null, "otel_trace_id": ""}}

    • Field descriptions:

      event (string): The event type. The value is fixed as RunStarted.

      run_id (string): The unique identifier of the current flow execution.

      timestamp (string): The timestamp of the event, in ISO 8601 format.

      flow_run_info (object): The initial status information of the flow run. It contains the following fields:

      ∟ run_id (string): The unique identifier of the flow execution. The value is the same as the outer run_id.

      ∟ status (string): The initial status of the flow execution. The value is fixed as Running.

      ∟ error (object or null): An error message object if the flow execution fails. Otherwise, null.

      ∟ otel_trace_id (string): The OpenTelemetry trace ID associated with this flow execution. The value can be empty or all zeros.

    NodeUpdated event

    • Definition: The NodeUpdated event indicates that the status or output of one or more nodes in the flow has changed. During a flow execution, this event is typically sent when a node starts running (Running) or completes execution (Completed or Failed). If response_config is set, this event can also include the node description, display name, and output. Note: If you specify exclude_nodes in the response_config of the request, NodeUpdated events for the specified nodes are not returned.

    • Payload example:

      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.208601Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Running", "error": null, "duration": 0.0, "description": null, "display_name": null, "output": null}]}
      data: {"event": "NodeUpdated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.209621Z", "node_run_infos": [{"node_name": "custom_python", "node": "custom_python", "status": "Completed", "error": null, "duration": 0.001246, "description": null, "display_name": null, "output": {"text": "echo:hello", "input_length": 2}}]}

    • Field descriptions:

      event (string): The event type. The value is fixed as NodeUpdated.

      run_id (string): The unique identifier of the current flow execution.

      timestamp (string): The timestamp of the event, in ISO 8601 format.

      node_run_infos (array[object]): One or more node run information objects. Each object represents a node whose status or output has changed and contains the following fields:

      ∟ node_name (string): The name of the node. This is a legacy field and has the same value as node.

      ∟ node (string): The name of the node.

      ∟ status (string): The current status of the node, such as Running, Completed, or Failed.

      ∟ error (object or null): An error message object if the node execution fails. Otherwise, null.

      ∟ duration (float): The time spent on node execution, in seconds. For the Running status, the value is typically 0.0.

      ∟ description (string or null): The description of the node. Returned only when response_config.include_node_description is set to true in the request. Otherwise, null.

      ∟ display_name (string or null): The display name of the node. Returned only when response_config.include_node_display_name is set to true in the request. Otherwise, null.

      ∟ output (object or null): The output data of the node. Returned only when the node status is Completed and response_config.include_node_output is set to true in the request. Otherwise, null.
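
    One way to follow node progress is to keep the latest status, duration, error, and output of each node from parsed NodeUpdated events, that is, dictionaries decoded from the data: lines. The following sketch does this. The function name track_node_status is hypothetical, and the demo reuses the values from the payload example.

```python
from typing import Any, Dict, Iterable


def track_node_status(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep the latest known state of each node from NodeUpdated events."""
    nodes: Dict[str, Dict[str, Any]] = {}
    for event in events:
        if event.get("event") != "NodeUpdated":
            continue  # other event types do not carry node_run_infos
        for info in event.get("node_run_infos", []):
            # Prefer "node"; fall back to the legacy "node_name" field.
            name = info.get("node") or info.get("node_name")
            nodes[name] = {
                "status": info.get("status"),
                "duration": info.get("duration"),
                "error": info.get("error"),
                "output": info.get("output"),  # null unless include_node_output is true
            }
    return nodes


if __name__ == "__main__":
    sample = [
        {"event": "NodeUpdated", "node_run_infos": [
            {"node": "custom_python", "status": "Running", "duration": 0.0,
             "error": None, "output": None}]},
        {"event": "NodeUpdated", "node_run_infos": [
            {"node": "custom_python", "status": "Completed", "duration": 0.001246,
             "error": None, "output": {"text": "echo:hello", "input_length": 2}}]},
    ]
    for name, state in track_node_status(sample).items():
        print(name, state["status"], state["output"])
```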

    RunOutput event

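    • Definition: The RunOutput event carries the output of the flow run and typically occurs before the RunTerminated event at the end of the run. A streaming output is delivered as a series of RunOutput events, each containing one chunk, and the status in output_metadata changes from Streaming to Finished when that output is complete. A non-streaming output is delivered in a single event.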

    • Payload example:

      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.745130Z", "outputs": {"answer": "What can"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.829133Z", "outputs": {"answer": " I help you with?"}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.950055Z", "outputs": {"answer": ""}, "output_metadata": {"answer": {"is_stream": true, "status": "Streaming"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_llm_3ce063ad-bc9b-417d-9e68-08ce92c3db1b", "timestamp": "2025-04-30T11:55:24.954983Z", "outputs": {}, "output_metadata": {"answer": {"is_stream": true, "status": "Finished"}}}
      
      data: {"event": "RunOutput", "run_id": "4c185c72-1bb0-4beb-a288-f7a73e37fc3b_python_oHG7_1c9fb0ac-0f45-4dbc-bf97-8e4175fd991c", "timestamp": "2025-04-30T11:55:24.957091Z", "outputs": {"python_output": "Hello: Hello! What can I help you with?"}, "output_metadata": {"python_output": {"is_stream": false, "status": "Finished"}}}

    • Field descriptions:

      event (string): The event type. The value is fixed as RunOutput.

      run_id (string): The unique identifier of the current flow execution.

      timestamp (string): The timestamp of the event, in ISO 8601 format.

      outputs (object): The output of the flow, keyed by output name. The structure depends on the outputs defined in the flow design. For a streaming output, each event carries one chunk, and the event that marks the output as Finished can contain an empty object.

      output_metadata (object): The metadata of each output. The keys are the output names, which correspond to the keys in outputs. Each value is an object that contains metadata such as is_stream (whether the output is streamed) and status (for example, Streaming or Finished).
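
    For a streaming output, the client has to reassemble the answer from its chunks. The following minimal sketch does this for parsed RunOutput events. It assumes that streamed output values are strings that are concatenated in arrival order and that a non-streamed output arrives whole. The name collect_outputs is hypothetical, and the demo reuses the values from the payload example.

```python
from typing import Any, Dict, Iterable


def collect_outputs(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble the flow outputs from RunOutput events.

    Assumes streamed outputs (is_stream true) are string chunks joined in order,
    and non-streamed outputs arrive whole in a single event.
    """
    outputs: Dict[str, Any] = {}
    for event in events:
        if event.get("event") != "RunOutput":
            continue
        metadata = event.get("output_metadata", {})
        for name, value in event.get("outputs", {}).items():
            if metadata.get(name, {}).get("is_stream"):
                outputs[name] = outputs.get(name, "") + value
            else:
                outputs[name] = value
    return outputs


if __name__ == "__main__":
    stream_meta = {"answer": {"is_stream": True, "status": "Streaming"}}
    sample = [
        {"event": "RunOutput", "outputs": {"answer": "What can"}, "output_metadata": stream_meta},
        {"event": "RunOutput", "outputs": {"answer": " I help you with?"}, "output_metadata": stream_meta},
        {"event": "RunOutput", "outputs": {},
         "output_metadata": {"answer": {"is_stream": True, "status": "Finished"}}},
        {"event": "RunOutput", "outputs": {"python_output": "Hello: Hello! What can I help you with?"},
         "output_metadata": {"python_output": {"is_stream": False, "status": "Finished"}}},
    ]
    print(collect_outputs(sample))
```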

    RunTerminated event

    • Definition: The RunTerminated event marks the end of a flow execution. It is typically the last event sent in the SSE stream for a run.

    • Payload example:

      data: {"event": "RunTerminated", "run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "timestamp": "2025-04-25T08:57:15.212791Z", "flow_run_info": {"run_id": "8f92b1a6-4d69-422a-a080-50713e488b56", "status": "Completed", "error": null, "otel_trace_id": "0x00000000000000000000000000000000"}}

    • Field descriptions:

      event (string): The event type. The value is fixed as RunTerminated.

      run_id (string): The unique identifier of the current flow execution.

      timestamp (string): The timestamp of the event, in ISO 8601 format.

      flow_run_info (object): The final status information of the entire flow run. It contains the following fields:

      ∟ run_id (string): The unique identifier of the flow execution. The value is the same as the outer run_id.

      ∟ status (string): The final status of the flow execution, such as Completed, Failed, or Canceled.

      ∟ error (object or null): An error message object if the flow execution fails. Otherwise, null.

      ∟ otel_trace_id (string): The OpenTelemetry trace ID associated with this flow execution. The value can be empty or all zeros.
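
    A client usually checks the RunTerminated event to decide whether a run succeeded. The following sketch returns flow_run_info when the status is Completed and raises an exception otherwise, including when the stream ends without a RunTerminated event. FlowRunError and final_run_info are hypothetical names. Treating every status other than Completed as an error is an assumption.

```python
from typing import Any, Dict, Iterable, Optional


class FlowRunError(RuntimeError):
    """Raised when a run does not end with status Completed (hypothetical name)."""


def final_run_info(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Return flow_run_info from RunTerminated, raising if the run did not complete."""
    info: Optional[Dict[str, Any]] = None
    for event in events:
        if event.get("event") == "RunTerminated":
            info = event.get("flow_run_info") or {}
    if info is None:
        # The stream ended (for example, the connection dropped) before RunTerminated.
        raise FlowRunError("no RunTerminated event received")
    if info.get("status") != "Completed":
        # Failed or Canceled: surface the error object for troubleshooting.
        raise FlowRunError(
            f"run {info.get('run_id')} ended with status {info.get('status')}: {info.get('error')}"
        )
    return info


if __name__ == "__main__":
    events = [
        {"event": "RunStarted", "flow_run_info": {"run_id": "r1", "status": "Running"}},
        {"event": "RunTerminated",
         "flow_run_info": {"run_id": "r1", "status": "Completed", "error": None}},
    ]
    print(final_run_info(events)["status"])  # Completed
```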

OpenAI compatible calls

A deployed chat application flow (ChatFlow) supports OpenAI-compatible calls, so it can also be used by other clients that support the OpenAI API.

Using the OpenAI API

This section provides an example of a streaming call using a cURL command. The following are request and response examples:

Sample request:

curl --location '<Endpoint>/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "default",  
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true
}'

The request parameters are described below.

• --location '<Endpoint>/v1/chat/completions': The destination URL of the request. Replace <Endpoint> with the endpoint that you obtained in Step 1 of API calls.

• --header "Authorization: Bearer $DASHSCOPE_API_KEY": The authentication header. Despite the variable name, the value is the token of the deployed service. Set the DASHSCOPE_API_KEY environment variable to the token that you obtained in Step 1 of API calls, or replace $DASHSCOPE_API_KEY with the token directly.

• "model": "default": The model name. The value is fixed as default.

• "stream":true: Specifies whether to return a streaming response. Streaming is supported only when an LLM node is the output node of the application flow, that is, the node that directly feeds the End node.

Sample response:

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":" created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}

data: [DONE]
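
The following minimal sketch rebuilds the full reply from a streaming response like the sample above. It joins choices[0].delta.content until the data: [DONE] sentinel. It assumes the OpenAI chat.completion.chunk layout shown in the sample and a single choice with index 0. The name collect_chat_stream is hypothetical.

```python
import json
from typing import Iterable


def collect_chat_stream(lines: Iterable[str]) -> str:
    """Join delta.content from OpenAI-style chat.completion.chunk SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # assume a single choice; ignore any others
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


if __name__ == "__main__":
    sample = [
        'data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0}]}',
        "",
        'data: {"choices":[{"delta":{"content":"I am Qwen."},"index":0}]}',
        'data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0}]}',
        "data: [DONE]",
    ]
    print(collect_chat_stream(sample))  # I am Qwen.
```

An OpenAI SDK client whose base URL points to <Endpoint>/v1 and whose API key is the service token may also work, as in the Chatbox configuration described below, although this topic does not document SDK usage.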

Integration with other client applications

This section uses Chatbox v1.13.4 on Windows as an example. For other client applications, such as Cherry Studio and AnythingLLM, see Integrate RAG services with local applications.

  1. Download and install Chatbox.

  2. Open Chatbox and set the model provider name, for example, LangStudio.


  3. Select the configured model provider and configure the service request parameters.


    The key parameters are described below.

    • API Mode: The value is fixed as OpenAI API Compatible.

    • API Key: The token of the deployed LangStudio service. For more information about how to obtain the token, see Step 1 of API calls.

    • API Host: The endpoint of the deployed LangStudio service with the /v1 suffix appended. For more information about how to obtain the endpoint, see Step 1 of API calls. This topic uses an Internet endpoint as an example, so the API host is http://langstudio-20250319153409-xdcp.115770327099****.cn-hangzhou.pai-eas.aliyuncs.com/v1.

    • API Path: The value is fixed as /chat/completions.

    • Model: Click New and enter a custom model ID, such as qwen3-8b.

  4. Call the deployed LangStudio service in the chat dialog box.


View traces

After you call the service, a trace is automatically generated. On the Tracing Analysis tab, find the trace that you want to view and click View Trace in the Actions column to evaluate the performance of the application flow.


The trace data lets you view the input and output of each node in the application flow, such as the retrieval results from the vector database or the input and output of the LLM node.

Appendix: Chat history

For chat-based application flows, LangStudio provides a feature to store the history of multi-turn conversations. You can use local storage or external storage to save the chat history.

Storage types

  • Local storage: The service automatically creates a SQLite database file named chat_history.db on the local disk of the EAS instance where the application flow is deployed and saves the chat history in it. The default storage path is /langstudio/flow/. Local storage does not support multi-instance deployment. Check the local disk usage regularly, and use the chat history API operations to query and delete chat history data as needed. If the EAS instance is removed, its chat history is also deleted.

  • External storage: ApsaraDB RDS for MySQL is supported. When you deploy the service, you must configure a connection to an ApsaraDB RDS for MySQL instance to store the chat history. For more information about the configuration, see Database connection configuration. The service automatically creates tables with the service name as a suffix in the configured ApsaraDB RDS for MySQL database. For example, the langstudio_chat_session_<service_name> table stores chat sessions, and the langstudio_chat_history_<service_name> table stores chat history messages.

Session and user support

Each chat request to the application flow service is stateless. To have multiple requests treated as the same conversation, send the same session ID in the Chat-Session-Id request header of each request. For more information about how to make API calls, see API calls.

The following request headers are supported.

• Chat-Session-Id (String): The session ID. The system automatically assigns a unique identifier to each session to distinguish it from other sessions and returns it in the Chat-Session-Id response header. You can also use a custom session ID. To ensure uniqueness, a custom session ID must be 32 to 255 characters in length and can contain uppercase letters, lowercase letters, digits, underscores (_), hyphens (-), and colons (:).

• Chat-User-Id (String): The user ID, which identifies the user to whom the chat belongs. The system does not assign a user ID automatically. You can use a custom user ID.
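
The following sketch generates a custom session ID that satisfies the constraints described above and builds request headers for a multi-turn conversation. The function names and the session: prefix are hypothetical. To continue a conversation whose ID the service assigned, read the ID from the Chat-Session-Id response header and pass it back unchanged.

```python
import re
import uuid
from typing import Dict, Optional

# Custom session IDs: 32 to 255 letters, digits, underscores, hyphens, or colons.
SESSION_ID_PATTERN = re.compile(r"[A-Za-z0-9_:-]{32,255}")


def is_valid_custom_session_id(session_id: str) -> bool:
    """Check a custom session ID against the documented length and character rules."""
    return SESSION_ID_PATTERN.fullmatch(session_id) is not None


def new_session_id(prefix: str = "session:") -> str:
    """Generate a custom session ID; uuid4().hex alone is 32 hex characters."""
    session_id = prefix + uuid.uuid4().hex
    if not is_valid_custom_session_id(session_id):
        raise ValueError(f"invalid session ID: {session_id!r}")
    return session_id


def chat_headers(
    token: str, session_id: Optional[str] = None, user_id: Optional[str] = None
) -> Dict[str, str]:
    """Build request headers. session_id may be custom or taken from a response header."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    if session_id is not None:
        headers["Chat-Session-Id"] = session_id
    if user_id is not None:
        headers["Chat-User-Id"] = user_id
    return headers


if __name__ == "__main__":
    sid = new_session_id()
    print(chat_headers("<your-token-here>", session_id=sid, user_id="user-001"))
```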

Chat history API

The application flow service also provides API operations for viewing and deleting chat history data. To obtain the complete API schema, send a GET request to {Endpoint}/openapi.json. The schema follows the Swagger (OpenAPI) standard, so you can load it into Swagger UI to browse and explore the operations visually.
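
The following sketch uses only the Python standard library. It fetches the schema and lists the operations that it defines. Sending the service token in the Authorization header is an assumption based on how the other requests in this topic are authenticated. The function names are hypothetical, and the paths in the demo are made up for illustration. They are not the actual chat history API.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_schema(endpoint: str, token: str, timeout: float = 30.0) -> Dict[str, Any]:
    """GET {Endpoint}/openapi.json (sending the token is an assumption)."""
    request = urllib.request.Request(
        endpoint.rstrip("/") + "/openapi.json",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def list_operations(schema: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return sorted (METHOD, path, summary) tuples from an OpenAPI/Swagger schema."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method, operation in item.items():
            # Path items can also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("summary", "")))
    return sorted(operations)


if __name__ == "__main__":
    # Hypothetical schema fragment for illustration only.
    demo = {"paths": {"/example/items": {"get": {"summary": "List items"}, "parameters": []}}}
    for method, path, summary in list_operations(demo):
        print(method, path, summary)
```

To inspect the deployed service, you could call list_operations(fetch_schema("<your-endpoint-here>", "<your-token-here>")).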