After you develop an application flow, you can deploy it as an Elastic Algorithm Service (EAS) service. EAS provides features such as automatic scaling and service monitoring, which help your application adapt to changes in business load, improve stability and performance, and meet the demands of a production environment.
Prerequisites
You have created and debugged an application flow. For more information, see Develop an application flow.
Deploy an application flow
Go to LangStudio and select a workspace. On the Application Flow tab, click your debugged application flow, and then click Deploy in the upper-right corner. You can deploy the application flow only if the runtime is started. The following table describes the key parameters.

Parameter | Description |
Resource deployment | |
Resource type | Select a public resource group or a dedicated resource group that you created. |
Instances | Configure the number of service instances. In a production environment, configure multiple service instances to reduce the risk of a single point of failure. |
Deployment resources | If you use the application flow only for business flow scheduling, select appropriate CPU resources based on the complexity of the business flow. Compared with GPU resources, CPU resources are usually more cost-effective. After deployment, you are charged a resource fee for EAS. For more information about billing, see Billing of Elastic Algorithm Service (EAS). |
Virtual Private Cloud (VPC) | An application flow is deployed as an EAS service. To ensure that clients can access the online EAS service after deployment, select a VPC to connect the client to the service. EAS services cannot access the public network by default. If your EAS service needs to access the public network, configure a VPC with public network access capabilities. For more information, see Access public or private resources from EAS. Note: If the application flow includes a vector database connection, such as Milvus, ensure that the configured VPC is the same as the VPC where the vector database instance resides, or ensure that the two networks are connected. |
Chat history | |
Enable chat history | This parameter applies only to chat-based application flows. When enabled, the service can store and transmit the history of multi-turn conversations. To group requests into one conversation, use this feature together with the Chat-Session-Id request header described in Appendix: Chat history. |
Chat history storage | For production use, select external storage, such as ApsaraDB RDS. For more information, see Appendix: Chat history. Important: Local storage does not support multi-instance deployment or scaling out from a single instance to multiple instances. In either case, the chat history feature may not work correctly. |
Tracing Analysis | When enabled, you can view trace details after the service is deployed to evaluate the performance of the application flow. |
Roles and permissions | If the application flow uses a Faiss vector database (you must select a Faiss or Milvus vector database for knowledge base management) or the "Alibaba Cloud IQS-Standard Search" component (used by the IQS Web Search Chat Assistant template), select a role as needed. |
For more information about parameter settings, see Custom deployment.
Call the service
Online debugging
After the service is successfully deployed, you are redirected to the PAI-EAS console. On the Online Debugging tab, you can configure and send a request. The key in the request body must be the same as the value of the "Conversation Input" parameter in the Start node of the application flow. This topic uses the default field question.

API calls
On the Overview tab, obtain the service endpoint and token.

Send an API request.
You can call the service in basic mode or complete mode. The following table describes the differences.
Property | Basic mode | Complete mode |
Request path | <Endpoint>/ | <Endpoint>/run |
Description | Directly returns the output of the application flow. | Returns a complex structure that includes the node status, error messages, and output messages of the application flow. |
Scenarios | You need only the final output of the application flow and do not care about the internal processing or status. Suitable for simple queries or operations to quickly obtain results. | You need to understand the execution process of the application flow in detail, including the status of each node and possible error messages. Suitable for debugging, monitoring, or analyzing the execution of the application flow. |
Advantages | Simple to use. You do not need to parse complex structures. | Provides comprehensive information to help you understand the execution process of the application flow in depth. Helps troubleshoot and optimize the performance of the application flow. |
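As a rough illustration of the two request paths in the table, the following Python sketch builds (but does not send) a request for either mode. The helper names `service_url` and `build_request` and the placeholder endpoint and token are hypothetical. The `Bearer` authorization scheme is borrowed from the cURL sample under OpenAI compatible calls and is an assumption for these paths; confirm the exact invocation details on the Overview tab of your service.

```python
import json
import urllib.request


def service_url(endpoint: str, mode: str = "basic") -> str:
    """Return the request URL for a call mode: <Endpoint>/ or <Endpoint>/run."""
    base = endpoint.rstrip("/")
    if mode == "basic":
        return base + "/"
    if mode == "complete":
        return base + "/run"
    raise ValueError(f"unknown mode: {mode!r}")


def build_request(endpoint: str, token: str, payload: dict,
                  mode: str = "basic") -> urllib.request.Request:
    """Build a POST request without sending it."""
    return urllib.request.Request(
        service_url(endpoint, mode),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: same Bearer scheme as the OpenAI compatible sample.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint and token; replace them with the values of your service.
ENDPOINT = "http://example.invalid/my-service"
TOKEN = "my-token"

# Basic mode: the key matches the "Conversation Input" field (default: question).
basic_req = build_request(ENDPOINT, TOKEN, {"question": "Who are you?"})

# Complete mode: flow inputs go under "inputs"; stream=False asks for one JSON object.
complete_req = build_request(
    ENDPOINT, TOKEN,
    {"inputs": {"question": "Who are you?"}, "stream": False},
    mode="complete",
)
```

To send either request, pass it to `urllib.request.urlopen` and read the response body.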
Basic mode
Complete mode
LangStudio supports Server-Sent Events (SSE). When you send a request, the service can output the status of each node, error messages, and output messages during the execution of the application flow. You can also customize the content of node_run_infos in events. This section uses online debugging as an example. You must append /run to the endpoint and then edit the request body:
The following table describes the fields in the request body.
Field | Type | Default value | Description |
inputs | Mapping[str, Any] | None | The input data dictionary for the flow. The keys must match the input field names defined in the flow. If the flow has no inputs, ignore this field. |
stream | bool | True | Controls the response format. Valid values: True: Responds with an SSE stream. The Content-Type in the response header is text/event-stream. The data is returned in DataOnly format and is divided into different events: RunStarted, NodeUpdated, RunOutput, and RunTerminated. For more information, see the following sections. False: Responds with a single JSON object. The Content-Type in the response header is application/json. For more information, see the response in Online debugging. |
response_config | Dict[str, Any] | - | Controls the node details included in the streaming response when stream is set to True. |
∟ include_node_description | bool | False | (In response_config) Specifies whether to include node descriptions in the SSE event stream. |
∟ include_node_display_name | bool | False | (In response_config) Specifies whether to include node display names in the SSE event stream. |
∟ include_node_output | bool | False | (In response_config) Specifies whether to include node outputs in the SSE event stream. |
∟ exclude_nodes | List[str] | [] | (In response_config) A list of node names to exclude from the SSE event stream. |
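The fields above can be assembled into a request body as in the following Python sketch. The helper name `build_run_body` and the node name `retriever` are hypothetical; the field names and default values follow the table, and `response_config` is included only for streaming requests because the table describes it as applying when stream is set to True.

```python
import json
from typing import Any, Dict, List, Mapping, Optional


def build_run_body(
    inputs: Optional[Mapping[str, Any]] = None,
    stream: bool = True,
    include_node_description: bool = False,
    include_node_display_name: bool = False,
    include_node_output: bool = False,
    exclude_nodes: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a complete-mode request body from the documented fields."""
    body: Dict[str, Any] = {"stream": stream}
    if inputs:
        # Keys must match the input field names defined in the flow.
        body["inputs"] = dict(inputs)
    if stream:
        # response_config only affects the SSE event stream.
        body["response_config"] = {
            "include_node_description": include_node_description,
            "include_node_display_name": include_node_display_name,
            "include_node_output": include_node_output,
            "exclude_nodes": list(exclude_nodes or []),
        }
    return body


# "retriever" is a made-up node name used only for illustration.
example_body = build_run_body(
    {"question": "Who are you?"},
    include_node_output=True,
    exclude_nodes=["retriever"],
)
print(json.dumps(example_body, indent=2))
```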
The returned data is divided into different events: RunStarted, NodeUpdated, RunOutput, and RunTerminated.
OpenAI compatible calls
A deployed chat application flow, or ChatFlow, supports OpenAI compatible calls. The service can also be used by other clients that support OpenAI.
Using the OpenAI API
This section provides an example of a streaming call using a cURL command. The following are request and response examples:
Sample request:
curl --location '<Endpoint>/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "default",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
],
"stream":true
}'
The following table describes the request parameters.
Parameter | Description |
--location '<Endpoint>/v1/chat/completions' | The destination URL of the request. Replace <Endpoint> with the endpoint of your deployed service. |
--header "Authorization: Bearer $DASHSCOPE_API_KEY" | The request header. Replace $DASHSCOPE_API_KEY with the token of your deployed service. |
"model": "default" | The model name. The value is fixed as default. |
"stream":true | Specifies whether the response is a stream. Note: Streaming calls are supported only when an LLM node is the output node of the application flow, that is, when the direct input to the End node is an LLM node. |
Sample response:
data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"finish_reason":null,"delta":{"content":"I am"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"a large"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"language model"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":"created by Alibaba Cloud"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":". I am called Qwen."},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: {"choices":[{"delta":{"content":""},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1715931028,"system_fingerprint":null,"model":"qwen-plus","id":"chatcmpl-3bb05cf5cd819fbca5f0b8d67a025022"}
data: [DONE]
Integration with other client applications
This section uses the ChatBox v1.13.4 application on the Windows platform as an example. For other client applications, such as Cherry Studio and AnythingLLM, see Integrate RAG services with local applications.
Download and install Chatbox.
Open ChatBox and configure the Model Provider Name, such as LangStudio.

Select the configured model provider and configure the service request parameters.

The following table describes the key parameters.
Parameter | Description |
API Mode | The value is fixed as OpenAI API Compatible. |
API Key | The token of the deployed LangStudio service. For more information about how to obtain the token, see Step 1 of API calls. |
API Host | The endpoint of the deployed LangStudio service. For more information about how to obtain the endpoint, see Step 1 of API calls. Add the /v1 suffix to the end of the endpoint. This topic uses an Internet endpoint as an example. The API host is set to http://langstudio-20250319153409-xdcp.115770327099****.cn-hangzhou.pai-eas.aliyuncs.com/v1. |
API Path | The value is fixed as /chat/completions. |
Model | Click New and enter a custom model ID, such as qwen3-8b. |
Call the deployed LangStudio service in the chat dialog box.
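A script can act as the client in the same way, using the same settings: the API host ending in /v1, the API path /chat/completions, and the service token as the API key. The following Python sketch mirrors the cURL sample from Using the OpenAI API and reads `data:` lines shaped like the sample response in that section. The helper names and the placeholder host and token are hypothetical, and only the `choices[].delta.content` field visible in that sample is used.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def build_chat_request(api_host: str, token: str, user_text: str,
                       stream: bool = True) -> urllib.request.Request:
    """Build (without sending) an OpenAI compatible chat completion request."""
    payload = {
        "model": "default",  # fixed value, as in the cURL sample
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    return urllib.request.Request(
        api_host.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_delta_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by streamed `data:` lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Placeholder API host (endpoint plus /v1) and token.
chat_req = build_chat_request("http://example.invalid/my-service/v1",
                              "my-token", "Who are you?")

# To send the request and print the answer as it streams:
# with urllib.request.urlopen(chat_req) as resp:
#     for text in iter_delta_content(b.decode("utf-8") for b in resp):
#         print(text, end="", flush=True)
```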

View traces
After you call the service, a trace is automatically generated. On the Tracing Analysis tab, find the trace that you want to view and click View Trace in the Actions column to evaluate the performance of the application flow.

The trace data lets you view the input and output of each node in the application flow, such as the retrieval results from the vector database or the input and output of the LLM node.
Appendix: Chat history
For chat-based application flows, LangStudio provides a feature to store the history of multi-turn conversations. You can use local storage or external storage to save the chat history.
Storage types
Local storage: The service uses the local disk to automatically create a SQLite database named chat_history.db on the EAS instance where the application flow is deployed. This database saves the chat history. The default storage path is /langstudio/flow/. Note: Local storage does not support multi-instance deployment. You should regularly check the local disk usage. You can also use the provided API operations to query and delete chat history data. If the EAS instance is removed, the related chat history is also deleted.
External storage: ApsaraDB RDS for MySQL is supported. When you deploy the service, you must configure a connection to an ApsaraDB RDS for MySQL instance to store the chat history. For more information about the configuration, see Database connection configuration. The service automatically creates tables with the service name as a suffix in the configured ApsaraDB RDS for MySQL database. For example, the langstudio_chat_session_<service_name> table stores chat sessions, and the langstudio_chat_history_<service_name> table stores chat history messages.
Session and user support
Each chat request to the application flow service is stateless. If you want multiple requests to be treated as the same conversation, you must manually configure the request header. For more information about how to make API calls, see API calls.
Request header | Data type | Description | Notes |
Chat-Session-Id | String | The session ID. For each service request, the system automatically assigns a unique identifier to the session to distinguish it from other sessions. The ID is returned in the | You can use a custom session ID. To ensure uniqueness, the session ID must be 32 to 255 characters in length and can contain uppercase letters, lowercase letters, digits, underscores (_), hyphens (-), and colons (:). |
Chat-User-Id | String | The user ID. It identifies the user to whom the chat belongs. The system does not automatically assign a user ID. You can use a custom user ID. | - |
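A client that supplies its own session ID might generate and validate it as in the following Python sketch. The helper names are hypothetical; the length and character rules come from the Notes column above, and the header names are the ones listed in the table.

```python
import re
import uuid
from typing import Dict, Optional

# 32 to 255 characters: letters, digits, underscores, hyphens, and colons.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_:-]{32,255}")


def is_valid_session_id(session_id: str) -> bool:
    """Check a custom session ID against the documented constraints."""
    return _SESSION_ID_RE.fullmatch(session_id) is not None


def new_session_id(prefix: str = "") -> str:
    """Create a session ID from an optional prefix and a random 32-char hex string."""
    session_id = f"{prefix}{uuid.uuid4().hex}"
    if not is_valid_session_id(session_id):
        raise ValueError(f"invalid session ID: {session_id!r}")
    return session_id


def chat_headers(session_id: str, user_id: Optional[str] = None) -> Dict[str, str]:
    """Build the request headers that tie requests to one conversation."""
    if not is_valid_session_id(session_id):
        raise ValueError(f"invalid session ID: {session_id!r}")
    headers = {"Chat-Session-Id": session_id}
    if user_id is not None:
        headers["Chat-User-Id"] = user_id
    return headers


# Reuse the same headers for every request in one conversation.
session_id = new_session_id("demo:")
headers = chat_headers(session_id, user_id="user-001")
```

Merge these headers with the authorization and content-type headers of each request so that consecutive requests share one chat history.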
Chat history API
The application flow service also provides API operations for managing chat history data. You can use these API operations to view and delete the data. To obtain the complete API schema, send a GET request to <Endpoint>/openapi.json. The schema is based on the Swagger standard. To explore these API operations interactively, you can load the schema into Swagger UI.
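As a lightweight alternative to a visual tool, the schema can be summarized with a short script. The following Python sketch is one possible approach: `fetch_schema` and `list_operations` are hypothetical helper names, sending the token as a Bearer header is an assumption, and the paths in the sample dictionary are invented for illustration only (the real paths come from your service's openapi.json).

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def fetch_schema(endpoint: str, token: str) -> Dict[str, Any]:
    """Download the schema from <Endpoint>/openapi.json."""
    req = urllib.request.Request(
        endpoint.rstrip("/") + "/openapi.json",
        # Assumption: the service token is accepted as a Bearer token here.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_operations(schema: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (METHOD, path, summary) for every operation in the schema."""
    operations = []
    for path, item in sorted(schema.get("paths", {}).items()):
        for method, operation in item.items():
            if method.lower() in HTTP_METHODS:
                operations.append(
                    (method.upper(), path, operation.get("summary", ""))
                )
    return operations


# Invented schema fragment, used only to show the output shape.
sample_schema = {
    "paths": {
        "/example/sessions": {"get": {"summary": "List sessions"}},
        "/example/sessions/{id}": {"delete": {"summary": "Delete a session"}},
    }
}
for method, path, summary in list_operations(sample_schema):
    print(f"{method:7s}{path}  {summary}")
```

With a deployed service, call `list_operations(fetch_schema(endpoint, token))` instead of using the sample dictionary.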

