Application flow development - Platform For AI (PAI) - Alibaba Cloud Help Center

Quick start

Creation methods

Create from template: Use application templates for various scenarios to quickly build AI applications.
Create by type:
- Standard: Suitable for general-purpose application development. Use Large Language Models (LLMs), custom Python code, and other tools to build your application flow.
- Conversational: Suitable for conversational application development. This type builds on the Standard type and adds features for managing conversation history, inputs, and outputs, as well as a dialog-style testing interface.
Import from OSS: Select the ZIP package or the OSS path of the application flow to import. The package or path must contain the application flow's flow.dag.yaml file and other code files at its top level, not inside a subfolder.
- You can export an application flow by using the Export feature in the Actions column of the application flow list in LangStudio, and then share it with others to import.
- After you convert a Dify DSL file to the LangStudio application flow format, you can import it by using this method. For more information, see the Dify-to-LangStudio migration practice guide.
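A common import failure is a ZIP package whose files sit inside an enclosing folder, which happens when you zip the folder itself instead of its contents. The following is a minimal sketch, using Python's standard zipfile module, that packages a local flow directory so that flow.dag.yaml lands at the archive root and checks an existing package. The helper names package_flow and has_root_dag are hypothetical and are not part of LangStudio.

```python
import os
import zipfile

DAG_FILE = "flow.dag.yaml"


def package_flow(flow_dir: str, zip_path: str) -> None:
    """Zip the contents of flow_dir so that flow.dag.yaml is at the archive root.

    Hypothetical helper. Write zip_path outside flow_dir so the archive
    does not try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(flow_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to flow_dir (not its parent), so the
                # archive has no enclosing top-level folder.
                arcname = os.path.relpath(full_path, flow_dir)
                zf.write(full_path, arcname)


def has_root_dag(zip_path: str) -> bool:
    """Return True if flow.dag.yaml is a top-level entry of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return DAG_FILE in zf.namelist()
```

Running has_root_dag on a package before uploading it to OSS catches the enclosing-folder mistake locally.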

Configure environment variables

In LangStudio, you can add environment variables that an application flow requires at runtime. The system loads these variables automatically before execution, so Python nodes, tool calls, and custom logic can use them.

Use cases

Sensitive information management: Store API keys, authentication tokens, and other secrets to avoid hard-coding them in your code.
Configuration parameterization: Flexibly set runtime parameters such as model endpoints and timeouts.

Configuration and usage

In the application flow editor, click Settings in the upper-right corner to add environment variables.
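In a Python node, you can read configured environment variables through Python's standard os.environ mapping:
```
import os

# Example: Get an API key configured in Settings.
# os.environ["NAME"] raises KeyError if the variable is not set;
# use os.environ.get("NAME", default) for optional values.
api_key = os.environ["OPENAI_API_KEY"]
```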
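For configuration parameterization, remember that environment variable values are always strings, so numeric settings such as timeouts must be converted. The following minimal sketch reads optional settings with defaults and validates the conversion. The variable names MODEL_ENDPOINT and REQUEST_TIMEOUT and their default values are hypothetical examples, not names that LangStudio defines.

```python
import os


def load_runtime_config() -> dict:
    """Read runtime parameters from environment variables set in Settings.

    MODEL_ENDPOINT and REQUEST_TIMEOUT are hypothetical variable names.
    """
    # Optional string value with a fallback default.
    endpoint = os.environ.get("MODEL_ENDPOINT", "http://localhost:8000")

    # Values arrive as strings; convert and validate numeric settings.
    raw_timeout = os.environ.get("REQUEST_TIMEOUT", "30")
    try:
        timeout = float(raw_timeout)
    except ValueError:
        raise ValueError(
            f"REQUEST_TIMEOUT must be a number, got {raw_timeout!r}"
        ) from None

    return {"endpoint": endpoint, "timeout": timeout}
```

Failing fast with a clear message makes a misconfigured variable easier to spot than an error deep inside a model call.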

Configure speech interaction

In the application flow editor, click Settings in the upper-right corner and configure speech interaction on the Global Settings tab.

Speech-to-Text (STT)

The Speech-to-Text (STT) feature converts a user's voice input into text and uses the result to populate the Chat Input field of the Start node.

Parameter	Description
Model settings	Select a configured model service connection and an ASR model. Currently, models in the Paraformer series are supported.
Recognition language	Set the language for speech recognition. Currently, only the paraformer-v2 model supports specifying the recognition language.

Text-to-Speech (TTS)

The Text-to-Speech (TTS) feature automatically converts the conversational output of the application flow into speech.

Parameter	Description
Model settings	Select a configured model service connection and a TTS model. Currently, models in the CosyVoice series are supported.
Voice settings	Select the voice for the synthesized speech. Multiple preset voices are supported.
Autoplay	If enabled, the synthesized speech plays automatically during a conversation.

Deployment and API calls

After you deploy the application flow to PAI-EAS, you can use speech interaction through API calls. For general information about calling the service, see Deploy an application flow. This section describes only the request and response changes that speech interaction introduces.

Speech input

In the request body, add the system.audio_input field and set its source_uri to the URL of the audio file. For the file data structure, see File Type Input and Output. The system automatically converts the audio to text and uses the result as the chat input.

```
{
  "question": "",
  "system": {
    "audio_input": {
      "source_uri": "oss://your-bucket.oss-cn-hangzhou.aliyuncs.com/audio/input.wav"
    }
  }
}
```
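The request above can be assembled with Python's standard library. This is a minimal sketch under stated assumptions: the service token is passed in the Authorization header, the request goes to <Endpoint>/run, and the chat input field is named question as in the example. Confirm the actual endpoint, token, and input field names for your deployed service against Deploy an application flow. The function name build_speech_request is hypothetical.

```python
import json
import urllib.request


def build_speech_request(endpoint: str, token: str, audio_uri: str) -> urllib.request.Request:
    """Build a POST request that sends an audio file instead of typed text.

    Assumptions: URL is endpoint + "/run", the token goes in the
    Authorization header, and the chat input field is "question".
    """
    payload = {
        "question": "",  # left empty; STT fills the chat input from the audio
        "system": {"audio_input": {"source_uri": audio_uri}},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to urllib.request.urlopen, which returns the HTTP response object.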

Speech output

To obtain TTS-synthesized audio data, call the <Endpoint>/run endpoint. Simple mode does not return audio data.

Field	Description
audio_data	A Base64-encoded audio data fragment. The client must decode and concatenate the fragments for playback.
tts_metadata	Audio metadata, including format (pcm), sample rate (22050 Hz), number of channels (1), and bit depth (16-bit).
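Because the audio is raw PCM with no file header, a client needs tts_metadata to play or save it. The following minimal sketch decodes the Base64 fragments in arrival order, concatenates the raw bytes, and wraps them in a WAV container with Python's standard wave module so that common players can open the result. The function name pcm_fragments_to_wav is hypothetical.

```python
import base64
import io
import wave


def pcm_fragments_to_wav(fragments: list, tts_metadata: dict) -> bytes:
    """Decode Base64 PCM fragments, join them in order, and return WAV file bytes."""
    # Decode each fragment separately; joining Base64 strings first would
    # break on the padding characters inside the combined string.
    pcm = b"".join(base64.b64decode(fragment) for fragment in fragments)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(tts_metadata["channels"])        # e.g. 1
        wav.setsampwidth(tts_metadata["bit_depth"] // 8)  # 16-bit -> 2 bytes
        wav.setframerate(tts_metadata["sample_rate"])     # e.g. 22050
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Writing the result to a file with a .wav extension is enough for most audio players to open it.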

Streaming response

TTS audio is returned in TTSOutput events in the SSE (Server-Sent Events) stream. Each event carries one audio fragment:

```
{
  "event": "TTSOutput",
  "audio_data": "<base64-encoded audio data>",
  "tts_metadata": {
    "format": "pcm",
    "sample_rate": 22050,
    "channels": 1,
    "bit_depth": 16
  }
}
```
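In streaming mode, TTSOutput events can be interleaved with other events in the same stream. The following minimal sketch collects the audio_data fragments in arrival order from SSE text lines. It assumes standard SSE framing, where each event's JSON payload follows a "data:" prefix; the framing of your service's stream may differ, so treat the parsing as illustrative. The function name collect_tts_fragments is hypothetical.

```python
import json
from typing import Iterable, List


def collect_tts_fragments(sse_lines: Iterable[str]) -> List[str]:
    """Return audio_data fragments from TTSOutput events, in arrival order.

    Assumes SSE framing where each event's JSON follows a "data:" prefix.
    """
    fragments = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore data lines that are not JSON
        if isinstance(event, dict) and event.get("event") == "TTSOutput":
            fragments.append(event["audio_data"])
    return fragments
```

Each returned fragment can then be decoded with base64.b64decode and the bytes concatenated in order, as described for the audio_data field.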

Non-streaming response

The TTS audio is included in the JSON response as the output.tts_audio field:

```
{
  "output": {
    "answer": "xxx",
    "tts_audio": {
      "audio_data": "<base64-encoded full audio data>",
      "tts_metadata": {
        "format": "pcm",
        "sample_rate": 22050,
        "channels": 1,
        "bit_depth": 16
      }
    }
  }
}
```
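In a non-streaming response, the full audio is a single Base64 string. The following minimal sketch extracts it from output.tts_audio and computes the playback duration from tts_metadata: with 16-bit mono PCM at 22050 Hz, one second of audio is 22050 × 1 × 2 = 44100 bytes. The function name extract_tts_audio is hypothetical.

```python
import base64


def extract_tts_audio(response: dict) -> tuple:
    """Return (decoded PCM bytes, duration in seconds) from a non-streaming response."""
    tts_audio = response["output"]["tts_audio"]
    meta = tts_audio["tts_metadata"]
    pcm = base64.b64decode(tts_audio["audio_data"])

    # Bytes per second = sample_rate * channels * (bit_depth / 8).
    bytes_per_second = meta["sample_rate"] * meta["channels"] * meta["bit_depth"] // 8
    return pcm, len(pcm) / bytes_per_second
```

A duration check like this is a cheap way to confirm that the decoded audio matches the metadata before handing it to a player.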

Pre-built components

For more information, see the Application flow node reference.

Next steps

After you develop and debug the application flow, you can evaluate the application flow. Once it meets your business requirements, you can deploy the application flow to PAI-EAS for production use.