WebSocket API

Access the Paraformer real-time speech recognition service over a WebSocket connection. This topic describes the service endpoints, request headers, and interaction flow.

User guide: For model overviews and selection guidance, see Speech-to-text. For sample code, see Real-time speech recognition.

The DashScope SDK currently supports only Java and Python. For other languages, connect to the service directly over WebSocket.

Service endpoints

Paraformer is available only in the China (Beijing) region. The WebSocket URL is fixed:

wss://dashscope.aliyuncs.com/api-ws/v1/inference

Important

The URL must use the wss:// protocol. Provide your API key in the Authorization request header (see Request headers).

Request headers

Include the following headers in the WebSocket handshake request:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your API key. |
| user-agent | string | No | Client identifier. Helps the server identify the source of incoming requests. |
| X-DashScope-WorkSpace | string | No | Alibaba Cloud Model Studio workspace ID. |
| X-DashScope-DataInspection | string | No | Whether to enable data inspection. Omit this header by default; set it to enable only when required. |

Important

The Authorization header is verified during the WebSocket handshake. If the API key is invalid or missing, the handshake fails with an HTTP 401 or 403 error.
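The header rules above can be sketched as a small Python helper. This is a minimal illustration, not part of the DashScope SDK: the function name build_handshake_headers, its parameter names, and the DASHSCOPE_WS_URL constant are hypothetical, and the helper only assembles the headers without opening a connection.

```python
from typing import Dict, Optional

# Endpoint taken from the "Service endpoints" section.
DASHSCOPE_WS_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/inference"


def build_handshake_headers(
    api_key: str,
    workspace_id: Optional[str] = None,
    user_agent: Optional[str] = None,
    data_inspection: bool = False,
) -> Dict[str, str]:
    """Return the HTTP headers to send with the WebSocket handshake.

    Hypothetical helper: only the header names and the Bearer format
    come from the documentation; everything else is illustrative.
    """
    if not api_key:
        # A missing key would make the handshake fail with HTTP 401 or 403,
        # so fail early on the client side instead.
        raise ValueError("api_key is required")

    # Authorization is the only required header.
    headers = {"Authorization": f"Bearer {api_key}"}

    # Optional headers are added only when the caller supplies a value.
    if user_agent:
        headers["user-agent"] = user_agent
    if workspace_id:
        headers["X-DashScope-WorkSpace"] = workspace_id
    if data_inspection:
        # Omitted by default; sent with the value "enable" only on request.
        headers["X-DashScope-DataInspection"] = "enable"
    return headers
```

Pass the returned dictionary to whichever option your WebSocket client library provides for custom handshake headers. The option name differs between libraries and versions, so check the documentation of the client you use.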

Interaction flow

For details about client-side and server-side events, see Client events and Server-sent events.

The client and server interact in the following sequence:

  1. Establish the connection: The client opens a WebSocket connection to the server.

  2. Start the task: The client sends a run-task instruction. The server replies with a task-started event to confirm that the task has started. The client must wait for this event before sending audio.

  3. Send the audio stream: The client streams mono binary audio. The server returns result-generated events containing the recognition results.

  4. Notify the server to end the task: The client sends a finish-task instruction and continues to receive result-generated events from the server.

  5. End the task: The client receives a task-finished event from the server, indicating that the task has ended.

  6. Close the connection: The client closes the WebSocket connection.
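One way to enforce this ordering on the client side is a small state machine that tracks which step the task is in. The sketch below is an illustration only: the TaskState and TaskFlow names are hypothetical, it models just the instructions and events named in the steps above (no error events), and it does not perform any network I/O.

```python
from enum import Enum


class TaskState(Enum):
    CONNECTED = "connected"    # step 1: WebSocket connection is open
    STARTING = "starting"      # step 2: run-task sent, waiting for task-started
    STREAMING = "streaming"    # step 3: audio may be sent
    FINISHING = "finishing"    # step 4: finish-task sent, results may still arrive
    FINISHED = "finished"      # step 5: task-finished received


# (current state, instruction or event name) -> next state
_TRANSITIONS = {
    (TaskState.CONNECTED, "run-task"): TaskState.STARTING,
    (TaskState.STARTING, "task-started"): TaskState.STREAMING,
    (TaskState.STREAMING, "result-generated"): TaskState.STREAMING,
    (TaskState.STREAMING, "finish-task"): TaskState.FINISHING,
    (TaskState.FINISHING, "result-generated"): TaskState.FINISHING,
    (TaskState.FINISHING, "task-finished"): TaskState.FINISHED,
}


class TaskFlow:
    """Tracks one recognition task on an already open connection."""

    def __init__(self) -> None:
        self.state = TaskState.CONNECTED

    def on_message(self, name: str) -> TaskState:
        """Record an instruction the client sent or an event it received."""
        key = (self.state, name)
        if key not in _TRANSITIONS:
            raise RuntimeError(f"unexpected {name!r} in state {self.state.value}")
        self.state = _TRANSITIONS[key]
        return self.state

    @property
    def can_send_audio(self) -> bool:
        # Audio is allowed only between task-started and finish-task.
        return self.state is TaskState.STREAMING

    @property
    def can_close(self) -> bool:
        # Step 6: close the connection only after task-finished arrives.
        return self.state is TaskState.FINISHED
```

A client would call on_message("run-task") right after sending the instruction, call on_message with the event name for each server event it receives, and check can_send_audio before writing each audio chunk.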