AI speaking tutor

Build an AI-powered speaking practice service with Real-time Conversational AI to help learners improve oral skills through personalized, on-demand sessions.

Overview

AI-powered speaking practice eliminates the need for a human partner and removes time and location constraints. The AI analyzes learner history to create personalized content, provides instant feedback and corrections, and simulates diverse scenarios to broaden language application. A stress-free environment helps learners build confidence and improve speaking skills.

Solution options

Practice formats

Real-time Conversational AI supports two call formats. Specify a call type when you create an agent, then integrate it. Try the demo to preview the experience. To integrate, follow the Quick Start for audio and video calls.

| Call type | Audio-only call | Digital human call |
| --- | --- | --- |
| Practice format | Learner: Audio; AI partner: Audio | Learner: Audio; AI partner: Video |
| Cost | Low | Medium |

Client SDKs

For more information about SDK integration, see Developer guide.

| SDK | Description |
| --- | --- |
| Web SDK | Recommended for desktop browsers, such as Chrome; mobile H5, such as Alipay H5, DingTalk H5, and WeChat mini program H5; and in-app WebView. |
| Android/iOS SDK | Recommended for applications that run on the Android or iOS operating system. |
| Others | To develop on a Windows or macOS desktop, search for the DingTalk group ID 106730016696 to join the group and contact us. |

Note (Web SDK)

  • Do not use native mobile browsers because some devices are not compatible with Web Real-Time Communication (WebRTC).

  • Native components of WeChat mini programs are not supported. Use WeChat mini program H5. For more information about integration, see Integrate the Web SDK in a WeChat mini program.

Core features

Personalized calls and scenario switching

Customize each call by configuring call startup and personalization parameters. Learners can switch scenarios mid-call without disconnecting. For example, switching from a "directions" scenario to a "shopping" scenario requires redefining the Large Language Model (LLM) prompt.

| Setting | Description | Can be modified during a call |
| --- | --- | --- |
| LLM prompt | Include learner information in the prompt. Pass it as an input parameter when starting the call for more targeted practice. | Yes |
| Automatic Speech Recognition (ASR) language | Set the language, such as Chinese or English. | Yes |
| Text-to-Speech (TTS) voice | Set the AI's voice. | Yes |
| Digital human avatar | If your agent is a VideoAgent and you have multiple digital human avatars, specify an avatar for the call. | No |
| Welcome message | Set a welcome message for each user, such as "Hello, Xiaoyun. Today, we will simulate a shopping scenario..." | No |
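The settings above can be modeled on your AppServer before you start or update a call. The following is a minimal sketch, not the service's API: `CallConfig`, `build_prompt`, and `apply_mid_call_update` are hypothetical names introduced for illustration, and the field names are assumptions. It embeds learner information in the prompt and rejects mid-call changes to settings that are fixed at call start (digital human avatar and welcome message).

```python
from dataclasses import dataclass, replace
from typing import Optional

# Settings listed above as changeable during a call (field names are assumptions).
MID_CALL_MODIFIABLE = {"llm_prompt", "asr_language", "tts_voice"}


@dataclass(frozen=True)
class CallConfig:
    """Hypothetical container for per-call settings; not an SDK type."""
    llm_prompt: str
    asr_language: str
    tts_voice: str
    welcome_message: str
    avatar_id: Optional[str] = None  # only relevant for a VideoAgent


def build_prompt(learner_name: str, level: str, scenario: str) -> str:
    """Embed learner information in the LLM prompt for targeted practice."""
    return (
        f"You are a speaking practice partner. The learner is {learner_name}, "
        f"level {level}. Role-play a '{scenario}' scenario and correct mistakes gently."
    )


def apply_mid_call_update(config: CallConfig, **changes: str) -> CallConfig:
    """Return an updated config, rejecting settings that cannot change mid-call."""
    blocked = set(changes) - MID_CALL_MODIFIABLE
    if blocked:
        raise ValueError(f"cannot change during a call: {sorted(blocked)}")
    return replace(config, **changes)


# Example: start with a "directions" scenario, then switch to "shopping".
config = CallConfig(
    llm_prompt=build_prompt("Xiaoyun", "B1", "directions"),
    asr_language="en",
    tts_voice="example-voice",  # placeholder, not a real voice ID
    welcome_message="Hello, Xiaoyun. Today, we will simulate a directions scenario.",
)
config = apply_mid_call_update(
    config, llm_prompt=build_prompt("Xiaoyun", "B1", "shopping")
)
```

Keeping the modifiable set in one place makes it easy to fail fast on your side before sending an update that the service would not apply.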

Knowledge base

To set up a knowledge base:

  1. Use Alibaba Cloud Model Studio to create an agent and publish it to Real-time Conversational AI. For more information about publishing an agent, see Publish a Real-time Conversational AI agent from Alibaba Cloud Model Studio.

  2. Set up the question library in Alibaba Cloud Model Studio. For more information about how to set up a question library, see Quick Start.

Send custom messages to users

To send information such as cards or questions to the client in real time during a call, use the dedicated messaging channel that Real-time Conversational AI provides. After the client receives a custom message, it can perform custom business actions, such as downloading resources or rendering interactive content.

Alibaba Cloud provides two solutions:

  • Solution 1: You can send custom messages to the client from your AppServer. For more information, see Send custom messages to a client.

  • Solution 2: You can also include custom messages in the response from the large language model (LLM). The message is delivered to the client in real time with the captions.

    Note

    You can hide instructions in the model response and mark them with special symbols, such as `{}` or `[]`. To do this, go to Console > Workflow > TTS Node > Filter Broadcast. The marked content is not spoken. You can then parse this content to handle your custom business logic.
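For Solution 2, the receiving side needs to separate the marked instructions from the text shown to the learner. The sketch below is one possible approach, assuming the caption arrives as a plain string and that markers are not nested; `split_caption` is a hypothetical helper, not part of any SDK.

```python
import re
from typing import List, Tuple

# Matches content wrapped in {} or []; nested markers of the same kind are not handled.
MARKED = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")


def split_caption(caption: str) -> Tuple[str, List[str]]:
    """Separate displayable text from marked instructions in an LLM response."""
    instructions = [
        m.group(1) if m.group(1) is not None else m.group(2)
        for m in MARKED.finditer(caption)
    ]
    # Remove the marked spans, then collapse the whitespace they leave behind.
    visible = re.sub(r"\s{2,}", " ", MARKED.sub("", caption)).strip()
    return visible, instructions


text, commands = split_caption("Great job! [next_question] Now try this.")
```

Each extracted instruction can then be dispatched to your own business logic, for example to render a card or load the next question.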

User information pass-through

When multiple users are online during a call, the LLM must distinguish which user sent the current input. To make this possible, pass custom information such as a UserID to the model. For more information, see Pass business parameters to an Alibaba Cloud Model Studio LLM.

Detect and handle user silence

You can listen for the intent_recognized parameter in the callback to obtain the time of each user utterance. For more information, see Agent callbacks. This lets you handle cases where a user is silent for a long time. Common handling methods are as follows:
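As one illustration of silence handling, the sketch below tracks the time of the most recent utterance and reports when the learner has been silent longer than a threshold. It assumes your callback handler extracts an utterance timestamp (in seconds) from the callback; `SilenceTracker`, its method names, and the 15-second default are hypothetical choices, and the actual callback payload is described in Agent callbacks.

```python
import time
from typing import Optional


class SilenceTracker:
    """Tracks the last user utterance time and reports prolonged silence."""

    def __init__(self, started_at: float, threshold_seconds: float = 15.0) -> None:
        self.threshold_seconds = threshold_seconds
        # Until the learner speaks, measure silence from the start of the call.
        self.last_activity_at = started_at

    def on_utterance(self, timestamp: float) -> None:
        """Call from your callback handler each time an utterance is reported."""
        self.last_activity_at = timestamp

    def is_silent(self, now: Optional[float] = None) -> bool:
        """Return True if no utterance was seen for longer than the threshold."""
        current = time.time() if now is None else now
        return current - self.last_activity_at > self.threshold_seconds


tracker = SilenceTracker(started_at=time.time())
```

A periodic check of `is_silent()` on your server can then trigger whatever follow-up suits your product, such as prompting the learner or ending the session.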

Transcription and recording

Save audio or text data from practice sessions. For more information, see Data archiving.

Advanced features

Sentence-by-sentence evaluation

Real-time Conversational AI records each user utterance as a separate audio file in real time and stores it in your specified Object Storage Service (OSS) bucket. You can then run pronunciation evaluation on these files.

Note

Real-time Conversational AI provides sentence-by-sentence recording only, not audio evaluation. To set up sentence-by-sentence audio callbacks, use Agent callbacks.