This topic describes how to use the Go software development kit (SDK) for the short-sentence speech recognition feature of Intelligent Speech Interaction. It includes installation instructions and code examples.
Prerequisites
Before you use the SDK, you must review the API reference. For more information, see API reference.
Ensure that the Golang environment is installed and its basic configuration is complete.
The SDK supports Go 1.16 and later.
Download and install
Download and install the SDK.
Run the following command to download and install the SDK.
go get github.com/aliyun/alibabacloud-nls-go-sdk
Import the SDK.
Add the following import statement to your code.
import ("github.com/aliyun/alibabacloud-nls-go-sdk")
SDK constants
Constant | Description |
SDK_VERSION | The SDK version. |
PCM | The PCM audio format. |
WAV | The WAV audio format. |
OPUS | The OPUS audio format. |
OPU | The OPU audio format. |
DEFAULT_DISTRIBUTE | The default region used to obtain a token. The default value is "cn-shanghai". |
DEFAULT_DOMAIN | The default URL used to obtain a token. The default value is "nls-meta.cn-shanghai.aliyuncs.com". |
DEFAULT_VERSION | The protocol version used to obtain a token. The default value is "2019-02-28". |
DEFAULT_URL | The default public cloud URL. The default value is "wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1". |
Establish a connection
If you use an Akid and Akkey to obtain a token, you must cache the token and update it before it expires. Do not frequently call the API operation to obtain a token, because the cloud service may throttle your requests.
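The caching advice above can be sketched as a small wrapper that reuses a token until shortly before it expires. This is a minimal sketch using only the standard library: `tokenCache`, `fetchFunc`, and the five-minute refresh margin are hypothetical names and values chosen for illustration, not part of the SDK, and the fetcher here is a fake that stands in for the real token request.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchFunc is a hypothetical stand-in for whatever call returns a new
// token together with its expiry time.
type fetchFunc func() (token string, expiresAt time.Time, err error)

// tokenCache hands out a cached token and refreshes it shortly before expiry.
type tokenCache struct {
	mu        sync.Mutex
	fetch     fetchFunc
	margin    time.Duration // refresh this long before the token expires
	token     string
	expiresAt time.Time
}

func newTokenCache(fetch fetchFunc, margin time.Duration) *tokenCache {
	return &tokenCache{fetch: fetch, margin: margin}
}

// Get returns the cached token and calls fetch only when the cache is empty
// or the token is within the refresh margin of its expiry.
func (c *tokenCache) Get(now time.Time) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token != "" && now.Add(c.margin).Before(c.expiresAt) {
		return c.token, nil
	}
	token, expiresAt, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.token, c.expiresAt = token, expiresAt
	return token, nil
}

// demo drives the cache with a fake fetcher and a simulated clock.
func demo() (t1, t2, t3 string, calls int) {
	start := time.Now()
	// Fake fetcher: token N expires N hours after start.
	fetch := func() (string, time.Time, error) {
		calls++
		return fmt.Sprintf("token-%d", calls), start.Add(time.Duration(calls) * time.Hour), nil
	}
	cache := newTokenCache(fetch, 5*time.Minute)
	t1, _ = cache.Get(start)                       // cache empty: fetches
	t2, _ = cache.Get(start.Add(30 * time.Minute)) // still valid: reuses t1
	t3, _ = cache.Get(start.Add(58 * time.Minute)) // inside the margin: refreshes
	return t1, t2, t3, calls
}

func main() {
	fmt.Println(demo()) // token-1 token-1 token-2 2
}
```

Passing the current time into `Get` keeps the sketch easy to test. In real code the fetcher would wrap the actual token request and `Get` could call `time.Now()` itself.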
1. ConnectionConfig
The basic parameters that are required to establish a connection.
Parameters:
Parameter | Type | Description |
Url | String | The public cloud URL to access. If you are unsure, use the DEFAULT_URL constant. |
Token | String | The access token. For more information, see Overview of token generation. |
Akid | String | The AccessKey ID of your Alibaba Cloud account. |
Akkey | String | The AccessKey Secret of your Alibaba Cloud account. |
Appkey | String | The Appkey of the project. To obtain an Appkey, go to the console. |
2. func NewConnectionConfigWithToken(url string, appkey string, token string) *ConnectionConfig
Creates a `ConnectionConfig` object from a URL, an Appkey, and a token.
Parameters:
Parameter | Type | Description |
Url | String | The public cloud URL to access. If you are unsure, use the DEFAULT_URL constant. |
Appkey | String | The Appkey of the project. To obtain an Appkey, go to the console. |
Token | String | The access token. For more information, see Overview of token generation. |
Return value:
*ConnectionConfig: A pointer to the connection configuration object.
3. func NewConnectionConfigFromJson(jsonStr string) (*ConnectionConfig, error)
Creates a `ConnectionConfig` object from a JSON string.
Parameters:
Parameter | Type | Description |
jsonStr | String | A JSON string that describes the connection parameters. Valid fields are url, token, akid, akkey, and appkey. The url and appkey fields are required. If you include the token field, you do not need to include the akid and akkey fields. |
Return values:
*ConnectionConfig: A pointer to the connection configuration object.
error: An error.
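To make the field rules for the JSON string concrete, the following sketch validates a candidate string before it is handed to the SDK. It uses only `encoding/json`. The `connFields` type and `checkConnJSON` function are local, hypothetical helpers rather than SDK code, and the rule that either token or both akid and akkey must be present is an assumption inferred from the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// connFields mirrors the JSON fields listed above. It is a local,
// illustrative type, not the SDK's ConnectionConfig.
type connFields struct {
	URL    string `json:"url"`
	Token  string `json:"token"`
	Akid   string `json:"akid"`
	Akkey  string `json:"akkey"`
	Appkey string `json:"appkey"`
}

// checkConnJSON parses jsonStr and applies the documented rules: url and
// appkey are required. It also assumes that either token or both akid and
// akkey must be present.
func checkConnJSON(jsonStr string) (*connFields, error) {
	var c connFields
	if err := json.Unmarshal([]byte(jsonStr), &c); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	if c.URL == "" || c.Appkey == "" {
		return nil, errors.New("url and appkey are required")
	}
	if c.Token == "" && (c.Akid == "" || c.Akkey == "") {
		return nil, errors.New("set token, or both akid and akkey")
	}
	return &c, nil
}

func main() {
	good := `{"url":"wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1","appkey":"your-appkey","token":"your-token"}`
	if c, err := checkConnJSON(good); err == nil {
		fmt.Println("ok:", c.URL)
	}
	bad := `{"url":"wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1"}`
	if _, err := checkConnJSON(bad); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

A string that passes this check can then be passed to NewConnectionConfigFromJson, whose own error return remains the authoritative result.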
Short-sentence speech recognition
1. SpeechRecognitionStartParam
The parameters for a short-sentence speech recognition request.
Parameter | Type | Description |
Format | String | The audio format. Default value: PCM. Valid values: OPUS, OPU, and PCM. If you use OPUS or OPU, you must encode the audio yourself. |
SampleRate | Integer | The sample rate. Default value: 16000 Hz. |
EnableIntermediateResult | Boolean | Specifies whether to return intermediate recognition results. |
EnablePunctuationPrediction | Boolean | Specifies whether to enable punctuation prediction. |
EnableInverseTextNormalization | Boolean | Specifies whether to enable inverse text normalization (ITN). If you set this parameter to true, Chinese numerals are converted to Arabic numerals in the output. Default value: False. |
2. func DefaultSpeechRecognitionParam() SpeechRecognitionStartParam
Returns a set of default parameters. By default, the audio format is PCM, the sample rate is 16000 Hz, and the features for intermediate results, punctuation prediction, and Inverse Text Normalization (ITN) are enabled.
Parameters: None.
Return value:
SpeechRecognitionStartParam: The default parameters.
3. func NewSpeechRecognition(...) (*SpeechRecognition, error)
Creates a SpeechRecognition instance.
Parameters:
Parameter | Type | Description |
config | *ConnectionConfig | For more information, see the Establish a connection section. |
logger | *NlsLogger | For more information, see the SDK logs section. |
taskfailed | func(string, interface{}) | The callback parameter for handling errors during the recognition process. interface{} is a user-defined parameter. |
started | func(string, interface{}) | The callback parameter for when the connection is established. |
resultchanged | func(string, interface{}) | The callback parameter for intermediate recognition results. |
completed | func(string, interface{}) | The callback parameter for the final recognition result. |
closed | func(interface{}) | The callback parameter for when the connection is disconnected. |
param | interface{} | A user-defined parameter. |
Return values:
*SpeechRecognition: A pointer to the short-sentence speech recognition object.
error: An error.
4. func (sr *SpeechRecognition) Start(param SpeechRecognitionStartParam, extra map[string]interface{}) (chan bool, error)
Initiates a short-sentence speech recognition request with the specified parameters.
Parameters:
Parameter | Type | Description |
param | SpeechRecognitionStartParam | The parameters for short-sentence speech recognition. |
extra | map[string]interface{} | Extra key:value parameters. |
Return values:
chan bool: A channel that is used to synchronize the start of the speech recognition process. You can send audio data only after the channel is ready.
error: An error.
5. func (sr *SpeechRecognition) Stop() (chan bool, error)
Stops the short-sentence speech recognition process.
Parameters: None.
Return values:
chan bool: A channel that is used to synchronize the end of the speech recognition process.
error: An error.
6. func (sr *SpeechRecognition) Shutdown()
Forcibly disconnects the connection.
Parameters: None.
Return value: None.
7. func (sr *SpeechRecognition) SendAudioData(data []byte) error
Sends audio data. The audio format must match the format that is specified in the parameters.
Parameters:
Parameter | Type | Description |
data | []byte | The audio data. |
Return value:
error: An error.
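Audio is usually sent in small, evenly sized chunks. The code example in this topic loads PCM in 320-byte chunks and pauses 10 ms between sends. The sketch below shows where such a number can come from, assuming 16-bit (2 bytes per sample) mono PCM, which this topic does not state explicitly. `chunkSize` and `splitChunks` are hypothetical helpers, not SDK functions.

```go
package main

import "fmt"

// chunkSize returns the number of bytes that hold durationMs milliseconds of
// PCM audio at the given sample rate, assuming bytesPerSample bytes per
// sample and a single channel.
func chunkSize(sampleRate, bytesPerSample, durationMs int) int {
	return sampleRate * bytesPerSample * durationMs / 1000
}

// splitChunks cuts data into consecutive slices of at most size bytes.
// The slices share memory with data, and the last one may be shorter.
func splitChunks(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	size := chunkSize(16000, 2, 10) // 16 kHz, 16-bit mono, 10 ms
	audio := make([]byte, 1000)     // placeholder for real PCM bytes
	chunks := splitChunks(audio, size)
	fmt.Println(size, len(chunks), len(chunks[len(chunks)-1])) // 320 4 40
}
```

Each resulting chunk could then be passed to SendAudioData in order, with a pause that roughly matches the chunk duration if real-time pacing is wanted.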
SDK logs
1. func DefaultNlsLog() *NlsLogger
Creates a global default log object. By default, the log has the prefix "NLS" and writes to standard error.
Parameters: None.
Return value:
*NlsLogger: A pointer to the log object.
2. func NewNlsLogger(w io.Writer, tag string, flag int) *NlsLogger
Creates a new log object.
Parameters:
Parameter | Type | Description |
w | io.Writer | Any object that implements the io.Writer interface. |
tag | String | The log prefix. It is printed at the beginning of each log line. |
flag | Integer | The log flag. For more information, see the official Go log documentation. |
Return value:
*NlsLogger: A pointer to the log object.
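The description above points to the Go log documentation for the flag values. As a rough illustration, the sketch below uses the standard `log` package directly to show how a prefix and flags shape each line. It assumes that tag and flag behave like the prefix and flag arguments of `log.New`, which this topic implies but does not state. `render` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"
)

// render writes one message through a standard-library logger built with
// the given prefix and flags, and returns the resulting line without the
// trailing newline.
func render(prefix string, flags int, msg string) string {
	var buf bytes.Buffer
	logger := log.New(&buf, prefix, flags)
	logger.Println(msg)
	return strings.TrimSuffix(buf.String(), "\n")
}

func main() {
	// No flags: only the prefix and the message are written.
	fmt.Printf("%q\n", render("NLS ", 0, "connected"))
	// Date, time, and microseconds after the prefix.
	fmt.Printf("%q\n", render("NLS ", log.LstdFlags|log.Lmicroseconds, "connected"))
	// Lmsgprefix moves the prefix from the start of the line to just before the message.
	fmt.Printf("%q\n", render("NLS ", log.LstdFlags|log.Lmsgprefix, "connected"))
}
```

The first call prints "NLS connected". The other two lines include the current date and time, so their exact text varies from run to run.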
3. func (logger *NlsLogger) SetLogSil(sil bool)
Specifies whether to write the log to the corresponding io.Writer.
Parameters:
Parameter | Type | Description |
sil | Boolean | Specifies whether to disable log output. true: log output is disabled. false: log output is enabled. |
Return value: None.
4. func (logger *NlsLogger) SetDebug(debug bool)
Specifies whether to enable debug logging. This setting affects only logs that are created using Debugf or Debugln.
Parameters:
Parameter | Type | Description |
debug | Boolean | Specifies whether to allow debug log output. true: debug output is enabled. false: debug output is disabled. |
Return value: None.
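The interaction between the silence switch and the debug switch can be pictured with a small stand-in type. The sketch below is not the SDK's NlsLogger. `gatedLogger` is a hypothetical local type, and it assumes that silencing the logger also suppresses debug lines, which this topic does not confirm.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
)

// gatedLogger is a local illustration of the two switches. It is not the
// SDK's NlsLogger.
type gatedLogger struct {
	std   *log.Logger
	sil   bool // true: drop all output
	debug bool // true: let debug lines through
}

func newGatedLogger(w io.Writer, tag string) *gatedLogger {
	return &gatedLogger{std: log.New(w, tag, 0)}
}

// Println writes a normal line unless the logger is silenced.
func (l *gatedLogger) Println(v ...interface{}) {
	if !l.sil {
		l.std.Println(v...)
	}
}

// Debugln writes a line only when debug is on and the logger is not silenced.
func (l *gatedLogger) Debugln(v ...interface{}) {
	if !l.sil && l.debug {
		l.std.Println(v...)
	}
}

// demo issues one normal and one debug call under the given switch settings
// and returns everything that was written.
func demo(sil, debug bool) string {
	var buf bytes.Buffer
	l := newGatedLogger(&buf, "NLS ")
	l.sil, l.debug = sil, debug
	l.Println("normal")
	l.Debugln("debug")
	return buf.String()
}

func main() {
	fmt.Printf("%q\n", demo(false, false)) // "NLS normal\n"
	fmt.Printf("%q\n", demo(false, true))  // "NLS normal\nNLS debug\n"
	fmt.Printf("%q\n", demo(true, true))   // ""
}
```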
5. func (logger *NlsLogger) SetOutput(w io.Writer)
Sets the output destination for the log.
Parameters:
Parameter | Type | Description |
w | io.Writer | Any object that implements the io.Writer interface. |
Return value: None.
6. func (logger *NlsLogger) SetPrefix(prefix string)
Sets the prefix for each log entry.
Parameters:
Parameter | Type | Description |
prefix | String | The label for log lines. It is output at the beginning of each log line. |
Return value: None.
7. func (logger *NlsLogger) SetFlags(flags int)
Sets the log properties.
Parameters:
Parameter | Type | Description |
flags | Integer | The log properties. For more information, see the official Go documentation. |
Return value: None.
8. Log printing
Logging method
Method name | Description |
func (l *NlsLogger) Print(v ...interface{}) | Standard log output. |
func (l *NlsLogger) Println(v ...interface{}) | Standard log output that adds a new line. |
func (l *NlsLogger) Printf(format string, v ...interface{}) | Formatted log output. For more information about the format, see the official Go documentation. |
func (l *NlsLogger) Debugln(v ...interface{}) | Debug log output that adds a new line. |
func (l *NlsLogger) Debugf(format string, v ...interface{}) | Formatted debug log output. |
func (l *NlsLogger) Fatal(v ...interface{}) | Outputs a fatal error log and then exits the process. |
func (l *NlsLogger) Fatalln(v ...interface{}) | Outputs a fatal error log, adds a new line, and then exits the process. |
func (l *NlsLogger) Fatalf(format string, v ...interface{}) | Outputs a formatted fatal error log and then exits the process. |
func (l *NlsLogger) Panic(v ...interface{}) | Outputs a fatal error log, prints crash information, and then exits the process. |
func (l *NlsLogger) Panicln(v ...interface{}) | Outputs a fatal error log, adds a new line, prints crash information, and then exits the process. |
func (l *NlsLogger) Panicf(format string, v ...interface{}) | Outputs a formatted fatal error log, prints crash information, and then exits the process. |
Code example
package main

import (
	"errors"
	"flag"
	"fmt"
	"log"
	"os"
	"os/signal"
	"sync"
	"time"

	"github.com/aliyun/alibabacloud-nls-go-sdk"
)

const (
	AKID  = "Your AKID"
	AKKEY = "Your AKKEY"
	// online key
	APPKEY = "Your APPKEY" // To obtain an Appkey, go to the console: https://nls-portal.console.aliyun.com/applist
	TOKEN  = "Your TOKEN"  // For details about how to obtain a token, see https://help.aliyun.com/document_detail/450514.html
)

// Each callback receives the user-defined parameter passed to
// NewSpeechRecognition, which in this example is the per-goroutine logger.
func onTaskFailed(text string, param interface{}) {
	logger, ok := param.(*nls.NlsLogger)
	if !ok {
		log.Default().Fatal("invalid logger")
		return
	}
	logger.Println("TaskFailed:", text)
}

func onStarted(text string, param interface{}) {
	logger, ok := param.(*nls.NlsLogger)
	if !ok {
		log.Default().Fatal("invalid logger")
		return
	}
	logger.Println("onStarted:", text)
}

func onResultChanged(text string, param interface{}) {
	logger, ok := param.(*nls.NlsLogger)
	if !ok {
		log.Default().Fatal("invalid logger")
		return
	}
	logger.Println("onResultChanged:", text)
}

func onCompleted(text string, param interface{}) {
	logger, ok := param.(*nls.NlsLogger)
	if !ok {
		log.Default().Fatal("invalid logger")
		return
	}
	logger.Println("onCompleted:", text)
}

func onClose(param interface{}) {
	logger, ok := param.(*nls.NlsLogger)
	if !ok {
		log.Default().Fatal("invalid logger")
		return
	}
	logger.Println("onClosed:")
}

// waitReady blocks until the channel reports a result or 20 seconds pass.
func waitReady(ch chan bool, logger *nls.NlsLogger) error {
	select {
	case done := <-ch:
		if !done {
			logger.Println("Wait failed")
			return errors.New("wait failed")
		}
		logger.Println("Wait done")
	case <-time.After(20 * time.Second):
		logger.Println("Wait timeout")
		return errors.New("wait timeout")
	}
	return nil
}

// lk guards the request and failure counters shared by all goroutines.
var lk sync.Mutex
var fail = 0
var reqNum = 0

func testMultiInstance(num int) {
	pcm, err := os.Open("tests/test1.pcm")
	if err != nil {
		log.Default().Fatalln(err)
	}
	buffers := nls.LoadPcmInChunk(pcm, 320)
	param := nls.DefaultSpeechRecognitionParam()
	config, err := nls.NewConnectionConfigWithAKInfoDefault(nls.DEFAULT_URL, APPKEY, AKID, AKKEY)
	if err != nil {
		log.Default().Fatalln(err)
	}
	var wg sync.WaitGroup
	for i := 0; i < num; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			strId := fmt.Sprintf("ID%d ", id)
			logger := nls.NewNlsLogger(os.Stderr, strId, log.LstdFlags|log.Lmicroseconds)
			logger.SetLogSil(false)
			logger.SetDebug(true)
			logger.Printf("Test Normal Case for SpeechRecognition:%s", strId)
			sr, err := nls.NewSpeechRecognition(config, logger,
				onTaskFailed, onStarted, onResultChanged,
				onCompleted, onClose, logger)
			if err != nil {
				logger.Fatalln(err)
				return
			}
			test_ex := make(map[string]interface{})
			test_ex["test"] = "hello"
			// Repeat requests until the process is interrupted (Ctrl+C);
			// main then prints the request and failure totals.
			for {
				lk.Lock()
				reqNum++
				lk.Unlock()
				logger.Println("SR start")
				ready, err := sr.Start(param, test_ex)
				if err != nil {
					lk.Lock()
					fail++
					lk.Unlock()
					sr.Shutdown()
					continue
				}
				// Send audio only after the start channel reports ready.
				err = waitReady(ready, logger)
				if err != nil {
					lk.Lock()
					fail++
					lk.Unlock()
					sr.Shutdown()
					continue
				}
				for _, data := range buffers.Data {
					if data != nil {
						sr.SendAudioData(data.Data)
						time.Sleep(10 * time.Millisecond)
					}
				}
				logger.Println("send audio done")
				ready, err = sr.Stop()
				if err != nil {
					lk.Lock()
					fail++
					lk.Unlock()
					sr.Shutdown()
					continue
				}
				err = waitReady(ready, logger)
				if err != nil {
					lk.Lock()
					fail++
					lk.Unlock()
					sr.Shutdown()
					continue
				}
				logger.Println("Sr done")
				sr.Shutdown()
			}
		}(i)
	}
	wg.Wait()
}

func main() {
	coroutineId := flag.Int("num", 1, "coroutine number")
	flag.Parse()
	log.Default().Printf("start %d coroutines", *coroutineId)
	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)
	// On interrupt, print the totals and exit.
	go func() {
		for range c {
			lk.Lock()
			log.Printf(">>>>>>>>REQ NUM: %d>>>>>>>>>FAIL: %d", reqNum, fail)
			lk.Unlock()
			os.Exit(0)
		}
	}()
	testMultiInstance(*coroutineId)
}