Use the ARTC SDK to capture audio from custom sources instead of the built-in capture module.
Overview
The ARTC SDK's built-in audio module covers most use cases, but custom audio capture is needed in scenarios such as:
- The audio capture device is occupied by another process.
- You need to capture audio from a custom source, such as a proprietary system or an audio file, and send it to the SDK.
Custom audio capture lets you manage your own audio devices and sources.
Sample code
Android: Android/ARTCExample/AdvancedUsage/src/main/java/com/aliyun/artc/api/advancedusage/CustomAudioCaptureAndRender/CustomAudioCaptureActivity.java.
iOS: iOS/ARTCExample/AdvancedUsage/CustomAudioCapture/CustomAudioCaptureVC.swift.
Harmony: Harmony/ARTCExample/entry/src/main/ets/pages/advancedusage/CustomAudioCapturePage.ets.
Prerequisites
Before you begin, ensure you have completed the following:
- Created an Alibaba Real-Time Communication (ARTC) application and obtained an App ID and App Key from the ApsaraVideo Live console. For instructions, see Create an application.
- Integrated the ARTC SDK into your project and implemented basic real-time audio and video calling. For instructions, see Download and integrate the ARTC SDK and Implement an audio/video call.
Implementation
1. Enable or disable internal capture
To use custom audio capture, first disable the SDK's internal capture module by passing the extras parameter when calling getInstance to create the engine:
user_specified_use_external_audio_record: Specifies whether to use custom audio capture in place of the SDK's internal capture.
- "TRUE": Use custom audio capture (disables internal capture).
- "FALSE": Do not use custom audio capture (enables internal capture).
The extras parameter is a JSON string.
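As a minimal sketch of how that string could be assembled, the Java helper below builds the extras value from a boolean. ExtrasBuilder and its build method are hypothetical names used for illustration and are not part of the ARTC SDK. The sketch assumes, based on the samples in this document, that the value is passed as the string "TRUE" or "FALSE" rather than as a JSON boolean.

```java
/** Hypothetical helper (not part of the ARTC SDK): builds the extras JSON string. */
final class ExtrasBuilder {
    /** Key taken from this document. */
    static final String KEY_EXTERNAL_RECORD = "user_specified_use_external_audio_record";

    private ExtrasBuilder() {}

    /** Returns the extras JSON for engine creation. */
    static String build(boolean useCustomCapture) {
        // The samples in this document pass the string values "TRUE"/"FALSE",
        // not JSON booleans, so the value is quoted here.
        String value = useCustomCapture ? "TRUE" : "FALSE";
        return "{\"" + KEY_EXTERNAL_RECORD + "\":\"" + value + "\"}";
    }
}
```

The returned string can then be passed as the extras argument shown in the platform samples that follow.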
Android
String extras = "{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine.getInstance(this, extras);
iOS
// Create and initialize the engine.
var customAudioCaptureConfig: [String: String] = [:]
// Use custom audio capture.
customAudioCaptureConfig["user_specified_use_external_audio_record"] = "TRUE"
// Serialize to JSON.
guard let jsonData = try? JSONSerialization.data(withJSONObject: customAudioCaptureConfig, options: []),
      let extras = String(data: jsonData, encoding: .utf8) else {
    print("JSON serialization failed")
    return
}
let engine = AliRtcEngine.sharedInstance(self, extras: extras)
Mac
NSString * extras = @"{\"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = [AliRtcEngine sharedInstance:self extras:extras];
Windows
/* Windows supports enabling or disabling audio capture during engine creation. Use one of the two configurations below. */
/* Option 1: Disable internal capture (use custom audio capture). */
const char* extra = "{\"user_specified_enable_use_virtual_audio_device\":\"TRUE\", \"user_specified_use_external_audio_record\":\"TRUE\"}";
mAliRtcEngine = AliRtcEngine::Create(extra);
/* Option 2: Enable internal capture. */
const char* extra = "{\"user_specified_enable_use_virtual_audio_device\":\"FALSE\", \"user_specified_use_external_audio_record\":\"FALSE\"}";
mAliRtcEngine = AliRtcEngine::Create(extra);
Harmony
// Create an RTC engine instance and enable custom audio capture.
const extras = '{"user_specified_use_external_audio_record":"TRUE"}';
this.rtcEngine = AliRtcEngine.getInstance(extras, this.context);
2. Add an external audio stream
Call addExternalAudioStream to add an external audio stream and obtain its stream ID. To enable 3A audio processing (acoustic echo cancellation, automatic gain control, and noise suppression), set the enable3A parameter in the AliRtcExternalAudioStreamConfig object.
When to call this method:
- If you need 3A audio processing: Call addExternalAudioStream after the audio stream is published and the custom capture module has obtained its first audio frame. That is, call the method after the onAudioPublishStateChanged callback reports newState as AliRtcStatsPublished (3).
- If you do not need 3A audio processing (for example, when streaming audio from a local file, network source, or TTS-generated data): Call this method immediately after creating the engine. Then, start pushing audio data once the stream is published.
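The timing rules above can be sketched as a small state holder. The Java class below is illustrative only: ExternalStreamGate and its methods are hypothetical names, not SDK APIs. The value 3 for the published state is the one this document gives for AliRtcStatsPublished. The sketch assumes your callback handler and your capture module forward their events to it.

```java
/** Hypothetical helper (not part of the ARTC SDK): tracks when the external stream may be added. */
final class ExternalStreamGate {
    /** Value this document gives for AliRtcStatsPublished. */
    static final int STATE_PUBLISHED = 3;

    private final boolean use3A;
    private boolean published;
    private boolean firstFrameCaptured;

    ExternalStreamGate(boolean use3A) {
        this.use3A = use3A;
    }

    /** Call from the onAudioPublishStateChanged callback with newState. */
    synchronized void onPublishStateChanged(int newState) {
        published = (newState == STATE_PUBLISHED);
    }

    /** Call when the custom capture module delivers its first audio frame. */
    synchronized void onFirstFrameCaptured() {
        firstFrameCaptured = true;
    }

    /** Without 3A the stream can be added right away; with 3A both conditions must hold. */
    synchronized boolean canAddStream() {
        return !use3A || (published && firstFrameCaptured);
    }

    /** Audio data may only be pushed once the stream is published. */
    synchronized boolean canPushData() {
        return published;
    }
}
```

The methods are synchronized because SDK callbacks and the capture loop typically run on different threads.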
Android
AliRtcEngine.AliRtcExternalAudioStreamConfig config = new AliRtcEngine.AliRtcExternalAudioStreamConfig();
config.sampleRate = SAMPLE_RATE; // Sample rate
config.channels = CHANNEL; // Number of channels
// Publish volume
config.publishVolume = 100;
// Local playout volume
config.playoutVolume = isLocalPlayout ? 100 : 0;
config.enable3A = true;
int result = mAliRtcEngine.addExternalAudioStream(config);
if (result <= 0) {
    return;
}
// The return value is the stream ID. You need it to push data to the SDK.
mExternalAudioStreamId = result;
iOS
/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
// This must match the number of channels of the external PCM audio stream. Set to 1 for mono or 2 for stereo.
config.channels = _pcmChannels;
// This must match the sample rate of the external PCM audio stream.
config.sampleRate = _pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
// Save the returned stream ID. You need it to push data to the SDK and to remove the stream.
_externalPublishStreamId = [self.engine addExternalAudioStream:config];
Mac
/* Set parameters based on your application's needs. */
AliRtcExternalAudioStreamConfig *config = [AliRtcExternalAudioStreamConfig new];
config.channels = pcmChannels;
/** Sample rate. Default: 48000. Supported values: 8000, 12000, 16000, 24000, 32000, 44100, 48000, 64000, 88200, 96000, 176400, 192000. */
config.sampleRate = pcmSampleRate;
config.playoutVolume = 0;
config.publishVolume = 100;
int ret = [self.engine addExternalAudioStream:config];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
/* Set parameters based on your application's needs. */
AliEngineExternalAudioStreamConfig config;
config.playoutVolume = currentAudioPlayoutVolume;
config.publishVolume = currentAudioPublishVolume;
config.channels = 1;
config.sampleRate = 48000;
config.publishStream = 0;
audioStreamID = mAliRtcMediaEngine->AddExternalAudioStream(config);
mAliRtcMediaEngine->Release();
Harmony
// Configure external audio stream parameters.
const config: AliRtcExternalAudioStreamConfig = new AliRtcExternalAudioStreamConfig();
config.sampleRate = this.audioMicrophoneSampleRate;
config.channels = this.audioMicrophoneChannel;
config.publishVolume = 100; // Publish volume
config.playoutVolume = this.IsAudioFrame ? 100 : 0; // Local playout volume
config.enable3A = true; // Enable 3A audio processing.
const result = this.rtcEngine.addExternalAudioStream(config);
if (result <= 0) {
  console.error(`Failed to add external audio stream. Error code: ${result}`);
  promptAction.showToast({ message: 'Failed to start audio capture', duration: 2000 });
  return;
}
// The external audio stream is configured. Save the external audio stream ID.
this.externalAudioStreamId = result;
3. Implement a custom audio capture module
You must implement the logic to capture and process audio data, then send it to the SDK.
Alibaba Cloud provides a custom capture sample that demonstrates how to read PCM-formatted data from a local file or a microphone.
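For orientation, the following is a minimal Java sketch of one possible capture module that reads raw PCM from any InputStream (for example, a file stream) in fixed-duration chunks. It is not the Alibaba Cloud sample. PcmChunkReader and its methods are hypothetical names, and the sketch assumes headerless, interleaved integer PCM whose format you already know.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not part of the ARTC SDK): reads raw PCM in fixed-duration chunks. */
final class PcmChunkReader {
    private final InputStream in;
    private final byte[] buffer;

    PcmChunkReader(InputStream in, int sampleRate, int channels, int bitsPerSample, int chunkMs) {
        this.in = in;
        // Bytes per chunk = samples per channel * channels * bytes per sample.
        this.buffer = new byte[sampleRate * chunkMs / 1000 * channels * (bitsPerSample / 8)];
    }

    /** The buffer that readChunk fills; only the first N returned bytes are valid. */
    byte[] buffer() {
        return buffer;
    }

    /** Fills the buffer as far as possible. Returns the bytes read, or -1 at end of stream. */
    int readChunk() throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break; // End of stream: return whatever was read so far.
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }
}
```

The last chunk of a file is usually shorter than the buffer, so the caller should size each frame from the value readChunk returns rather than from the buffer length.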
4. Push audio data to the SDK
After the audio stream is published (the onAudioPublishStateChanged callback reports AliRtcStatsPublished), call pushExternalAudioStreamRawData with the stream ID obtained in Step 2 to send captured audio data to the SDK. Convert the audio data into an AliRtcAudioFrame object with the following properties:
- data: The audio data.
- numSamples: The number of sample points per channel in the data provided.
- bytesPerSample: Bytes per sample point (bit depth / 8). For example, this value is 2 for 16-bit audio.
- numChannels: Number of audio channels.
- samplesPerSec: The sample rate, in Hz (e.g., 16000 or 48000).
Usage notes:
- You must start pushing data only after the audio stream is published (when the onAudioPublishStateChanged callback reports a state of AliRtcStatsPublished).
- Set the numSamples parameter in the AliRtcAudioFrame object to the actual length of the captured data. Methods like AudioRecord.read may return less data than the buffer size, so you must use the method's return value to determine the actual data length.
- The pushExternalAudioStreamRawData call can fail if the internal buffer is full. Your application must handle this error and implement a retry mechanism.
- We recommend calling pushExternalAudioStreamRawData every 10 ms to send data.
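The sizing rules above come down to simple arithmetic. The Java sketch below shows it under the assumption of interleaved integer PCM. PcmFrameMath and its methods are hypothetical helpers, not part of the ARTC SDK.

```java
/** Hypothetical helper (not part of the ARTC SDK): PCM frame arithmetic for interleaved audio. */
final class PcmFrameMath {
    private PcmFrameMath() {}

    /** Bytes needed to hold durationMs of audio. */
    static int bytesPerChunk(int sampleRate, int channels, int bitsPerSample, int durationMs) {
        int bytesPerSample = bitsPerSample / 8;
        return sampleRate * durationMs / 1000 * channels * bytesPerSample;
    }

    /** Samples per channel contained in bytesRead bytes (the value to assign to numSamples). */
    static int samplesPerChannel(int bytesRead, int channels, int bitsPerSample) {
        int bytesPerSample = bitsPerSample / 8;
        return bytesRead / (channels * bytesPerSample);
    }
}
```

For example, 10 ms of 48 kHz, 16-bit stereo audio occupies 48000 × 10 / 1000 × 2 × 2 = 1920 bytes and holds 480 samples per channel. If a read returns only 1000 bytes of that format, the frame holds 250 samples per channel.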
Android
// Assume the captured audio data is in `audioData`, the size is `bytesRead` bytes, and it represents 10 ms of data.
if (mAliRtcEngine != null && bytesRead > 0) {
    // Construct an AliRtcAudioFrame object. `bitsPerSample` is the bit depth, which is typically 16.
    AliRtcEngine.AliRtcAudioFrame sample = new AliRtcEngine.AliRtcAudioFrame();
    sample.data = audioData;
    // Calculate the number of samples per channel from the actual number of bytes read.
    sample.numSamples = bytesRead / (channels * (bitsPerSample / 8));
    sample.numChannels = channels;
    sample.samplesPerSec = sampleRate;
    sample.bytesPerSample = bitsPerSample / 8;
    int ret = 0;
    // Retry the push operation if it fails because the buffer is full.
    int retryCount = 0;
    final int MAX_RETRY_COUNT = 20;
    final int BUFFER_WAIT_MS = 10;
    do {
        // Push the captured data to the SDK.
        ret = mAliRtcEngine.pushExternalAudioStreamRawData(mExternalAudioStreamId, sample);
        if (ret == ErrorCodeEnum.ERR_SDK_AUDIO_INPUT_BUFFER_FULL) {
            // Handle the buffer full scenario. Wait for a short period and retry.
            retryCount++;
            if (mExternalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
                // The stream has been stopped or the maximum retry count is reached. Exit the loop.
                break;
            }
            try {
                // Pause for a short interval.
                Thread.sleep(BUFFER_WAIT_MS);
            } catch (InterruptedException e) {
                e.printStackTrace();
                break;
            }
        } else {
            // Push succeeded or another error occurred. Exit the loop.
            break;
        }
    } while (retryCount < MAX_RETRY_COUNT);
}
iOS
// Construct an AliRtcAudioFrame object from the captured audio data.
let sample = AliRtcAudioFrame()
sample.dataPtr = UnsafeMutableRawPointer(mutating: pcmData)
sample.samplesPerSec = pcmSampleRate
sample.bytesPerSample = Int32(MemoryLayout<Int16>.size)
sample.numOfChannels = pcmChannels
sample.numOfSamples = numOfSamples
var retryCount = 0
while retryCount < 20 {
    if !(pcmInputThread?.isExecuting ?? false) {
        break
    }
    // Push the audio data to the SDK.
    let rc = rtcEngine?.pushExternalAudioStream(externalPublishStreamId, rawData: sample) ?? 0
    // Handle a full buffer.
    // 0x01070101 SDK_AUDIO_INPUT_BUFFER_FULL: The buffer is full. Retransmission is required.
    if rc == 0x01070101 && !(pcmInputThread?.isCancelled ?? true) {
        Thread.sleep(forTimeInterval: 0.03) // 30 ms
        retryCount += 1
    } else {
        if rc < 0 {
            "pushExternalAudioStream error, ret: \(rc)".printLog()
        }
        break
    }
}
Mac
while (true) {
    if (![pcmInputThread isExecuting]) {
        push_error = YES;
        break;
    }
    AliRtcAudioFrame *sample = [AliRtcAudioFrame new];
    sample.dataPtr = pcmData;
    sample.samplesPerSec = pcmSampleRate;
    sample.bytesPerSample = sizeof(int16_t);
    sample.numOfChannels = pcmChannels;
    sample.numOfSamples = numOfSamples;
    int rc = [self.engine pushExternalAudioStream:_externalPublishStreamId rawData:sample];
    count = count + 1;
    /* If the error is AliRtcErrAudioBufferFull, sleep for a moment and then retry the push. */
    if (rc == AliRtcErrAudioBufferFull && [pcmInputThread isCancelled] == NO) {
        [NSThread sleepForTimeInterval:0.04];
    } else {
        if (rc < 0) {
            push_error = YES;
        }
        break;
    }
}
Windows
Before you implement custom capture on Windows, you must call the QueryInterface method to get the media engine object.
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
// Construct an audio frame from the data.
AliEngineAudioRawData rawData;
rawData.dataPtr = frameInfo.audio_data[0];
rawData.numOfSamples = (int) (frameInfo.audio_data[0].length / (2 * frameInfo.audio_channels));
rawData.bytesPerSample = 2;
rawData.numOfChannels = frameInfo.audio_channels;
rawData.samplesPerSec = frameInfo.audio_sample_rate;
// Push the data to the SDK.
int ret = mAliRtcMediaEngine->PushExternalAudioStreamRawData(audioStreamID, rawData);
// Handle buffer full and other errors.
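// This snippet runs inside the capture loop: on a full buffer, wait 40 ms and retry on the next iteration.
if (ret == AliEngineErrorAudioBufferFull) {
    Sleep(40);
    continue;
}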
// Release the media engine.
mAliRtcMediaEngine->Release();
Harmony
// The method is async so that the retry wait below actually pauses between attempts.
private async pushAudioData(audioBuffer: ArrayBuffer, sampleRate: number, channels: number, bitsPerSample: number): Promise<void> {
  if (!this.rtcEngine || this.externalAudioStreamId <= 0) {
    return;
  }
  try {
    // Create an audio frame object.
    const audioFrame: AliRtcAudioFrame = new AliRtcAudioFrame();
    audioFrame.buffer = audioBuffer;
    // Calculate the number of samples per channel from the actual data length.
    const bytesPerSample = bitsPerSample / 8;
    const numSamples = audioBuffer.byteLength / (channels * bytesPerSample);
    audioFrame.bytesPerSample = bytesPerSample;
    audioFrame.numOfChannels = channels;
    audioFrame.samplesPerSec = sampleRate;
    audioFrame.numSamples = numSamples;
    // Push audio data and handle the buffer full scenario.
    let result: number;
    let retryCount = 0;
    const MAX_RETRY_COUNT = 20;
    const BUFFER_WAIT_MS = 10;
    do {
      result = this.rtcEngine.pushExternalAudioStreamRawData(this.externalAudioStreamId, audioFrame);
      if (result === -1001) { // ERR_SDK_AUDIO_INPUT_BUFFER_FULL
        retryCount++;
        if (this.externalAudioStreamId <= 0 || retryCount >= MAX_RETRY_COUNT) {
          console.warn('Stopping audio data push retry: stream ID is invalid or max retries reached.');
          break;
        }
        // Wait for a short interval before retrying. A bare setTimeout call with an empty
        // callback returns immediately, so await a promise that resolves after the delay.
        await new Promise<void>((resolve) => setTimeout(() => resolve(), BUFFER_WAIT_MS));
      } else {
        // Push succeeded or another error occurred. Exit the loop.
        break;
      }
    } while (retryCount < MAX_RETRY_COUNT);
    if (result !== 0 && result !== -1001) {
      console.warn(`Failed to push audio data, error code: ${result}`);
    } else if (result === 0) {
      console.log(`Successfully pushed audio data, size: ${audioBuffer.byteLength} bytes`);
    }
  } catch (error) {
    console.error('Exception while pushing audio data:', error);
  }
}
5. Remove the external audio stream
To stop publishing audio from the custom source, call removeExternalAudioStream.
Android
mAliRtcEngine.removeExternalAudioStream(mExternalAudioStreamId);
iOS
[self.engine removeExternalAudioStream:_externalPublishStreamId];
Mac
[self.engine removeExternalAudioStream:_externalPublishStreamId];
Windows
/* Get the media engine. */
IAliEngineMediaEngine* mAliRtcMediaEngine = nullptr;
mAliRtcEngine->QueryInterface(AliEngineInterfaceMediaEngine, (void **)&mAliRtcMediaEngine);
mAliRtcMediaEngine->RemoveExternalAudioStream(audioStreamID);
mAliRtcMediaEngine->Release();
Harmony
this.rtcEngine.removeExternalAudioStream(this.externalAudioStreamId);
this.externalAudioStreamId = 0;
6. (Optional) Dynamically enable or disable internal capture
To dynamically enable or disable the SDK's internal capture during a call, use the setParameter method.
Android
/* Dynamically disable internal capture. */
String parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}";
mAliRtcEngine.setParameter(parameter);
/* Dynamically enable internal capture. */
String parameter = "{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}";
mAliRtcEngine.setParameter(parameter);
iOS
// Dynamically disable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}")
// Dynamically enable internal capture.
engine.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}")
Mac
// Dynamically disable internal capture.
[self setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}"];
// Dynamically enable internal capture.
[self setParameter:@"{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}"];
Windows
/* Dynamically disable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}");
/* Dynamically enable internal capture. */
mAliRtcEngine->SetParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}");
Harmony
// Dynamically disable internal capture.
this.rtcEngine?.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"FALSE\"}}");
// Dynamically enable internal capture.
this.rtcEngine?.setParameter("{\"audio\":{\"enable_system_audio_device_record\":\"TRUE\"}}");
FAQ
- What is the recommended frequency for calling pushExternalAudioStreamRawData?
  - We recommend synchronizing the calls with the physical audio device's clock, calling the method each time the device provides a new data packet.
  - If no physical device clock is available, we recommend sending data every 10 to 50 ms.
- Can I use the SDK's internal 3A audio processing (AEC, AGC, and ANS) with custom audio capture?
  - Yes. As described in Step 2, you can set the enable3A parameter when adding the external audio stream to enable or disable the SDK's internal 3A audio processing.
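When no device clock is available (for example, when pushing audio from a file), one way to hold a steady push interval is to schedule against absolute deadlines instead of sleeping a fixed amount after each push, which would accumulate drift. The Java sketch below illustrates this approach. PushPacer is a hypothetical helper, not part of the ARTC SDK, and the interval you pass is your own choice within the range recommended above.

```java
/** Hypothetical helper (not part of the ARTC SDK): drift-free pacing for periodic pushes. */
final class PushPacer {
    private final long intervalNanos;
    private long nextDeadline;

    /** startNanos is the time of the first push, for example System.nanoTime(). */
    PushPacer(int intervalMs, long startNanos) {
        this.intervalNanos = intervalMs * 1_000_000L;
        this.nextDeadline = startNanos;
    }

    /** Returns how long to wait (in ns) before the next push and advances the schedule. */
    long nanosUntilNextPush(long nowNanos) {
        // If the caller is late, the wait is zero and later deadlines stay on the original grid.
        long wait = Math.max(0L, nextDeadline - nowNanos);
        nextDeadline += intervalNanos;
        return wait;
    }
}
```

A capture loop could create the pacer with System.nanoTime() as the start time, call nanosUntilNextPush(System.nanoTime()) before each push, and wait for the returned duration, for example with java.util.concurrent.locks.LockSupport.parkNanos.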