Answering Machine Detection

Answering machine detection (AMD) is a mechanism used by dialers to guess whether an outbound call has been answered by a human or connected to voicemail. An effective AMD mechanism is essential for outbound dialers, since a significant portion of calls may be connected to voicemail. Disconnecting these calls instead of routing them to agents saves considerable agent time and effort.

It is surprising, but even in the 21st century there is still no standardized, widely deployed signaling protocol in communications networks to notify callers when a call is connected to an answering machine. To detect answering machines, dialers usually listen to the media stream and try to figure out whether a human or an automated system is on the other end of the call.

AMD implementations are generally based on the observation that a human-answered call usually starts with a relatively short period (4-5 seconds) of voice activity (e.g. “Hi, this is Michael”) followed by a pause, while voicemail starts with a much longer period of continuous activity (e.g. “Hi, this is Michael Smith’s voicemail. I am not available right now. Please leave a message after the tone”). Such heuristics are questionable, but as mentioned above, there is no other way to detect answering machines.

Detection methods should guarantee a short decision time (hold time) and a low rate of false positive decisions (abandoned rate). In order to protect customers from harassment, both the hold time and the abandoned rate are generally governed by national regulations. These regulations differ from country to country; in some countries they are so strict that they effectively make it impossible to deploy any AMD. For example, the Ofcom regulation in the United Kingdom specifies that the hold time and the abandoned rate cannot exceed 2 seconds and 3%, respectively.
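To make the abandoned-rate constraint concrete, here is a minimal sketch of how a dialer might check its rate against a 3% cap. The helper name and the exact statistical basis (time window, which calls count as answered) are assumptions for illustration; the authoritative definitions are in the regulation itself.

// Illustrative only: checks an abandoned-call rate against an Ofcom-style 3% cap.
// How abandoned and live-answered calls are counted is defined by the regulation.
static bool IsWithinAbandonedRateCap(int abandonedCalls, int liveAnsweredCalls,
                                     double capPercent = 3.0)
{
    int totalAnswered = abandonedCalls + liveAnsweredCalls;
    if (totalAnswered == 0)
    {
        return true; // nothing answered yet, nothing abandoned
    }

    double ratePercent = 100.0 * abandonedCalls / totalAnswered;
    return ratePercent <= capPercent;
}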

Speech engine capability used

The UCMA speech engine (Microsoft.Rtc.Collaboration.AudioVideo.Recorder) offers a subscription-based model for applications to receive notifications when voice activity changes in an audio media stream. Based on this capability, it is quite easy to implement the AMD heuristic described above (a minimal subscription sketch follows the list):

  • Answering machine: if continuous voice activity longer than a predefined duration is detected at the beginning of the call

  • Human answered: otherwise
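For reference, the subscription pattern itself is only a few lines. The sketch below assumes an already-created AudioVideoFlow and a writable temporary file path (the path is a placeholder); the detector class presented later wraps exactly this pattern.

// Minimal subscription sketch: recording must be active for the speech
// engine to raise voice activity notifications (see the notes at the end).
Recorder recorder = new Recorder();
recorder.AttachFlow(flow);                              // flow: an established AudioVideoFlow
recorder.SetSink(new WmaFileSink(@"C:\temp\amd.wma"));  // placeholder path
recorder.VoiceActivityChanged += (sender, e) =>
{
    Console.WriteLine("Voice activity changed, talking: " + e.IsVoice);
};
recorder.Start();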

Answering machine detector

The following code snippet shows what an implementation might look like. You need to set up an application endpoint and dial the customer. Before dialing, subscribe to the StateChanged and AudioVideoFlowConfigurationRequested events. Then invoke the AttachFlow() method on the AMD detector (passing the flow and a temporary sink file; see the notes after the detector code) as soon as the audio flow is created on the call. Finally, start the detection by invoking StartDetection() when the call is established.

public enum AMDResult
{
    HUMAN, MACHINE
}

public delegate void AMDFinishedEventHandler(object sender, AMDFinishedEventArgs e);

private AnsweringMachineDetector amd = new AnsweringMachineDetector(3000, 5000);
private ApplicationEndpoint endPoint = null;    // initialized during application startup
private AudioVideoCall call = null;

private void DialCustomer()
{
    call = new AudioVideoCall(new Conversation(endPoint));

    call.AudioVideoFlowConfigurationRequested += OnAudioFlowCreated;
    call.StateChanged += OnCallStateChanged;

    call.BeginEstablish("tel:+123456789", null, FinishCallEstablish, null);
}

private void FinishCallEstablish(IAsyncResult ar)
{
    call.EndEstablish(ar);
}

private void OnAudioFlowCreated(object sender, AudioVideoFlowConfigurationRequestedEventArgs args)
{
    if (args.Flow != null)
    {
        // The sink file name is an arbitrary placeholder; see the notes below.
        amd.AttachFlow(args.Flow, "amd_temp.wma");
    }
}

private void OnCallStateChanged(object sender, CallStateChangedEventArgs args)
{
    if (args.State == CallState.Established)
    {
        amd.StartDetection(new AMDFinishedEventHandler(OnAMDFinished));
    }
}

private void OnAMDFinished(object sender, AMDFinishedEventArgs e)
{
    // AMD finished, result: e.Result
}

The detector class looks as follows. Its constructor accepts two parameters: an idle timeout and a talk duration. The first parameter stops the detector if no voice activity is detected within the given time frame. The second is the key parameter for deciding whether a human or an answering machine is on the other side: the detector assumes the call is connected to an answering machine if it starts with voice activity longer than talkDuration; otherwise, it assumes the call is human answered.

When AttachFlow() is invoked, the detector sets up a UCMA speech detector and subscribes to VoiceActivityChanged events. The StartDetection() method starts the detection process, using the parameters received in the constructor and the activity events received from the UCMA speech engine. It raises an AMDFinished event carrying the AMDResult when detection is finished.

class AnsweringMachineDetector
{
    private enum AMDStatus
    {
        NULL, IDLE, TALKING
    }

    private event AMDFinishedEventHandler AMDFinished;

    private AMDStatus status = AMDStatus.NULL;
    private Recorder recorder = null;
    private bool isVoice = false;
    private bool finished = false;
    private System.Timers.Timer idleTimer = null;
    private System.Timers.Timer talkTimer = null;

    public AnsweringMachineDetector(long idleTimeout, long talkDuration)
    {
        // Fires if no voice activity is detected within idleTimeout (ms).
        idleTimer = new System.Timers.Timer(idleTimeout);
        idleTimer.AutoReset = false;
        idleTimer.Elapsed += OnIdleTimeout;

        // Fires if continuous voice activity exceeds talkDuration (ms).
        talkTimer = new System.Timers.Timer(talkDuration);
        talkTimer.AutoReset = false;
        talkTimer.Elapsed += OnTalkTimeout;
    }

    public void AttachFlow(AudioVideoFlow flow, string tempFile)
    {
        recorder = new Recorder();
        recorder.AttachFlow(flow);
        // VoiceActivityChanged is raised only while the stream is being
        // recorded, hence the temporary sink file (see the notes below).
        recorder.SetSink(new WmaFileSink(tempFile));
        recorder.VoiceActivityChanged += OnVoiceActivityChanged;
        recorder.Start();
    }

    public void StartDetection(AMDFinishedEventHandler eventHandler)
    {
        AMDFinished += eventHandler;
        ChangeStatus(isVoice ? AMDStatus.TALKING : AMDStatus.IDLE);
    }

    private void OnVoiceActivityChanged(object sender, VoiceActivityChangedEventArgs e)
    {
        isVoice = e.IsVoice;

        if ((status == AMDStatus.IDLE) && e.IsVoice)
        {
            ChangeStatus(AMDStatus.TALKING);
        }
        else if ((status == AMDStatus.TALKING) && !e.IsVoice)
        {
            // Voice stopped before talkDuration elapsed: short greeting, so human.
            FireAMDFinished(AMDResult.HUMAN);
        }
    }

    private void ChangeStatus(AMDStatus st)
    {
        status = st;

        switch (status)
        {
            case AMDStatus.IDLE:
                idleTimer.Start();
                break;

            case AMDStatus.TALKING:
                // Voice arrived, so the idle timeout no longer applies.
                idleTimer.Stop();
                talkTimer.Start();
                break;
        }
    }

    protected virtual void FireAMDFinished(AMDResult result)
    {
        if (finished)
        {
            return; // guard against a late timer callback firing a second result
        }
        finished = true;

        idleTimer.Stop();
        talkTimer.Stop();

        if (AMDFinished != null)
        {
            AMDFinished(this, new AMDFinishedEventArgs(result));
        }
    }

    private void OnIdleTimeout(object source, System.Timers.ElapsedEventArgs e)
    {
        // No voice at all within idleTimeout: assume a silent human pickup.
        FireAMDFinished(AMDResult.HUMAN);
    }

    private void OnTalkTimeout(object source, System.Timers.ElapsedEventArgs e)
    {
        // Continuous voice longer than talkDuration: assume a voicemail greeting.
        FireAMDFinished(AMDResult.MACHINE);
    }
}

Three remarks on the detector implementation:

  • It attaches the speech detector to the audio flow as soon as the flow is available, but starts detection only when the call is established. The reason is that, in our observation, it takes 2-3 seconds for the UCMA speech detector to raise the first VoiceActivityChanged event; attaching it to the audio flow even before the call is established mitigates this delay.
  • The UCMA speech detector raises VoiceActivityChanged events only if the audio stream is being recorded to a file, so you need to specify a temporary file. You can delete this file after detection is done (see the cleanup sketch after this list).
  • The VoiceActivityChanged event carries an attribute indicating the relative time the activity change belongs to. Our detector implementation does not use it, mainly because its usefulness depends on many factors (e.g. the load on the application server or the way the rest of the application code is written).
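As a possible cleanup step, something along the following lines could be added to the detector. Cleanup() is a hypothetical helper, not part of the implementation above; it assumes detection has already finished.

// Hypothetical cleanup helper: stop recording, detach from the flow and
// delete the temporary sink file once detection is done.
public void Cleanup(string tempFile)
{
    if (recorder != null)
    {
        recorder.VoiceActivityChanged -= OnVoiceActivityChanged;
        recorder.Stop();
        recorder.DetachFlow();
        recorder = null;
    }

    try
    {
        System.IO.File.Delete(tempFile);
    }
    catch (System.IO.IOException)
    {
        // The file may still be locked briefly after Stop().
    }
}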

The AMDFinished event arguments are defined as follows.

public class AMDFinishedEventArgs : EventArgs
{
    private AMDResult result = AMDResult.HUMAN;

    public AMDResult Result
    {
        get
        {
            return result;
        }
    }

    public AMDFinishedEventArgs(AMDResult result)
    {
        this.result = result;
    }
}
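Finally, the OnAMDFinished stub from the first snippet could be fleshed out along these lines. This is illustrative only: dropping voicemail calls via BeginTerminate() is one option, and how human-answered calls are handed over to agents is entirely application specific.

private void OnAMDFinished(object sender, AMDFinishedEventArgs e)
{
    if (e.Result == AMDResult.MACHINE)
    {
        // Drop the voicemail call instead of routing it to an agent.
        call.BeginTerminate(ar => call.EndTerminate(ar), null);
    }
    else
    {
        // Human answered: hand the call over to an agent (application specific).
    }
}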