Resource Requirement for UCMA Based Call Recording

Lync Server 2013 has no built-in server-side call recording. Users of the Lync 2013 client can record their own calls, but this is purely client-side functionality and therefore has very limited relevance in a business environment: there is no centrally manageable, rule-based recording that could be used for quality assurance or legal purposes.

Looking at the list of 3rd party call recording and contact center solutions qualified for Lync 2013, calls are recorded in a Lync Server 2013 environment in one of three ways:

  • Port mirroring on network switches: each network switch is configured to send a copy of every packet to a network monitoring port. The recorder sits behind this monitoring port and collects SRTP packets. The main drawbacks of this solution are its sensitivity to changes in network topology and the painful switch configuration (especially in a heterogeneous switch environment).
  • Packet sniffing on Lync servers: a packet capturing module is installed on the Lync AV Edge and Mediation Servers. The module works at OSI Layer 2 to capture packets and forwards SRTP packets to a call recorder server. This type of solution can record only a small subset of Lync calls: PSTN calls (which requires media bypass to be disabled) and calls between internal and external users. It cannot record anything else; for example, calls between two internal Lync users or between two external users. A further disadvantage is that 3rd party recorder components (the packet capturing modules) are installed on the Lync servers, so the Windows Servers and networking must be sized accordingly, especially with respect to host CPU and network bandwidth.
  • UCMA based conferencing: a trusted UCMA application joins an existing conference, dials into it, and uses Microsoft.Rtc.Collaboration.AudioVideo.Recorder to write the media stream to the file system. This type of call recording is used by contact center solutions that use conferencing in the background. Naturally, it can record only those calls that are handled by the contact center software, so it cannot be considered an enterprise-wide recording solution; it is rather a built-in feature of the contact center solution.

Recording solutions based on port mirroring and packet capturing require special filters to be installed on each Lync Front End server in order to capture SIP INVITE messages and extract the AES encryption keys from them. These keys are then used to decrypt the captured SRTP streams. No filters are required for UCMA based recording.
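To make the key extraction step more concrete, here is a minimal Python sketch that pulls keying material out of an SDP body. It assumes the keys are carried in `a=crypto` attributes in the format defined by RFC 4568 (SDES). The sample SDP, the key bytes and the names `CryptoAttr` and `extract_crypto` are made up for illustration; this is not the code of any actual Lync filter.

```python
import base64
from typing import List, NamedTuple


class CryptoAttr(NamedTuple):
    tag: int
    suite: str
    master_key: bytes
    master_salt: bytes


def extract_crypto(sdp: str) -> List[CryptoAttr]:
    """Parse a=crypto lines (RFC 4568 format) from an SDP body."""
    result = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=crypto:"):
            continue
        fields = line[len("a=crypto:"):].split()
        if len(fields) < 3 or not fields[2].startswith("inline:"):
            continue
        tag, suite, key_params = fields[0], fields[1], fields[2]
        # The inline value is base64(key || salt), optionally followed by
        # "|lifetime" and "|MKI:length".
        key_b64 = key_params[len("inline:"):].split("|")[0]
        raw = base64.b64decode(key_b64)
        # For the AES_CM_128 suites: 16-byte master key, 14-byte master salt.
        result.append(CryptoAttr(int(tag), suite, raw[:16], raw[16:30]))
    return result


# Made-up keying material: 30 bytes = 16-byte key + 14-byte salt.
sample_key = base64.b64encode(bytes(range(30))).decode("ascii")
sample_sdp = "\r\n".join([
    "v=0",
    "m=audio 50000 RTP/SAVP 9 0 8",
    f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{sample_key}|2^31|1:1",
    "a=rtpmap:9 G722/8000",
])
keys = extract_crypto(sample_sdp)
```

A recorder that only sees SRTP packets on the wire needs exactly this kind of information, together with the crypto suite, before it can decrypt anything; the UCMA approach avoids the problem because the application endpoint is a legitimate participant in the call.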

There are recording solutions which, besides extracting encryption keys from the SIP INVITE, also change the candidate list in the SDP in order to redirect media flows to the recorder servers. The recorder server then not only records media packets but also acts as a media proxy in both directions, delivering media packets between the real endpoints. This can be a very risky solution: it changes the network path the media packets travel, it may change the real-time behavior of the media, and it may significantly degrade the user experience if the recorder servers are not sized correctly.

This blog post is about the UCMA based call recording solution and focuses on the CPU, disk and memory requirements.

UCMA based call recording

First of all, let us review how UCMA based call recording works:

  1. A UCMA application endpoint (Microsoft.Rtc.Collaboration.ApplicationEndpoint) is registered by a trusted UCMA application. This trusted UCMA application will be referred to as the recorder service;
  2. The recorder service is instructed to record a given conversation. To do that, the recorder service receives a conference URI in an application-specific way;
  3. The recorder service joins the conference;
  4. The recorder service sets up an AudioVideoCall object (Microsoft.Rtc.Collaboration.AudioVideo namespace) and dials into the conference;
  5. When the media flow belonging to the AudioVideoCall becomes active, the recorder service attaches a Recorder object (Microsoft.Rtc.Collaboration.AudioVideo namespace) to the media flow. When attaching the Recorder, the recorder service specifies the file to which the output is written. It is generally a wma file;
  6. Behind the scenes, the application endpoint receives the encrypted media stream (SRTP) from the Lync Server audio/video MCU and decrypts it;
  7. The Recorder transcodes the G.722 media stream to WMA2 and writes the output to the specified wma file.

So, behind the scenes there is extensive decryption and transcoding work as well as disk write activity. Our main objective here is to investigate the resource requirements of this recording process and determine how many resources are needed to record a given number of calls at the same time. We focus on memory, CPU and disk IO requirements.

Resource requirement

The following table shows average values calculated over an extensive data set collected from a production environment. Only the first few rows are shown, to demonstrate the trends in the different performance counters.

Concurrent calls recorded  Process memory usage (MB)  Process CPU usage (%)  Process IO write (bytes/sec)
1                          119.58                     3.03                   3253.69
2                          122.87                     5.22                   5941.67
3                          124.39                     7.53                   8314
4                          125.28                     8.04                   9198.29
5                          128.28                     9.7                    11321.79
6                          130.83                     11.49                  13405.13
7                          132.3                      13.8                   16330.89
8                          134.58                     14.53                  17598.21
9                          136.58                     15.97                  19327.61
10                         139.14                     17.37                  21950.92
11                         141.51                     19.72                  25054.57
12                         144.35                     21.13                  26422.16
13                         147.38                     21.58                  27482
14                         144.63                     22.48                  28378.56
15                         145.54                     24.66                  30222.11
16                         150.77                     28.65                  35733.6
17                         158.5                      28.69                  37147
...                        ...                        ...                    ...
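
One way to quantify the trends in these rows is an ordinary least-squares line per counter. The sketch below does this in plain Python for the 17 rows shown; the helper name `linear_fit` is made up for illustration, and since only an excerpt of the data set is available here, the results are indicative only.

```python
# Rows from the table: (calls, memory MB, CPU %, IO write bytes/sec)
ROWS = [
    (1, 119.58, 3.03, 3253.69), (2, 122.87, 5.22, 5941.67),
    (3, 124.39, 7.53, 8314.0), (4, 125.28, 8.04, 9198.29),
    (5, 128.28, 9.7, 11321.79), (6, 130.83, 11.49, 13405.13),
    (7, 132.3, 13.8, 16330.89), (8, 134.58, 14.53, 17598.21),
    (9, 136.58, 15.97, 19327.61), (10, 139.14, 17.37, 21950.92),
    (11, 141.51, 19.72, 25054.57), (12, 144.35, 21.13, 26422.16),
    (13, 147.38, 21.58, 27482.0), (14, 144.63, 22.48, 28378.56),
    (15, 145.54, 24.66, 30222.11), (16, 150.77, 28.65, 35733.6),
    (17, 158.5, 28.69, 37147.0),
]


def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


calls = [row[0] for row in ROWS]
fits = {}
for name, col in (("memory MB", 1), ("CPU %", 2), ("IO bytes/sec", 3)):
    fits[name] = linear_fit(calls, [row[col] for row in ROWS])
    slope, intercept = fits[name]
    print(f"{name}: {slope:.2f} per call, baseline {intercept:.2f}")
```

On the rows shown, the fitted slopes come out at roughly 2.1 MB of memory, 1.6 percentage points of CPU and 2 kbyte/sec of IO write per additional call. The full data set may give somewhat different values.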

Based on the data, the following conclusions can be drawn:

  • Memory, CPU and disk IO usage all increase roughly linearly with the number of calls, which is a good property;
  • Memory usage increases slightly as more calls are recorded, but memory usage itself is not significant. A Windows Server with 8 GB of memory would allow recording thousands of calls;
  • Surprisingly, disk IO usage is not high (thanks to the G.722 and WMA2 compression rates). Average IO write activity is 3-4 kbyte/sec/call. A Windows Server with a few 8000 RPM disk drives, each with a write speed of 40-50 Mbytes/sec, would allow recording thousands of calls;
  • CPU usage is significant.
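
The "thousands of calls" figures for memory and disk can be checked with a quick back-of-the-envelope calculation. The sketch below is only that: it assumes the whole 8 GB is available to the recorder process (ignoring the operating system), takes the per-call memory growth from the first and last table rows, and uses the pessimistic ends of the disk figures quoted above (4 kbyte/sec per call, a single disk sustaining 40 Mbytes/sec). The function names are made up for illustration.

```python
def max_calls_by_memory(total_mb, baseline_mb, per_call_mb):
    """Calls that fit in memory once the process baseline is subtracted."""
    return int((total_mb - baseline_mb) // per_call_mb)


def max_calls_by_disk(disk_write_bytes_per_sec, per_call_bytes_per_sec):
    """Calls whose combined write rate stays within the disk write speed."""
    return int(disk_write_bytes_per_sec // per_call_bytes_per_sec)


# Memory growth per call, taken from the 1-call and 17-call rows of the table.
per_call_mb = (158.5 - 119.58) / (17 - 1)

# Assumption: all 8 GB is usable by the recorder process.
mem_limit = max_calls_by_memory(8 * 1024, 119.58, per_call_mb)

# Assumption: one disk, 40 Mbytes/sec sustained, 4 kbyte/sec per call.
disk_limit = max_calls_by_disk(40 * 1000**2, 4 * 1000)

print(mem_limit, disk_limit)
```

Under these assumptions the memory limit is about 3,300 calls and the disk limit is 10,000 calls for a single drive. Many small concurrent writes will not reach the sequential write speed of a disk, so the disk figure is optimistic, but both limits remain far above what the CPU allows.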

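The following chart visualizes the CPU usage and the associated trend.

[Chart: Process CPU usage (%) versus number of calls recorded at the same time, with trend line]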

As described above, the CPU usage mainly comes from decrypting the SRTP stream carrying G.722 audio, transcoding it to WMA2 and writing the output to disk.

The CPU usage figures above come from the "Process"/"% Processor Time" performance counter of a Windows Server with a 4-core Intel Xeon 2.13 GHz CPU. Please note that the value of the "Process"/"% Processor Time" performance counter can go up to N*100 (where N is the number of CPU cores) because it adds up the CPU usage of the requested process across all CPU cores.

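Based on the CPU usage information above, we can estimate how many calls can be recorded on this Windows Server at the same time, taking 80% utilization of each CPU core as the limit. 17 concurrent calls use 28.69% on the N*100 scale, that is, 28.69% / 17 ≈ 1.69% per call; spread over 4 cores this is about 0.42% of the whole machine per call:

X * 28.69% / 4 / 17 = 80% => X ≈ 190 calls

This means that at most about 190 calls can be recorded at the same time on a Windows Server with a 4-core Intel Xeon 2.13 GHz CPU, assuming CPU usage keeps scaling linearly with the number of calls.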
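
The same calculation can be written as a small Python function, which makes it easy to plug in measurements from a different server. The function name and parameters are made up for illustration, and the sketch assumes that CPU usage scales linearly with the number of calls and spreads evenly over the cores.

```python
def max_concurrent_calls(cpu_percent, calls_measured, cores,
                         per_core_limit_percent=80.0):
    """Estimate the number of calls that fit within the CPU budget.

    cpu_percent is the "% Processor Time" value of the process, measured
    on the N*100 scale while recording calls_measured calls.
    """
    per_call = cpu_percent / calls_measured    # % of one core per call
    budget = cores * per_core_limit_percent    # total budget on the N*100 scale
    return budget / per_call


estimate = max_concurrent_calls(28.69, 17, 4)
print(round(estimate))  # prints 190
```

With 8 cores of the same type and the same 80% limit, the estimate simply doubles, which is why adding cores or machines is the natural way to scale this workload.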


Considering only the memory and IO requirements, you could record several thousand calls at the same time on a Windows Server with 8 GB RAM, a few 8000 RPM disks and a 4-core Intel Xeon 2.13 GHz CPU. However, because the recording process is quite CPU intensive, you can record only about 190 calls at the same time. If you need to record more calls concurrently, you need to design the recording service to be scalable and distribute it across multiple machines.
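
As a rough illustration of what distributing the recording service could look like, the Python sketch below sizes a pool of recorder machines and picks the least-loaded machine for each new recording. The host names, the function names and the least-loaded policy are assumptions made for this example only; the 190 calls per host figure is the estimate derived above for this particular server.

```python
import math

CALLS_PER_HOST = 190  # CPU-bound estimate for the server measured above


def hosts_needed(peak_calls, calls_per_host=CALLS_PER_HOST):
    """Number of recorder hosts required for a given peak load."""
    return math.ceil(peak_calls / calls_per_host)


def pick_host(active_calls, calls_per_host=CALLS_PER_HOST):
    """Return the least-loaded host with spare capacity, or None if all are full.

    active_calls maps a host name to the number of calls it is recording.
    """
    candidates = {h: n for h, n in active_calls.items() if n < calls_per_host}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


print(hosts_needed(1000))                                      # prints 6
print(pick_host({"rec1": 190, "rec2": 42, "rec3": 100}))       # prints rec2
```

In practice you would also want at least one spare machine on top of this number, so that the failure of a single recorder does not push the remaining ones beyond the 80% CPU limit.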