Application Recovery in Paired Front End Pool Environment

Lync Server 2013 supports a paired Front End pool environment for disaster recovery purposes. Such an environment consists of 2 Lync Front End pools and a pairing relationship between them. The pools are generally Lync Server Enterprise Edition pools installed in different geographic locations. If one of the Front End pools goes down or becomes unavailable, Lync Server administrators can turn to the Lync Server Management Shell and manually initiate a failover procedure (Invoke-CsManagementServerFailover, Invoke-CsPoolFailover). Once this is done, all services are available to Lync users again, but they are now provided by the other Front End pool.
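As a rough illustration, the manual failover could look like the sketch below. It uses the pool name fepool1.contoso.com that appears later in this post. The SQL Server FQDN and instance name are made up for the example, and the exact steps depend on where your Central Management Server lives, so treat this as an outline rather than a complete runbook.

```powershell
# Check which pool currently hosts the active Central Management Server (CMS).
Get-CsManagementStoreReplicationStatus -CentralManagementStoreStatus

# Only needed if the failed pool hosted the CMS: fail the CMS over to the backup pool.
# ASSUMPTION: sql2.contoso.com / RTC are hypothetical names for the backup pool's back end.
Invoke-CsManagementServerFailover -BackupSqlServerFqdn sql2.contoso.com -BackupSqlInstanceName RTC -Force

# Fail the users of the lost pool over to its paired backup pool.
Invoke-CsPoolFailover -PoolFqdn fepool1.contoso.com -DisasterMode -Verbose
```

After these commands complete, Lync users are served by the surviving pool.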

That is great. But what about UCMA applications? In short: when you perform the disaster recovery procedure described above, application endpoints are left on their original Front End pool. They are not moved to the other one. You need to perform further manual configuration to recover the services provided by your UCMA application. This blog post covers the manual configuration required and describes what can be done on the application side to minimize the downtime.

To keep the focus on the main subject, the description below purposely omits some details, especially around the Edge Server (e.g. the Edge <=> Front End relationship).

Application in a paired Front End pool environment

Lync Server Response Group - which is a UCMA application released as part of the Lync Front End Server - also needs a manual recovery procedure in a paired Front End pool environment. So it is not surprising that you need to do something similar for third-party UCMA applications as well. But let us see why such a manual recovery procedure is required for UCMA applications.

If you are familiar with the Lync Server Management Shell, you probably know that you need to specify a Lync Server Registrar FQDN when you create a new trusted application pool for your UCMA application. For example:

New-CsTrustedApplicationPool -Identity apppool1.contoso.com -Registrar fepool1.contoso.com -Site 1

The Registrar parameter is the FQDN of one of your Lync Front End pools. You can specify only a single registrar; there is no way to specify a backup registrar. Thus you directly tie your application pool to a specific Front End pool. Moreover, you indirectly tie your UCMA application and application endpoints to the same Front End pool when you create a trusted application and application endpoints (by executing New-CsTrustedApplication and New-CsTrustedApplicationEndpoint respectively).
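To make the indirect tie visible, here is a minimal sketch of the two follow-up cmdlets, plus a quick check of where the objects ended up. The port number, display name and phone number are invented for the example. The application ID and SIP address match the names used in the rest of this post.

```powershell
# Create the trusted application inside the pool created above.
# ASSUMPTION: port 6000 is an arbitrary example value.
New-CsTrustedApplication -ApplicationId app1 -TrustedApplicationPoolFqdn apppool1.contoso.com -Port 6000

# Create the application endpoint. Note that there is no Registrar parameter here:
# the endpoint follows the registrar of its trusted application pool.
# ASSUMPTION: the display name and LineURI are hypothetical.
New-CsTrustedApplicationEndpoint -ApplicationId app1 -TrustedApplicationPoolFqdn apppool1.contoso.com -SipAddress sip:endpoint1@contoso.com -DisplayName "Endpoint 1" -LineURI tel:+14255550100

# Publish the topology change.
Enable-CsTopology

# Verify which Front End pool the pool and the endpoint are tied to.
Get-CsTrustedApplicationPool -Identity apppool1.contoso.com | Select-Object Identity, Registrar
Get-CsTrustedApplicationEndpoint -Identity sip:endpoint1@contoso.com | Select-Object SipAddress, RegistrarPool, LineURI
```

Both queries should point at fepool1.contoso.com, which is exactly the dependency that becomes a problem when that pool is lost.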

After creating a trusted application pool, a trusted application and a trusted application endpoint for your UCMA application in a paired Front End pool environment, you have something like the setup shown in the following figure.

As the figure shows you have

  • 2 Front End pools (fepool1.contoso.com, fepool2.contoso.com)
  • a UCMA application pool (apppool1.contoso.com), an associated application (urn:application:app1) and an application endpoint (sip:endpoint1@contoso.com). All of them are tied to one of the Front End pools (fepool1.contoso.com in our case)
  • 3 different types of callers: Lync callers, PSTN callers and possibly Skype callers if the application endpoint is exposed to the Skype community

This solution works fine as long as the Front End pool the UCMA application is tied to (fepool1.contoso.com) is available and works as expected. But let us see what happens when a disaster occurs, meaning that fepool1.contoso.com either stops working or becomes unavailable.

When disaster happens

If fepool1.contoso.com is lost, Lync Server administrators need to perform a manual disaster recovery procedure (Invoke-CsManagementServerFailover, Invoke-CsPoolFailover) to restore services for Lync users on the other pool. When it is done, Lync users are connected to the Lync Front End pool fepool2.contoso.com and can use Lync services again. But what about the UCMA application, which is still unavailable to callers because its registrar fepool1.contoso.com is down?

As the following figure shows, manually configuring the UCMA application to connect to the Front End pool fepool2.contoso.com does not help. This is because registration requests are redirected to the application's registrar (fepool1.contoso.com), which is unavailable.

Moving the application endpoints between Front End pools

One way to overcome the issue is to turn to the Lync Server Management Shell and move your trusted application pool, trusted application and application endpoint to the registrar fepool2.contoso.com. To do that, first remove the existing application endpoint, trusted application and trusted application pool (Remove-CsTrustedApplicationEndpoint, Remove-CsTrustedApplication, Remove-CsTrustedApplicationPool), and then create exactly the same pool, application and endpoint again, this time specifying fepool2.contoso.com as the registrar (New-CsTrustedApplicationPool, New-CsTrustedApplication, New-CsTrustedApplicationEndpoint). Do not forget to use the original LineURI when you execute New-CsTrustedApplicationEndpoint.
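The remove-and-recreate sequence could be sketched as follows. It assumes the object names used throughout this post and reuses the site ID from the earlier example. In a real disaster the site may differ, and you may want to add confirmation-suppressing switches or error handling.

```powershell
# Save the settings that must survive the recreation (LineURI in particular).
$old  = Get-CsTrustedApplicationEndpoint -Identity sip:endpoint1@contoso.com
$port = (Get-CsTrustedApplication -Identity "apppool1.contoso.com/urn:application:app1").Port

# Remove the objects bottom-up: endpoint, then application, then pool.
Remove-CsTrustedApplicationEndpoint -Identity sip:endpoint1@contoso.com
Remove-CsTrustedApplication -Identity "apppool1.contoso.com/urn:application:app1"
Remove-CsTrustedApplicationPool -Identity apppool1.contoso.com
Enable-CsTopology

# Recreate the same objects, now tied to the surviving Front End pool.
# ASSUMPTION: "-Site 1" is carried over from the earlier example; adjust to your topology.
New-CsTrustedApplicationPool -Identity apppool1.contoso.com -Registrar fepool2.contoso.com -Site 1
New-CsTrustedApplication -ApplicationId app1 -TrustedApplicationPoolFqdn apppool1.contoso.com -Port $port
New-CsTrustedApplicationEndpoint -ApplicationId app1 -TrustedApplicationPoolFqdn apppool1.contoso.com -SipAddress $old.SipAddress -DisplayName $old.DisplayName -LineURI $old.LineURI
Enable-CsTopology
```

Reading the old endpoint into a variable first is what makes it easy to restore the original LineURI. If the endpoint had no LineURI, drop that parameter from the last New-CsTrustedApplicationEndpoint call.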

In the end you will have a UCMA application running again, but now connected to the Front End pool fepool2.contoso.com.

As the figure shows, your UCMA application becomes available, but only for Skype and PSTN callers. Lync users who previously added the endpoint to their contact lists will see “Presence unknown”. This is because the endpoint and the associated AD contact object in the background have been recreated. Lync users need to remove the original contact from their contact list and add it again as a new contact with the same SIP URI. So moving application endpoints between Front End pools causes some headaches for Lync users.

It is worth noting that some sources on the Internet suggest that a trusted application pool can be moved between registrars without recreating the associated endpoints:

Set-CsTrustedApplicationPool -Identity apppool1.contoso.com -Registrar fepool2.contoso.com

Move-CsApplicationEndpoint -Identity sip:endpoint1@contoso.com -TargetApplicationPool apppool1.contoso.com -Force

But this never worked for me.

Having a backup UCMA application

Anyway, you can connect your UCMA application to the new Front End pool if you recreate the application pool, application and endpoint as described above. However, this is time-consuming and results in significant downtime. As a more sophisticated solution, you can create 2 different trusted application pools, applications and endpoints at the beginning. One set uses the registrar fepool1.contoso.com; the other uses the registrar fepool2.contoso.com. Then, as the figure below shows, you can deploy 2 instances of the same UCMA application and connect each to its own Front End pool. Thus you will have 2 application instances running all the time. When fepool1.contoso.com goes down, all you have to do is execute

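# Remember the phone number currently assigned to the primary endpoint.
$x = Get-CsTrustedApplicationEndpoint -Identity sip:endpoint1@contoso.com

# Release the number from the primary endpoint...
Set-CsTrustedApplicationEndpoint -Identity sip:endpoint1@contoso.com -LineURI $null

# ...and assign it to the backup endpoint.
Set-CsTrustedApplicationEndpoint -Identity sip:endpoint2@contoso.com -LineURI $x.LineURI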

in order to redirect PSTN callers to the UCMA application instance using fepool2.contoso.com as registrar.

So PSTN callers experience only minimal downtime when a Lync pool disaster happens. However, Skype and Lync users who added sip:endpoint1@contoso.com to their contact list will obviously see “Offline”. Lync users need to add sip:endpoint2@contoso.com to their contact list as well. For Skype users, the only manageable solution is to expose the online endpoint (either sip:endpoint1@contoso.com or sip:endpoint2@contoso.com) on your web site (e.g. as a “click to call” link).
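The standby pool, application and endpoint that this approach relies on have to exist before the disaster. A minimal sketch of creating them up front is shown below. The names apppool2.contoso.com and app2, the site ID, the port and the display name are assumptions made for the example. Only sip:endpoint2@contoso.com and fepool2.contoso.com come from the scenario above.

```powershell
# Standby trusted application pool, tied to the second Front End pool.
# ASSUMPTION: apppool2.contoso.com and "-Site 2" are hypothetical.
New-CsTrustedApplicationPool -Identity apppool2.contoso.com -Registrar fepool2.contoso.com -Site 2

# Standby trusted application for the second instance of the UCMA application.
# ASSUMPTION: application ID "app2" and port 6000 are example values.
New-CsTrustedApplication -ApplicationId app2 -TrustedApplicationPoolFqdn apppool2.contoso.com -Port 6000

# Standby endpoint. It is created WITHOUT a LineURI, because the phone number
# stays on sip:endpoint1@contoso.com until a failover is needed.
New-CsTrustedApplicationEndpoint -ApplicationId app2 -TrustedApplicationPoolFqdn apppool2.contoso.com -SipAddress sip:endpoint2@contoso.com -DisplayName "Endpoint 2"

# Publish the topology change.
Enable-CsTopology
```

Leaving the LineURI empty on the standby endpoint keeps the phone number unique, so the failover step is reduced to clearing it on one endpoint and setting it on the other.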

Application specific disaster recovery procedure

One more thing … If your UCMA application is a stateful solution, meaning that it maintains information specific to Lync users and application endpoints, then you need to design your application specifically to fit a paired Front End pool environment. Among other things, this means that you need to offer a command line or GUI that lets administrators initiate an application-specific failover procedure, which moves state information - and probably some configuration data - between the application instances. Actually, Lync Front End pools in a paired pool environment do similar things when you execute the cmdlets Invoke-CsManagementServerFailover and Invoke-CsPoolFailover in the Lync Server Management Shell.
