High Availability Scenarios for UCMA Applications

This blog post describes four high availability scenarios for developers of middle-tier UCMA applications in a Microsoft Lync Server environment. It starts with the most basic scenario, which does not actually provide any high availability, and ends with the most advanced one, which supports multiple datacenters. Each scenario builds on the previous one.

Scenario A: no high availability

The following picture shows the most basic scenario.

As the picture shows, the scenario includes the following components:

  • Lync Front End Server(s): either a single Standard Edition server or an Enterprise Edition pool with multiple Front End Servers. For the purposes of this post, it does not matter which one;
  • UCMA Application Server: hosts a middle-tier UCMA application, which can be either a Windows service or a web application running on IIS;
  • Backend database: used by the UCMA application to store business-critical or long-lived, infrequently changing data (e.g. call start time, call end time and caller ID in a contact center application);
  • State: transient or frequently changing data that the UCMA application maintains in memory (e.g. current agent status or agent session in a contact center application). There is no need to store transient, temporary data in the backend database; doing so would only waste storage, network bandwidth and computing resources. Storing frequently changing data there is also a bad idea, because it easily turns the backend database into a bottleneck. Such a bottleneck can prevent the application from scaling out: you can quickly reach the point where you cannot add further application servers to the environment. Keeping this data in memory eliminates the bottleneck and makes the application scalable, but it also makes the application stateful;
  • Client Application: the application used by the end user. It can be a traditional desktop application or a thin client running in a browser.
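
The split between backend database and in-memory state can be illustrated with a minimal Python sketch. It is not UCMA code: the class name ContactCenterState, its methods and the calls table are hypothetical, and an in-memory SQLite database merely stands in for the real backend database.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ContactCenterState:
    """Hypothetical holder for the two kinds of data described above."""

    db: sqlite3.Connection  # stand-in for the backend database
    # Transient, frequently changing data: kept in memory only.
    agent_status: dict = field(default_factory=dict)

    def __post_init__(self):
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS calls "
            "(caller_id TEXT, start_time TEXT, end_time TEXT)"
        )

    def set_agent_status(self, agent, status):
        # Changes constantly, so it never touches the database.
        self.agent_status[agent] = status

    def record_call(self, caller_id, start_time, end_time):
        # Written once per call and long lived: goes to the database.
        self.db.execute(
            "INSERT INTO calls VALUES (?, ?, ?)",
            (caller_id, start_time, end_time),
        )
        self.db.commit()


state = ContactCenterState(sqlite3.connect(":memory:"))
state.set_agent_status("alice", "available")
state.set_agent_status("alice", "on_call")
state.record_call("+15550100", "2013-01-01T10:00:00", "2013-01-01T10:05:00")
```

Two status changes cause no database traffic at all, while the finished call produces exactly one row. The price is the one named above: the agent status lives only in this process.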

Drawback: if the application server goes down or becomes unavailable, the end user cannot work.

Scenario B: high availability with active-passive servers

The following scenario introduces multiple UCMA application servers.

  • At least two application servers are installed. The servers are organized into a trusted application pool (New-CsTrustedApplicationPool and New-CsTrustedApplicationComputer cmdlets in Lync Server Management Shell). Each application server hosts the same UCMA application, and each application instance uses the same UCMA application endpoints (same SIP URIs);
  • There is no mechanism to synchronize UCMA application state between application servers, so each application server maintains its own internal state;
  • At any given time only one application server is active; the others are passive. This can be guaranteed by implementing an application-level witness mechanism. Only the active server starts up the UCMA server platform (CollaborationPlatform.BeginStartup() UCMA method) and registers endpoints (ApplicationEndpoint.BeginEstablish() UCMA method). The passive servers do neither. So, at any time, a given application endpoint is registered from only one application server (the active one);
  • When the active application server goes down or becomes unavailable, one of the passive servers automatically takes over its role and becomes active (as part of the witness mechanism);
  • Manual failback is supported by the application;
  • The client application has built-in logic to discover the active UCMA application server.
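
The post does not say how the witness mechanism is implemented. One common approach is a lease that the active server must keep renewing, sketched below in Python under that assumption. The Witness class and its method names are hypothetical, and the current time is passed in explicitly to keep the example deterministic. A real witness would keep the lease in a store shared by all application servers and update it atomically.

```python
class Witness:
    """Hypothetical lease-based witness deciding which server is active."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None      # name of the currently active server
        self.expires_at = 0.0   # time at which the current lease runs out

    def try_acquire(self, server, now):
        """Grant or renew the active role; return True if `server` is active."""
        if self.holder in (None, server) or now >= self.expires_at:
            self.holder = server
            self.expires_at = now + self.lease_seconds
            return True
        return False

    def release(self, server):
        """Give up the active role voluntarily, e.g. for a manual failback."""
        if self.holder == server:
            self.holder = None


witness = Witness(lease_seconds=10)
print(witness.try_acquire("app1", now=0))   # app1 becomes active
print(witness.try_acquire("app2", now=5))   # lease still valid: app2 stays passive
print(witness.try_acquire("app2", now=12))  # app1 missed its renewal: app2 takes over
```

In this sketch, a server would start the platform and establish its endpoints only after try_acquire returns True, and would keep calling it well within the lease period to stay active.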

Drawback: when a failover or failback occurs, the internal state of the originally active application server is lost. As a consequence, the client application used by the end user might get out of sync, and the end user might need to sign out and sign in again to bring it back in sync.

Moreover, this solution cannot scale out since a single application server (the active one) needs to be able to serve all the end users.

Benefit: it is quite easy to implement and it protects against application server outage.
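
The client-side discovery logic listed for this scenario is not specified in detail. A minimal sketch, assuming each application server exposes some way to ask whether it is currently active, is to probe the servers in turn. The function find_active_server, the probe callback and the host names are all hypothetical.

```python
from typing import Callable, Iterable, Optional


def find_active_server(
    servers: Iterable[str], is_active: Callable[[str], bool]
) -> Optional[str]:
    """Return the first server that reports itself as active, or None."""
    for server in servers:
        try:
            if is_active(server):
                return server
        except OSError:
            # Server is down or unreachable: try the next one.
            continue
    return None


def fake_probe(server):
    """Stand-in for a real network probe (assumption for this example)."""
    if server == "app1.example.com":
        raise ConnectionError("server is down")
    return server == "app3.example.com"


servers = ["app1.example.com", "app2.example.com", "app3.example.com"]
print(find_active_server(servers, fake_probe))
```

After a failover the client would simply run the same discovery again and reconnect to whichever server now answers as active.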

Scenario C: high availability with active-active servers

Next let us extend the previous scenario to offer active-active servers.

  • As in the previous scenario, multiple application servers are available. However, every application server is now active. Each UCMA application instance registers the same application endpoints, so a given application endpoint is registered from multiple application servers at the same time;
  • A distributed shared cache synchronizes UCMA application state between the application servers (e.g. Microsoft's Velocity, now part of AppFabric);
  • The client application connects to the least loaded application server and works with that server until it becomes unavailable;
  • The client application connects to another application server when the server it was using goes down or becomes unavailable. When this happens, the client application GUI can be kept in sync, since the application servers' internal state is never lost;
  • Please note that in this scenario it can easily happen that the client application is connected to UCMA Application Server A while the incoming Lync call connected to the end user is managed in the background by UCMA Application Server B, because the Lync Front End originally routed the incoming call to Application Server B during call setup. For this to work, either Application Server A acts as a proxy and forwards each application-level request coming from the client application to Application Server B, or Application Server B simply reroutes the call to Application Server A as part of call setup.
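
The proxy option from the last point can be sketched as follows. This is an illustrative Python model, not UCMA code: the AppServer class, its methods and the call identifier are hypothetical, and a plain dictionary stands in for the distributed shared cache.

```python
class AppServer:
    """Hypothetical application server that proxies requests to the call owner."""

    def __init__(self, name, shared_cache, peers):
        self.name = name
        self.shared_cache = shared_cache  # stand-in for the distributed cache
        self.peers = peers                # server name -> AppServer

    def accept_call(self, call_id):
        # The Front End routed this call here; publish the ownership.
        self.shared_cache[call_id] = self.name

    def handle_request(self, call_id, action):
        owner = self.shared_cache[call_id]
        if owner == self.name:
            return f"{self.name} executed {action} on {call_id}"
        # The call lives on another server: forward the request to it.
        return self.peers[owner].handle_request(call_id, action)


cache, peers = {}, {}
server_a = AppServer("A", cache, peers)
server_b = AppServer("B", cache, peers)
peers.update({"A": server_a, "B": server_b})

server_b.accept_call("call-1")                    # call was routed to B
print(server_a.handle_request("call-1", "hold"))  # client is connected to A
```

The client only ever talks to Server A, yet the request is executed on Server B, which owns the call. The ownership record in the shared cache is what makes the forwarding decision possible.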

Benefit: evenly loaded application servers (about the same number of clients is connected to each) and protection against the loss of an application server. The solution can easily scale out: to serve more end users, you just need to install additional application servers.
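
How the client finds the least loaded server is left open here. One simple possibility, assuming the client can obtain the number of connected clients per server (for example from the shared cache), is shown below. The function pick_server and the server names are hypothetical.

```python
from typing import AbstractSet, Dict


def pick_server(
    loads: Dict[str, int], unavailable: AbstractSet[str] = frozenset()
) -> str:
    """Return the available server with the fewest connected clients."""
    candidates = {s: n for s, n in loads.items() if s not in unavailable}
    if not candidates:
        raise RuntimeError("no application server available")
    # Sorting first makes ties resolve deterministically by server name.
    return min(sorted(candidates), key=candidates.get)


loads = {"app1": 12, "app2": 7, "app3": 9}
first_choice = pick_server(loads)
# If the chosen server later becomes unavailable, pick again without it.
fallback = pick_server(loads, unavailable={first_choice})
print(first_choice, fallback)
```

The same function covers both the initial connection and the reconnect after a server loss, which keeps the load spread evenly as servers are added or removed.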

Scenario D: high availability with datacenter failover

Finally, the most advanced scenario spans multiple datacenters.

  • Multiple datacenters are available. Different application endpoints (SIP URI) are configured in different datacenters;
  • At any given time only one datacenter is active, the other one is passive;
  • A manual mechanism is available to initiate UCMA application failover and failback between datacenters. The lack of automatic failover/failback for the UCMA application is not a problem, since Lync failover/failback between datacenters is also a manual (and time-consuming) process;
  • Depending on the kind of UCMA application (e.g. contact center, billing, recording), some data might need to be synchronized between the backend databases in the different datacenters;
  • The client application always connects to the active datacenter.
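
The manual switch between datacenters can be modeled very simply. The sketch below assumes a small piece of configuration that records which datacenter is active and which application endpoint SIP URI belongs to each one. The Deployment class, the datacenter names and the SIP URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of the datacenters and which one is active."""

    endpoints: dict  # datacenter name -> application endpoint SIP URI
    active: str      # name of the currently active datacenter

    def fail_over(self, target):
        # Invoked manually by an administrator, never automatically.
        if target not in self.endpoints:
            raise ValueError(f"unknown datacenter: {target}")
        self.active = target

    def endpoint_for_clients(self):
        # Clients always use the endpoint of the active datacenter.
        return self.endpoints[self.active]


deployment = Deployment(
    endpoints={
        "dc1": "sip:app@dc1.example.com",
        "dc2": "sip:app@dc2.example.com",
    },
    active="dc1",
)
deployment.fail_over("dc2")  # manual failover
print(deployment.endpoint_for_clients())
```

Failback is the same operation in the opposite direction, which matches the manual nature of the Lync failover procedure itself.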

Drawback: the active application server's internal state is lost when a failover or failback occurs. From the end user’s perspective, the consequences are the same as described in Scenario B.

Benefit: protection against datacenter outage.