Deciding between Availability Group and Clustering - Episode IV: Software side of the story

Hello all - we hope all is well with your SQL server.

In this post - we're going to talk a bit about software side of the story when choosing between Clustering and Availability Group.

To recap, Clustering is an established technology in which you have two or more (typically two) servers and one shared storage, in which all the SQL action happens. Availability Group is a relatively new technology that evolved from what was called Database Mirroring, in whcih bunch of independent SQL servers with local storage somehow talk to one another to keep things consistent and give you high availablity that way.

Faced with the choice between Availability Group and Clustering in SQL 2014, people think a lot about cost, licensing, scalability, performance - all that stuff that has to do with finance and hardware. Of course that's important, but let's think a bit about software too, shall we?

Below, we have a little Availabilty Group setup. It has a little table that has identity column and a string. We wrote a little INSERT statement to put in the name of the server that's running it at the moment.

Now, after inserting several rows while node 2 (DEVSQL02) was the primary, we decided to initiate a shutdown of node 2. Availability Group works as intended, carrying on like nothing happened. Which is awesome. However, just look what happened to the identity column. It jumped by a thousand! How could this be?

Imagine yourself being the node1, the box that's on the receiving end of failover. Your buddy just fell, so you have to carry on for him. You have 99% current data and some log to restore on you, all the while answering all the queries pouring in.

Suppose, one of the queries asked, "Hey, what's the next identity number?"

What do you do? You know that the last number generated was 1031. Can you generate 1032? Well, no. You have some logs that are not restored yet because you were not the primary. Your dying buddy, in his last breath, might have generated ID=1032, 1033 and so on...which you have as yet-unrestored log file. So. In the transition period - the time between indicent happens and you become the primary - you have to play it safe. Say, adding 1000 sounds like safe thing to do, your dying buddy couldn't possibly have generated that many.

Technically it has to do with preallocation. Search that term, and also search "Trace Flag 272" if you want to read on. Why does this matter you ask? Well, if failover shuold be a healthy routine of any HA system, you might routinely switch over from one to the other for, say, installing windows update. What happens to your software architecture if the identity value jump by 1000 the whole time? You might run out of address space sooner then you like, if you have a scenario where SQL-generated ID goes on a 6-digit barcode or some portion of composit key in an integration scenario.

This, is less likely to happen with Clustering.

Suffice to say there are behavioral differences between the two. If you want your high availabilty to be hgihly transparent - as if the tech doesn't even exist seen from software - clsutering is still the way to go, because making independent and different SQL servers work as one is like herding cats. Availability Group is an awesome cat herder and it herds the cats with amazing grace and efficiency. But it's still herding cats. "Do I use a lot of IDENTITY columns and will I have issues with jumpy identity?" should be one of many questions you should ask In deciding Clustering vs Availability Group.