Author: Sanjay Mishra
Contributor: David P Smith (ServiceU)
Technical Reviewers: Charles (Curt) Mathews, James Podgorski, Eric Kang, Prem Mehra, Min He
It is common to have more than one SQL Server Failover Cluster Instance (FCI) within a single Windows Server Failover Cluster (WSFC). It is also fairly common that not all nodes in the WSFC are the possible owners of all the FCIs within that WSFC. One such scenario is when you have a FCI + AG solution (AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution by Using Failover Cluster Instances and Availability Groups), where multiple FCIs reside within a single WSFC, and an availability group is created across the FCIs, as shown in the Figure 1 below:
Figure 1: An FCI+AG solution across two data centers (FCI being used for HA and AG being used for DR)
In this deployment,
High level steps to build such a solution are:
After building the primary site failover cluster instance (Step 1 above, shown in Figure 2 below), you will see the possible owners of the resources for the failover cluster instance resource group SMFCI601\SM_FCI_601 as shown in the Figure 3.
Figure 2: Topology after Completing Step 1 (2-node FCI created at the primary data center)
Figure 3: Possible owners of the FCI network name resource after Completing Step 1 (2-node FCI created at the primary data center)
This is correct, and as expected.
During Step 2, the two nodes from the DR site (R900-03 and R900-04) are first added to the WSFC (Figure 4). Once these two nodes are added to the WSFC, you will see that the newly added nodes have been unexpectedly added as possible owners (Figure 5) for all the resources of the previously existing FCI (SMFCI601\SM_FCI_601).
Figure 4: Two nodes from the DR site added to the WSFC
Figure 5: Possible owners list of the FCI network name resource of the primary FCI, after adding two nodes from the DR site to the WSFC
This is neither intended nor desired. R900-03 and R900-04 can never be the possible owners of any resources of SMFCI601\SM_FCI_601. So, why does this happen? It is the way SQL Server FCI setup works with Windows clustering. When a node is added to the WSFC, the node is added as a possible owner for all existing FCIs. This works well in some scenarios, but doesn’t work well in the scenario we are discussing in this article.
The possible owners list must be manually modified each time you add a node to the WSFC. Figure 6 shows corrected possible owners list for one resource (the FCI network name on the primary).
Figure 6: Possible owners list of the FCI network name resource of the primary FCI, after manually correcting the list
Repeat this for all resources (other than disks, the disk resources don’t demonstrate this behavior), such as:
And repeat this for each FCI in your topology. Note that the possible owners list for the resources in each FCI will differ from other FCIs.
The possible owners list must be appropriately modified before proceeding to create the availability group across the FCIs (Step 3 listed above), otherwise AG creation will fail, because there will be overlap between the possible owners of the two instances hosting the AG replicas.
This should not be confused with the possible owners list for the availability group resource. You should not alter the possible owners list for the availability group (DO NOT use Windows Failover Cluster Manager to perform Availability Group Failover).
How should the cluster be configured with 3 nodes?
I have a pair of SQL servers in a cluster in one datacentre and only 1 server in a second datacentre which I want to take over if the primary site goes down (manual failover)
Because I have 3 nodes if I take down the network between the datacentres and then take one of the primary servers down I lose the whole cluster. I want to do 2 node + quorum in the SQL cluster and for it only to fail if a server and/or the SAN goes down.
Please refer to the two architecture guides:
My current cluster set up is like this -
Its a 5 node windows 2008 R2 failover cluster.
Node 1 has a SQL instance 1 set to failover to Node 2. Node 3 has an instance 2 set to falover only node 5.
I think i have exatcly same situation that you have mentioned. But i am not using Always On feature. My instances are all on SQL 2008 R2. Recently we added 2 new nodes to the windows cluster. Now i see that the resource group, ip address in the resource group, Disk drives, SQL Server and SQL Agent in other respurces section shows the possible owners as Node1,Node2,Node3 and Node 5. This is for instance 1. Instance 2 is good as it was installed after we added the new nodes to the windows cluster.
Should i follow the same steps by manually removing the possible owners from instance 1?