Impact of Adding a Node to a Windows Cluster on the Possible Owners property of existing SQL Server Failover Cluster Instances

Rate This
  • Comments 3

Author: Sanjay Mishra

Contributor: David P Smith (ServiceU)

Technical Reviewers: Charles (Curt) Mathews, James Podgorski, Eric Kang, Prem Mehra, Min He

 

It is common to have more than one SQL Server Failover Cluster Instance (FCI) within a single Windows Server Failover Cluster (WSFC). It is also fairly common that not all nodes in the WSFC are the possible owners of all the FCIs within that WSFC. One such scenario is when you have a FCI + AG solution (AlwaysOn Architecture Guide: Building a High Availability and Disaster Recovery Solution by Using Failover Cluster Instances and Availability Groups), where multiple FCIs reside within a single WSFC, and an availability group is created across the FCIs, as shown in the Figure 1 below:

image

Figure 1: An FCI+AG solution across two data centers (FCI being used for HA and AG being used for DR)

In this deployment,

  • Possible owners of SMFCI601\SM_FCI_601 = {PE2970-01, PE2970-02}.
  • Possible owners of SMFCI602\SM_FCI_602 = {R900-03, R900-04}.

High level steps to build such a solution are:

  1. Step 1: Build the primary site failover cluster instance SMFCI601\SM_FCI_601 on nodes PE2970-01 and PE2970-02.
  2. Step 2: Add the DR site nodes R900-03 and R900-04, to the same Windows cluster, and then create the secondary failover cluster instance SMFCI602\SM_FCI_602 on R900-03 and R900-04.
  3. Step 3: Create the AG across the two instances SMFCI601\SM_FCI_601 and SMFCI602\SM_FCI_602.

After building the primary site failover cluster instance (Step 1 above, shown in Figure 2 below), you will see the possible owners of the resources for the failover cluster instance resource group SMFCI601\SM_FCI_601 as shown in the Figure 3.

image

Figure 2: Topology after Completing Step 1 (2-node FCI created at the primary data center)

image

Figure 3: Possible owners of the FCI network name resource after Completing Step 1 (2-node FCI created at the primary data center)

This is correct, and as expected.

During Step 2, the two nodes from the DR site (R900-03 and R900-04) are first added to the WSFC (Figure 4). Once these two nodes are added to the WSFC, you will see that the newly added nodes have been unexpectedly added as possible owners (Figure 5) for all the resources of the previously existing FCI (SMFCI601\SM_FCI_601).

image

Figure 4: Two nodes from the DR site added to the WSFC

image

Figure 5: Possible owners list of the FCI network name resource of the primary FCI, after adding two nodes from the DR site to the WSFC

This is neither intended nor desired. R900-03 and R900-04 can never be the possible owners of any resources of SMFCI601\SM_FCI_601. So, why does this happen? It is the way SQL Server FCI setup works with Windows clustering. When a node is added to the WSFC, the node is added as a possible owner for all existing FCIs. This works well in some scenarios, but doesn’t work well in the scenario we are discussing in this article.

The possible owners list must be manually modified each time you add a node to the WSFC. Figure 6 shows corrected possible owners list for one resource (the FCI network name on the primary).

image

Figure 6: Possible owners list of the FCI network name resource of the primary FCI, after manually correcting the list

Repeat this for all resources (other than disks, the disk resources don’t demonstrate this behavior), such as:

  • FCI network name
  • FCI IP address
  • SQL instance
  • SQL Agent
  • Any other resources part of the resource group must be checked for correct possible owners.

And repeat this for each FCI in your topology. Note that the possible owners list for the resources in each FCI will differ from other FCIs.

The possible owners list must be appropriately modified before proceeding to create the availability group across the FCIs (Step 3 listed above), otherwise AG creation will fail, because there will be overlap between the possible owners of the two instances hosting the AG replicas.

This should not be confused with the possible owners list for the availability group resource. You should not alter the possible owners list for the availability group (DO NOT use Windows Failover Cluster Manager to perform Availability Group Failover).

Your comment has been posted.   Close
Thank you, your comment requires moderation so it may take a while to appear.   Close
Leave a Comment
  • Post
  • How should the cluster be configured with 3 nodes?

    I have a pair of SQL servers in a cluster in one datacentre and only 1 server in a second datacentre which I want to take over if the primary site goes down (manual failover)

    Because I have 3 nodes if I take down the network between the datacentres and then take one of the primary servers down I lose the whole cluster.  I want to do 2 node + quorum in the SQL cluster and for it only to fail if a server and/or the SAN goes down.

  • @Darren,

    Please refer to the two architecture guides:

    - sqlcat.com/.../alwayson-architecture-guide-building-a-high-availability-and-disaster-recovery-solution-by-using-failover-cluster-instances-and-availability-groups.aspx

    - sqlcat.com/.../alwayson-architecture-guide-building-a-high-availability-and-disaster-recovery-solution-by-using-alwayson-availability-groups.aspx

    for details.

  • Hi Sanjay,

    My current cluster set up is like this -

    Its a 5 node windows 2008 R2 failover cluster.

    Node 1 has a SQL instance 1 set to failover to Node 2. Node 3 has an instance 2 set to falover only node 5.

    I think i have exatcly same situation that you have mentioned. But i am not using Always On feature. My instances are all on SQL 2008 R2. Recently we added 2 new nodes to the windows cluster. Now i see that the resource group, ip address in the resource group, Disk drives, SQL Server and SQL Agent in other respurces section shows the possible owners as Node1,Node2,Node3 and Node 5. This is for instance 1. Instance 2 is good as it was installed after we added the new nodes to the windows cluster.

    Should i follow the same steps by manually removing the possible owners from instance 1?

Page 1 of 1 (3 items)