Most of the time, our customers will let a VxRail upgrade progress as detailed in this post, or to 4.7 in this post. While that works for 95% of our customers, there are use cases where a customer has specific requirements for VMs to remain on a single node (and possibly be the first to power up). During an upgrade, VxRail Manager sends the maintenance mode request to vCenter; most of the time, the VMs will vacate that host and some will come back afterward, thanks to DRS (this is now also true for vSphere Standard customers – a blog on that is coming as well).
Over the last two months, I’ve been working with some Dell EMC account teams and customers on some other configurations as well. In one instance, the customer was using software-defined networking (not NSX) that required one VM per VxRail node to control the network traffic. In this scenario, the system also required that VM to be the last VM off the host and the first VM back. In the other scenario, the VM could not be vMotioned off the host at all; the customer would much rather power down the VM and take the outage during a maintenance window.
Working with the teams, we came up with two ways to do this: host affinity rules, or turning off DRS on the cluster. Both solutions have similar outcomes, but each has its own nuances.
Setting Host Affinity Rules
This option was fairly easy to set up. In the lab, I created three VMs (one for each of my VxRail nodes) and then created host groups and VM/host rules to make each VM run on a single node.
In the rules, I defined that the VMs named “Node#-PinnedVM” (where # is the host number) must run on a specific host. This was simple to set up and will persist on this cluster for future upgrades.
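The rules boil down to a simple mapping: each VM following the naming convention is pinned to its matching host. Here’s a rough illustration in plain Python (this is my own model of the lab naming convention, not the vSphere API – `pinned_host` is a hypothetical helper):

```python
import re

def pinned_host(vm_name):
    """Return the host number a VM is pinned to, based on the lab's
    'Node#-PinnedVM' naming convention, or None if the VM is unpinned."""
    match = re.fullmatch(r"Node(\d+)-PinnedVM", vm_name)
    return int(match.group(1)) if match else None

# Build the 'must run on host' map for a 3-node cluster,
# one pinned VM per node plus an ordinary VM with no rule.
vms = ["Node1-PinnedVM", "Node2-PinnedVM", "Node3-PinnedVM", "App-VM"]
rules = {vm: pinned_host(vm) for vm in vms if pinned_host(vm) is not None}
print(rules)  # {'Node1-PinnedVM': 1, 'Node2-PinnedVM': 2, 'Node3-PinnedVM': 3}
```

In the actual lab, the equivalent of this map is expressed as DRS host groups, VM groups, and “must run on hosts in group” rules in vCenter.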
The next step was to initiate a cluster upgrade. For the lab, this system was already running 4.7.000 code, so I upgraded it to 4.7.001. The first thing I did was use the “local upgrade” functionality to upload the code and verify which components were going to be upgraded.
After the system completed the upgrades to the VxRail Manager VM and the VCSA (I deployed the local vCenter instance), it moved to node 1 in my VxRail cluster. The first step is that it upgrades the various hardware components with updated firmware and drivers (as well as the BIOS and iDRAC, if necessary). After the system has updated the components it can with the node online, it sends the request to put the node into maintenance mode. You will see the request in both VxRail Manager and vCenter.
Due to the VM affinity rules we set, the host will sit there and not enter maintenance mode without manual intervention. VxRail Manager will wait 60 minutes (3,600 seconds) for the node to enter maintenance mode before finally canceling the request and failing the upgrade.
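The behavior I observed is essentially a bounded wait: poll the maintenance-mode state, and give up after 3,600 seconds. A sketch of that pattern (my own illustration of the observed behavior, not VxRail Manager’s actual code; `check_fn` is a hypothetical callback):

```python
import time

def wait_for(check_fn, timeout_s=3600, poll_s=5):
    """Poll check_fn until it returns True or timeout_s elapses.

    Mirrors the observed VxRail Manager behavior: wait up to 60 minutes
    for the node to enter maintenance mode, then fail the upgrade step.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check_fn():
            return True
        time.sleep(poll_s)
    return False  # timed out -> request canceled, upgrade marked failed
```

In this lab, `check_fn` would report whether the host reached maintenance mode; with a mandatory affinity rule still holding a powered-on VM on the host, it never succeeds and the upgrade fails after the hour.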
As you see above, the upgrade failed, but we do have the ability to retry it. In order for the upgrade to succeed, we need to do something with the VM named “Node1-PinnedVM” to allow that VxRail node to enter maintenance mode. In this case, the only thing I could do was power off that VM. Once that was done, the node was able to go into maintenance mode and the upgrade moved on, completing the upgrade on node 1. The same thing would happen for node 2 (as you can see in the screenshot, there’s a request for node 2 to enter maintenance mode).
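The manual step here can be described as: find the powered-on VMs that a mandatory rule pins to the host entering maintenance mode, and power them off (or otherwise release them). A small model of that check, using hypothetical names and plain dictionaries rather than any real vSphere client library:

```python
def blocking_vms(host, power_states, pinned_to):
    """Return names of powered-on VMs that an affinity rule pins to `host`.

    These are the VMs preventing the host from entering maintenance mode.
    power_states: dict of vm_name -> "poweredOn" / "poweredOff"
    pinned_to:    dict of vm_name -> host name, from the affinity rules
    """
    return [name for name, state in power_states.items()
            if state == "poweredOn" and pinned_to.get(name) == host]

# Node 1 can only enter maintenance mode once this list is empty.
states = {"Node1-PinnedVM": "poweredOn", "App-VM": "poweredOn"}
pins = {"Node1-PinnedVM": "node1"}
print(blocking_vms("node1", states, pins))  # ['Node1-PinnedVM']
```

Powering off “Node1-PinnedVM” (the one action available in my case) empties that list, which is why the retry then succeeded.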
All in all, this is a viable solution for facilitating the upgrade of an environment where VM affinity rules are in place. It will work for one of the customers, as they have a 12-node cluster and only the last 3 nodes are affected by the requirement to keep a specific VM on a specific VxRail node.
This is a fairly easy way to complete the task, but you end up either having to monitor the system closely or waiting 60 minutes for each maintenance mode request to time out. In part 2, I’ll look at how we can use DRS settings to achieve a similar result.