One of the benefits customers see when choosing an HCI solution is that it allows them to start with what is needed for the next few months rather than the next few years. One of the main drivers of this acquisition model is the fact that you can easily scale out the system.
Every time you add a VxRail node, you’re adding not only storage, but also CPU and RAM to the overall cluster. Also, with VxRail, after the first 3 nodes you can choose different node types with different configurations. With that being said, just because you can do that, should you? There are definitely some things to consider.
The biggest thing to consider when adding a new node to the cluster is the impact of not only adding the node (everyone likes to gain performance), but also the impact of failure scenarios. If a customer has a 4-node cluster where each node has dual 16-core processors and 768GB of RAM, I would be hesitant to expand that cluster with a new node that’s maxed out with dual 28-core processors and 3TB of RAM (yes, we can do that in a single node…even in a 1U form factor!). While it may seem appealing to beef up the resources, what happens if that node goes into maintenance mode, or worse, fails? You could end up in a scenario where the cluster is out of CPU and/or RAM and VMs won’t be able to run on the remaining nodes during a maintenance event.
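To make that failure-scenario math concrete, here is a quick back-of-the-envelope sketch. This is not a VxRail tool, just illustrative Python with made-up node specs and workload numbers, loosely modeled on the example above:

```python
def survives_worst_node_failure(nodes, used_cores, used_ram_gb):
    """nodes: list of (cores, ram_gb) tuples, one per node.
    Returns True only if the workload still fits after losing
    ANY single node -- the test that matters for maintenance mode."""
    for i in range(len(nodes)):
        remaining = [n for j, n in enumerate(nodes) if j != i]
        remaining_cores = sum(c for c, _ in remaining)
        remaining_ram = sum(r for _, r in remaining)
        if used_cores > remaining_cores or used_ram_gb > remaining_ram:
            return False
    return True

# Four balanced nodes (2 x 16 cores, 768 GB each)...
balanced = [(32, 768)] * 4
# ...versus the same cluster expanded with one maxed-out node
# (2 x 28 cores, 3 TB) that the workload then grows into.
lopsided = [(32, 768)] * 4 + [(56, 3072)]

print(survives_worst_node_failure(balanced, used_cores=90, used_ram_gb=2000))   # True
print(survives_worst_node_failure(lopsided, used_cores=150, used_ram_gb=4000))  # False
```

Once the workload grows into the oversized node, losing that one node leaves the survivors short on CPU, which is exactly the maintenance-mode risk described above.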
The good news is that scaling out a VxRail cluster is quite simple. Much like a VxRail install, this does require a PS engagement today. To make sure the environment is prepared for the new node, the TOR switches should be configured prior to the node add (make sure the same VLANs are trunked to the ports). Also, make sure the forward and reverse DNS lookup zones are created (remember, VxRail will use the next set of IP addresses and host names in the pool).
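A quick way to sanity-check those DNS records before kicking off the node add is to confirm the forward and reverse lookups agree. The sketch below uses Python’s standard `socket` resolvers; the hostname and IP are placeholders, not real VxRail defaults:

```python
import socket

def forward_reverse_match(hostname, expected_ip,
                          forward=socket.gethostbyname,
                          reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """Return True if the A record resolves to expected_ip and the
    PTR record for that IP resolves back to the same host name."""
    fwd_ip = forward(hostname)        # forward (A) lookup
    rev_name = reverse(expected_ip)   # reverse (PTR) lookup
    return fwd_ip == expected_ip and \
        rev_name.split(".")[0] == hostname.split(".")[0]

# Example with placeholder names -- point it at your real zone:
# forward_reverse_match("vxrail-node05.lab.local", "192.168.10.15")
```

The `forward` and `reverse` parameters are only there so the check can be exercised against a stub table instead of a live DNS server.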
Before adding the node, you should double-check the environment. In my lab, I’m using one of the G Series appliances. During the install of this cluster, which is documented here, I only deployed 3 of the 4 nodes in the chassis. You can see in both the physical and logical views that VxRail Manager is displaying only 3 nodes.
Once the node has been racked, cabled, and booted up, it will appear within VxRail Manager. Over the last 2.5 years, the developers have done a tremendous amount of work making Day 0 and Day 1 tasks easier and more parallel. One of those improvements is the ability to add more than one node at a time (back in the day, we had to scale out one node at a time). Today we support adding up to 6 nodes at once, and it works great! As you can see in the screenshot, you can choose the node (or nodes, if I had more than one) that you want to add to the cluster.
After choosing the node(s) to add, the system will prompt for vCenter credentials and then display the IP range details used during the install. If there are enough free IPs in the range, the system will use the next free IP in the range. If the IPs have all been consumed, the new IPs can be entered either one by one or by providing a range. Note, the second range of IPs does not have to be contiguous with the first range provided during the install, but all the addresses do need to be on the same subnet.
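That subnet rule is easy to verify with Python’s `ipaddress` module. This is just an illustrative check with made-up addresses, not anything VxRail runs itself:

```python
import ipaddress

def pools_on_same_subnet(first_pool, second_pool, prefix_len=24):
    """The second IP pool need not be contiguous with the first,
    but every address must land in the same subnet."""
    net = ipaddress.ip_network(f"{first_pool[0]}/{prefix_len}", strict=False)
    return all(ipaddress.ip_address(ip) in net
               for ip in first_pool + second_pool)

first = ["192.168.10.11", "192.168.10.12", "192.168.10.13"]
print(pools_on_same_subnet(first, ["192.168.10.50", "192.168.10.51"]))  # True (a gap is fine)
print(pools_on_same_subnet(first, ["192.168.20.50"]))                   # False (wrong subnet)
```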
Before a node is added, a screen will appear prompting for the root and management credentials – this sets the passwords for those accounts, so you may want to make sure you follow any password standards your organization has. The process also can’t move to the validation phase without checking the box acknowledging that the forward and reverse lookup zones have been created in DNS (this is a must).
The system will then run a validation similar to the one performed during an install. It’s not as long, but it will still take a few minutes to complete. If something is misconfigured, an error is displayed, and once the issue is resolved, the node add can be restarted.
Once the validation process succeeds, the node add process can be started. The VxRail Manager interface will display the steps being executed. One thing to note: in the first screenshot (top left), the system mentions that it’s destroying VMs, disabling vSAN, and deleting disks. While true, those operations are executed on the new node, not on the cluster. Every node ships from the factory with the same code on it, as manufacturing has no idea whether a node is for a net-new cluster or part of an expansion purchase. It can definitely be a little worrisome the first time you see it, though.
Just like a new VxRail install, the system automates the entire process from adding the node(s) into vCenter, configuring the new node(s) with the proper VDS config, and creating the vSAN Disk group(s). Also, during this process, the system puts the cluster monitoring into a “muted” state so that it doesn’t flood the event view in VxRail Manager with alerts.
Some of the last steps in the process are updating the VxRail Manager views. As you can see in the screenshots above, the cluster now shows 4 nodes in both the physical and logical views within the VxRail Manager interface.
All in all, the process is fairly simple and straightforward. This helps with the previous discussion around buying what’s needed for the near term and then scaling out (or even up) as needed. In my experience the longest part of the node add process is waiting for the system to boot.