Category Archives: ESXi

Posts about VMware ESXi

Creating vSAN cluster with over 32 hosts

So I was building out a 44-node vSAN cluster last week and ran into an issue where 12 of the ESXi hosts ended up in their own network partition group, separate from the other 32. I had no issues with the vSAN network itself: I could vmkping every host over the vSAN VMkernel interface, so there was no communication problem between any of the hosts in the cluster. In most cases, vSAN network partitioning occurs when the vSAN VMkernel interface on a host cannot communicate with the other hosts.

After several attempts at removing the disk groups, removing the vSAN VMkernel adapter, and moving the hosts out of the cluster and off the DVS and then back, I had no luck. I knew from VMware's supported maximums that I could create a 64-node vSAN cluster. I was at a loss, so after several hours of troubleshooting and Google searching I ended up opening an SR with VMware. After about an hour or so of troubleshooting, the VMware engineer was also at a loss until he found an article indicating that you must set some advanced settings on the ESXi hosts in order to go above 32 nodes. Once we made those settings and rebooted the hosts, we had a single network partition and our issue was resolved.

The KB article (2110081) shows how to perform the task with esxcli over SSH, logged in as root, but does not show how to do it via PowerCLI.

$vcenter = Read-Host "Enter the vCenter to connect to"

Connect-VIServer -Server $vcenter

$cluster = Read-Host "Cluster name"

# $host is a reserved, read-only automatic variable in PowerShell, so use a different loop variable
foreach ($vmhost in (Get-VMHost -Location $cluster)) {
    Get-AdvancedSetting -Entity $vmhost -Name "VSAN.goto11" | Set-AdvancedSetting -Value 1 -Confirm:$false
    Get-AdvancedSetting -Entity $vmhost -Name "Net.TcpipHeapMax" | Set-AdvancedSetting -Value 1536 -Confirm:$false
}
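
Before rebooting, it can be worth confirming that both values actually landed on every host. The following is a minimal sketch, assuming an active Connect-VIServer session; the function name Get-LargeClusterSetting is made up for this example and is not a PowerCLI cmdlet.

```powershell
# Sketch: report the two large-cluster settings for every host in a cluster.
# Assumes you are already connected with Connect-VIServer.
# Get-LargeClusterSetting is a hypothetical helper name, not part of PowerCLI.
function Get-LargeClusterSetting {
    param([Parameter(Mandatory)][string]$ClusterName)

    Get-VMHost -Location $ClusterName |
        Get-AdvancedSetting -Name 'VSAN.goto11', 'Net.TcpipHeapMax' |
        Select-Object @{ Name = 'Host'; Expression = { $_.Entity.Name } }, Name, Value
}

# Usage (cluster name is an example):
# Get-LargeClusterSetting -ClusterName 'demo-cluster' | Sort-Object Host, Name | Format-Table -AutoSize
```

Any host that still shows the old values was missed by the loop and would likely stay in its own partition after the reboot.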

Then reboot your hosts.
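
Once the hosts are back up, one way to check that the cluster has collapsed into a single partition is to ask each host what it thinks its vSAN sub-cluster looks like. This is only a sketch: it assumes an active Connect-VIServer session, Get-VsanPartitionSummary is a hypothetical helper name, and the property names SubClusterMasterUUID and SubClusterMemberCount are my assumption of how PowerCLI exposes the output of "esxcli vsan cluster get", so verify them against your PowerCLI version.

```powershell
# Sketch: group hosts by the vSAN master they report. Hosts in the same
# partition should agree on the same master, so one group means one partition.
# Get-VsanPartitionSummary is a hypothetical helper name.
function Get-VsanPartitionSummary {
    param([Parameter(Mandatory)][string]$ClusterName)

    $rows = foreach ($vmhost in Get-VMHost -Location $ClusterName) {
        # Get-EsxCli -V2 exposes the host's esxcli namespaces as objects.
        $esxcli = Get-EsxCli -VMHost $vmhost -V2
        $info   = $esxcli.vsan.cluster.get.Invoke()
        [pscustomobject]@{
            Host        = $vmhost.Name
            MasterUuid  = $info.SubClusterMasterUUID    # assumed property name
            MemberCount = $info.SubClusterMemberCount   # assumed property name
        }
    }

    # One output row per distinct master, listing the hosts that report it.
    $rows | Group-Object -Property MasterUuid | ForEach-Object {
        [pscustomobject]@{
            MasterUuid  = $_.Name
            Hosts       = $_.Group.Host
            MemberCount = $_.Group.MemberCount | Select-Object -First 1
        }
    }
}

# Usage (cluster name is an example):
# Get-VsanPartitionSummary -ClusterName 'demo-cluster' | Format-List
```

In the situation described above, I would expect this to return two rows before the fix (one with 32 hosts, one with 12) and a single row afterwards.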

That’s it!

You would think that the supported maximums would work out of the box, but according to VMware they did not want smaller vSAN clusters to pay the memory overhead that larger vSAN clusters need in order to run efficiently.

Rolling Reboot of VMware ESXi Cluster

I ran into a situation where I needed to reboot a full cluster of ESXi hosts. In most cases when I need to reboot a cluster full of hosts I use VUM (VMware Update Manager), which uses VMware DRS to move VMs off a host, places the host in maintenance mode, reboots it, takes it back out of maintenance mode once the reboot completes, and then moves on to the next host until every host in the cluster is done.

I did not need to patch the hosts this time, and since the cluster had 32 hosts and several VMs I did not want to do this by hand. So I searched Google and found this script, which I wanted to share. I wish I could give credit to the creator, but it was on an archived WordPress blog.

The script does the following:
Goes through the cluster one host at a time, puts the ESXi server in maintenance mode, reboots the server and then puts it back online. If VMs are running on the host, DRS will need to be enabled in fully automated mode to allow VMs to vMotion off to other hosts. (There should also be enough HA capacity in the cluster to have one host taken offline at a time.)

###################
## reboot-vmcluster.ps1
## Supply the hostname/FQDN for your vCenter server and the name of the cluster you want rebooted
## Script reboots each ESXi server in the cluster one at a time
###################
##################
## Args
##################
# Check to make sure both arguments were passed
if ($args.Count -ne 2) {
    Write-Host "Usage: reboot-vmcluster.ps1 <vCenterServer> <ClusterName>"
    exit
}

# Set vCenter and Cluster name from Arg
$vCenterServer = $args[0]
$ClusterName = $args[1]

##################
## Connect to infrastructure
##################
Connect-VIServer -Server $vCenterServer | Out-Null

##################
## Get Server Objects from the cluster
##################
# Get VMware Server Object based on name passed as arg
$ESXiServers = @(get-cluster $ClusterName | get-vmhost)

##################
## Reboot ESXi Server Function
## Puts an ESXi server in maintenance mode, reboots the server and then puts it back online
## Requires fully automated DRS and enough HA capacity to take a host offline
##################
Function RebootESXiServer ($CurrentServer) {
    # Get Server name
    $ServerName = $CurrentServer.Name

    # Put server in maintenance mode
    Write-Host "#### Rebooting $ServerName ####"
    Write-Host "Entering Maintenance Mode"
    Set-VMHost $CurrentServer -State Maintenance -Evacuate | Out-Null

    $ServerState = (Get-VMHost $ServerName).ConnectionState
    if ($ServerState -ne "Maintenance")
    {
        Write-Host "Server did not enter maintenance mode. Cancelling remaining servers"
        Disconnect-VIServer -Server $vCenterServer -Confirm:$False
        Exit
    }
    Write-Host "$ServerName is in Maintenance Mode"

    # Reboot blade
    Write-Host "Rebooting"
    Restart-VMHost $CurrentServer -Confirm:$false | Out-Null

    # Wait for Server to show as down
    do {
        Start-Sleep -Seconds 15
        $ServerState = (Get-VMHost $ServerName).ConnectionState
    }
    while ($ServerState -ne "NotResponding")
    Write-Host "$ServerName is Down"

    # Wait for server to reboot, checking every 2 minutes
    $j = 0
    do {
        Start-Sleep -Seconds 120
        $ServerState = (Get-VMHost $ServerName).ConnectionState
        Write-Host "... Waiting for reboot"
        $j++
    }
    while ($ServerState -ne "Maintenance")
    # Each pass through the loop waits 2 minutes
    $RebootTime = $j * 2
    Write-Host "$ServerName is back up. Took about $RebootTime minutes"

    # Exit maintenance mode
    Write-Host "Exiting Maintenance mode"
    Set-VMHost $CurrentServer -State Connected | Out-Null
    Write-Host "#### Reboot Complete####"
    Write-Host ""
}

##################
## MAIN
##################
foreach ($ESXiServer in $ESXiServers) {
    RebootESXiServer $ESXiServer
}

##################
## Cleanup
##################
# Close vCenter connection
Disconnect-VIServer -Server $vCenterServer -Confirm:$False

Example of Script Output:

>.\reboot-vmcluster.ps1 vcenter.domain.com demo-cluster
#### Rebooting esxi06.domain.com ####
Entering Maintenance Mode
Rebooting
esxi06.domain.com is Down
Waiting for Reboot ...
Waiting for Reboot ...
Waiting for Reboot ...
esxi06.domain.com is back up
Exiting Maintenance mode
#### Reboot Complete####

#### Rebooting esxi05.domain.com ####
Entering Maintenance Mode
Rebooting
esxi05.domain.com is Down
Waiting for Reboot ...
Waiting for Reboot ...
Waiting for Reboot ...
Waiting for Reboot ...
esxi05.domain.com is back up
Exiting Maintenance mode
#### Reboot Complete####
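
Since the script depends on DRS being fully automated, a small pre-flight check could be run before the main loop so it does not stall on the first host. This is a rough sketch rather than part of the original script: Test-RollingRebootReady is a hypothetical helper name, it assumes an active Connect-VIServer session, and it only checks DRS and host connection state, not actual HA capacity.

```powershell
# Sketch of a pre-flight check for the rolling reboot.
# Test-RollingRebootReady is a hypothetical helper, not part of the original script.
function Test-RollingRebootReady {
    param([Parameter(Mandatory)][string]$ClusterName)

    $cluster = Get-Cluster -Name $ClusterName -ErrorAction Stop

    # Without fully automated DRS, running VMs are not moved off automatically
    # when a host enters maintenance mode.
    if (-not $cluster.DrsEnabled -or "$($cluster.DrsAutomationLevel)" -ne 'FullyAutomated') {
        Write-Warning "DRS on $ClusterName is not enabled in fully automated mode."
        return $false
    }

    # A host that is already down or in maintenance mode reduces spare capacity.
    $notConnected = @(Get-VMHost -Location $cluster |
        Where-Object { "$($_.ConnectionState)" -ne 'Connected' })
    if ($notConnected.Count -gt 0) {
        Write-Warning "Hosts not in the Connected state: $($notConnected.Name -join ', ')"
        return $false
    }

    return $true
}

# Possible use just before the MAIN loop of reboot-vmcluster.ps1:
# if (-not (Test-RollingRebootReady -ClusterName $ClusterName)) { exit }
```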