Finding the real NFS network

I was recently helping a customer who had inherited an existing vSphere deployment that used NFS storage. They were tasked with migrating the old VMs to newer infrastructure, but they first wanted to find the NFS storage array backing the datastores.

Looking at the datastores in vCenter, they couldn’t find the hostname or IP address of the storage target. Instead, the device backing > server field showed seemingly random values, much like this demo datastore I mocked up, which shows a server of network.nfs.1 where we’d normally expect to see a hostname or IP address. The observed value, network.nfs.1 in this case, wasn’t a name that could be resolved using the customer’s DNS.
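The same server value can be read with PowerCLI instead of clicking through each datastore. This is a minimal sketch, assuming an existing Connect-VIServer session; the column labels Server, Folder and NasType are my own choices for illustration, not names from vCenter.

```powershell
# Assumes an existing Connect-VIServer session; read-only.
# List every NFS datastore with the server and export path recorded in its backing.
Get-Datastore |
  Where-Object { $_.Type -like 'NFS*' } |
  Select-Object Name,
    @{N='Server';  E={ $_.ExtensionData.Info.Nas.RemoteHost }},
    @{N='Folder';  E={ $_.ExtensionData.Info.Nas.RemotePath }},
    @{N='NasType'; E={ $_.ExtensionData.Info.Nas.Type }}
```

In an environment like the one described here, the Server column would show the alias (for example network.nfs.1) rather than a resolvable name or IP address.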

Looking at host networking, one VMkernel adapter was clearly the one used for storage access, similar to this mocked up screenshot:

It seemed logical that network.nfs.1 and the other seemingly random names were devices on this 10.3.3.0/25 network. We wanted to issue a ping from the ESXi host, but the root password was unknown, so we could not log in to the console. Since we did have access to vCenter, I went looking for a way to send a ping through esxcli, hoping we could then use the Get-EsxCli PowerCLI cmdlet to issue our pings. I found esxcli network diag ping, confirmed in a lab that it worked as expected, and then tried it in this environment:

$esxcli = Get-EsxCli -VMHost $thisVmHost -v2
$networkDiagPing = $esxcli.network.diag.ping.CreateArgs()
$networkDiagPing.host = 'network.nfs.1'
$networkDiagPing.interface = 'vmk1'
$pingResults = $esxcli.network.diag.ping.Invoke($networkDiagPing)

Unfortunately, this resulted in the error sendto() failed (Network is unreachable). That was surprising, as the NFS datastore was online and we had specified the VMkernel interface on the storage network. The host had four VMkernel interfaces, so we stepped through each one to find out whether the storage traffic was using a different interface. The last interface we tried, vmk0, received a response.
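Before stepping through the interfaces one at a time, it can help to list them alongside their addressing and portgroups. The following is a rough sketch, assuming an existing Connect-VIServer session and that the host uses standard virtual switches (distributed portgroups would need Get-VDPortgroup instead); the host name is the lab host used later in this post.

```powershell
# Assumes an existing Connect-VIServer session; read-only.
$thisVmHost = 'h197-vesx-04.lab.enterpriseadmins.org'

# VMkernel adapters with their addressing and the portgroup each one is attached to
Get-VMHostNetworkAdapter -VMHost $thisVmHost -VMKernel |
  Select-Object Name, IP, SubnetMask, PortGroupName

# VLAN IDs of the standard-switch portgroups on the same host (assumption: no distributed switch)
Get-VirtualPortGroup -VMHost $thisVmHost -Standard |
  Select-Object Name, VLanId, VirtualSwitchName
```

Comparing the VLAN IDs and subnets in this output against what actually exists in the environment is one way to spot an adapter that looks like a storage interface but isn't carrying any traffic.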

As best we could tell, the vmk1 interface was unused. The portgroup named ‘storage’ had a VLAN backing that didn’t actually exist in the environment, and the VMkernel IP address wasn’t on a network that existed either. Once we knew which network adapter was actually in use, the ping response returned the IP address of a known NAS. We did a bit more digging and found hosts file entries that were obfuscating the actual IP addresses of known storage targets. For reference, here is how we found the hosts file entries, again using esxcli.

$esxcli.network.ip.hosts.list.invoke() | Select-Object HostName, IPaddress

HostName      IPaddress
--------      ---------
network.nfs.2 192.168.10.26
network.nfs.1 192.168.10.26
network.nfs.9 192.168.67.21

After the fact I put together a quick script to help in the odd event I ever see something like this again. It finds all the unique hostnames/IPs used by NFS datastores and then for each VMkernel interface attempts to ping the NFS host, only showing the successful ping responses.

$thisVmHost = 'h197-vesx-04.lab.enterpriseadmins.org'
# Create the esxcli object once and reuse it for every ping
$esxcli = Get-EsxCli -VMHost $thisVmHost -V2
# Unique remote hostnames/IPs backing the NFS datastores on this host
$nfsBackings = Get-VMHost $thisVmHost | Get-Datastore | Where-Object {$_.ExtensionData.Info.Nas.Type -eq 'NFS'} | Select-Object @{N='RemoteHostNames';E={$_.ExtensionData.Info.Nas.RemoteHostNames}} -Unique
foreach ($thisDatastoreBacking in $nfsBackings) {
  foreach ($thisVmk in Get-VMHostNetworkAdapter -VMHost $thisVmHost -VMKernel) {
    $networkDiagPing = $esxcli.network.diag.ping.CreateArgs()
    $networkDiagPing.host = $thisDatastoreBacking.RemoteHostNames
    $networkDiagPing.interface = $thisVmk.Name
    # A failed ping throws, so clear the results and move on to the next interface
    try {
      $pingResults = $esxcli.network.diag.ping.Invoke($networkDiagPing)
      $uniqueHosts = [string]::Join(', ', ($pingResults.Trace.Host | Select-Object -Unique))
    } catch { $pingResults = $null }

    # Only report interfaces that received a response
    if ($pingResults) { "Pinging $($thisDatastoreBacking.RemoteHostNames) from $($thisVmk.Name) [IP $($thisVmk.IP)] took path $uniqueHosts" }
  } # end vmkernel loop
} # end Datastore backing loop

In a lab with a similar configuration, the script above produces output similar to:

Pinging network.nfs.1 from vmk0 [IP 192.168.10.19] took path 192.168.10.26
Pinging network.nfs.2 from vmk0 [IP 192.168.10.19] took path 192.168.10.26
Pinging network.nfs.9 from vmk0 [IP 192.168.10.19] took path 192.168.57.21, 192.168.10.1, 192.168.127.252

The final row in that output shows an NFS target that was not on the local network and took a few hops to get to the final destination, which might be helpful to see.
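When a single target looks odd, as with the final row, the raw result object holds more detail than the one-line summary. Here is a small sketch, assuming an existing Connect-VIServer session; the target and interface are taken from the lab output above, and since the exact properties under Summary and Trace may differ between ESXi versions, they are simply displayed rather than referenced by name.

```powershell
# Assumes an existing Connect-VIServer session; values below come from the lab example.
$thisVmHost = 'h197-vesx-04.lab.enterpriseadmins.org'
$esxcli = Get-EsxCli -VMHost $thisVmHost -V2

$pingArgs = $esxcli.network.diag.ping.CreateArgs()
$pingArgs.host = 'network.nfs.9'
$pingArgs.interface = 'vmk0'
$pingResults = $esxcli.network.diag.ping.Invoke($pingArgs)

# Overall statistics for the run
$pingResults.Summary | Format-List
# One row per response, including which host answered
$pingResults.Trace | Format-Table -AutoSize
```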
