Reviving an Old ESXi Host: USB to Local Disk Migration

I have an older Intel NUC in my lab, and although it's aging, it still serves a purpose, so I plan to hang on to it for a little while longer. This post outlines some issues I encountered while recently migrating from a USB boot device to a more permanent option. As described extensively in this knowledge base article: https://knowledge.broadcom.com/external/article/317631/sd-cardusb-boot-device-revised-guidance.html, USB devices are no longer recommended boot media due to their limited endurance. In addition, this host recently started throwing an error message:

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.

This message appeared on the host object in vCenter Server. I decided this would be a good time to move the boot device to more durable media. The host had a local disk containing a single VMFS volume, where I stored a VM with some backups. I moved this VM to a shared datastore for safekeeping and then deleted the VMFS volume. I wanted to re-install ESXi and specify this local disk as the target, and not having a VMFS volume on it made me more confident when selecting the disk during the ESXi install.
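
For anyone seeing the same message, a couple of quick checks from an SSH session can confirm what ESXi currently thinks of the device; this is just a sketch, with the device ID taken from the error above:

# Show the state of the device backing the boot filesystem
esxcli storage core device list -d mpx.vmhba32:C0:T0:L0

# Show which volume the active bootbank currently resolves to
vmkfstools -P /bootbank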

Creating the boot media

For this ESXi host, I knew that I would need the latest ESXi base image (8.0u3d), the Synology NFS Plug-in for VAAI, and the USB NIC Fling driver. Instead of just installing ESXi and then adding packages, or using New-ImageBundle, I decided to turn to vCenter Server Lifecycle Manager for help. I first created a new, empty cluster object. I then selected the Updates tab, chose 'manage with a single image', and then 'setup image manually'. I selected the required ESXi version and additional components, saved, and finally clicked 'finish image setup'. Once complete, I was able to select the '…' and 'Export' options, pictured below.

This allowed me to export the image as an ISO image, pictured below:

With the ISO image in hand, I used Rufus to write the ISO image to a USB drive to use as the installation media.

Installing ESXi

Since I only needed to install ESXi on a single host, I decided to do so manually / interactively. Knowing that this was an old host and that its CPU was no longer on the HCL, I pressed SHIFT+O (the letter O, not the number zero) during bootup to add a couple of boot options:

systemMediaSize=min allowLegacyCPU=true

The systemMediaSize option limits the amount of space used on the boot media to 32GB (min) instead of 128GB (default). This is described in more detail here: https://knowledge.broadcom.com/external/article/345195/boot-option-to-configure-the-size-of-esx.html. The allowLegacyCPU option allows the ESXi install to continue on unsupported CPUs. This is documented in various places, including here: https://williamlam.com/2022/10/quick-tip-automating-esxi-8-0-install-using-allowlegacycputrue.html.

The install went well: I was able to select my empty local disk as the installation target, and the system booted up fine afterwards. I noticed I now had a datastore1 on this host that was 32GB smaller than the original VMFS volume.
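
As a quick sanity check, the new layout and the extra components from the custom image can be confirmed from the shell. This is only a sketch: the disk device name below is a placeholder, and the grep pattern assumes the relevant VIB names contain 'usb' or 'nfs':

# Datastore sizes as ESXi sees them (datastore1 is smaller because roughly
# 32GB is now reserved for the ESXi system partitions)
esxcli storage filesystem list

# Partition table of the boot disk; substitute your own device name, which
# 'esxcli storage core device list' will show
partedUtil getptbl /vmfs/devices/disks/t10.ATA_____EXAMPLE_DISK

# Confirm the USB NIC driver and NFS VAAI plug-in made it into the image
esxcli software vib list | grep -i -E 'usb|nfs'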

Configuring USB NIC Fling Driver

My USB NIC was recognized immediately as well, since I had included the driver in the custom image. I added the host to a distributed virtual switch and mapped uplinks to the appropriate physical NICs, but after a reboot the vusb0 device was no longer in use by Uplink 2.

Some of my notes mentioned that I had previously added some lines to the /etc/rc.local.d/local.sh script to handle this, although I hadn't noted which commands. Thankfully I was able to get the system to boot from the failing USB device and review this file. I've included the code below:

# Wait (up to 20 x 10 seconds) for vusb0 to report a link status of 'Up'
# before touching the distributed switch configuration.
vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
count=0
while [[ $count -lt 20 && "${vusb0_status}" != "Up" ]]
do
    sleep 10
    count=$(( $count + 1 ))
    vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
done

# Re-attach vusb0 as an uplink on DVPort 308 of the 30-Greenfield-DVS switch,
# then refresh the host's network information so the change shows up in vCenter.
esxcfg-vswitch -P vusb0 -V 308 30-Greenfield-DVS
/bin/vim-cmd internalsvc/refresh_network

The esxcfg-vswitch help states that the -P and -V options are used as follows:

 -V|--dvp=dvport             Specify a DVPort Id for the operation.
 -P|--add-dvp-uplink=uplink  Add an uplink to a DVPort on a DVSwitch.
                              Must specify DVPort Id.

The physical uplink I wanted to add was vusb0, and the DVPort ID for the operation was 308, which could be found on the distributed switch > Ports tab when filtering the 'Connectee' column for the specific host in question, pictured below:
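
If you prefer the command line, the host's own view of the distributed switch, including the DVPort IDs currently in use, can also be listed directly on the host as a cross-check of what vCenter shows:

# Lists standard and distributed switches; the DVS section includes a
# DVPort ID column along with the client currently using each port
esxcfg-vswitch -l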

Now on system reboot, the vusb0 uplink correctly connects to the expected distributed switch.
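
Between reboots, the same pieces can be spot-checked interactively from an SSH session, for example:

# Is the USB NIC present, and is its link up?
esxcli network nic list
esxcli network nic get -n vusb0 | grep 'Link Status'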

Lifecycle Manager – Host is not compatible with the image

Once I had the host networking situated, I wanted to verify that vCenter Lifecycle Manager agreed that my host was up to date with the latest image. I was surprised to see that the system reported 'The CPU on the host is not supported by the image. Please refer to KB 82794 for more details.'

I knew these CPUs were unsupported, but I had expected the less severe warning 'The CPU on this host may not be supported in future ESXi releases', which is what I had observed prior to the host rebuild. After some searching, I found this thread: https://community.broadcom.com/vmware-cloud-foundation/discussion/syntax-for-an-upgrade-cmd-to-ignore-cpu-requirements, which proposed editing /bootbank/boot.cfg, specifically appending the allowLegacyCPU=true flag to the end of the kernelopt= line. This resolved my issue and lets me keep this older system running.
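
For reference, the change amounts to appending the flag to the existing kernelopt= line in /bootbank/boot.cfg. The snippet below is only an illustration; whatever options are already on that line (shown here as autoPartition=FALSE) will vary per host and should be left in place:

# /bootbank/boot.cfg (excerpt)
kernelopt=autoPartition=FALSE allowLegacyCPU=true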

Conclusion

This migration process highlights the challenges of maintaining older ESXi hosts while ensuring compatibility. Moving from USB-based boot devices to more durable storage is a critical step, especially as support is phased out for USB/SD boot devices. Leveraging vCenter Lifecycle Manager simplifies image management, though workarounds (such as allowLegacyCPU=true) may be needed for legacy hardware.
