Reviving an Old ESXi Host: USB to Local Disk Migration

I have an older Intel NUC in my lab, and although it's aging, it still serves a purpose, so I plan to hang on to it for a little while longer. This post outlines some issues I encountered while recently migrating from a USB boot device to a more permanent option. As described extensively in this knowledge base article: https://knowledge.broadcom.com/external/article/317631/sd-cardusb-boot-device-revised-guidance.html, USB devices are no longer recommended boot media due to their limited endurance. In addition, this host recently started throwing an error message:

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disks/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.

This message appeared on the host object in vCenter Server. I decided this would be a good time to move the boot device to more durable media. The host had a local disk containing a single VMFS volume, where I stored a VM containing some backups. I moved this VM to a shared datastore for safekeeping and proceeded to delete the VMFS volume. I planned to re-install ESXi onto this device, and not having a VMFS volume made me more confident when selecting the disk during the ESXi install.

Creating the boot media

For this ESXi host, I knew that I would need the latest ESXi base image (8.0u3d), the Synology NFS Plug-in for VAAI, and the USB NIC Fling driver. Instead of just installing ESXi and then adding packages, or using New-ImageBundle, I decided to turn to vCenter Server Lifecycle Manager for help. I first created a new, empty cluster object. I then selected the Updates tab, chose ‘manage with a single image’, and then ‘setup image manually’. I selected the required ESXi version and additional components, saved, and finally clicked ‘finish image setup’. Once complete, I was able to select the ‘…’ and ‘Export’ options, pictured below.

This allowed me to export the image as an ISO image, pictured below:

With the ISO image in hand, I used Rufus to write the ISO image to a USB drive to use as the installation media.

Installing ESXi

Since I only needed to install ESXi on a single host, I decided to do so manually / interactively. Knowing that this was an old host, and the installed CPU was no longer supported on the HCL, I pressed SHIFT+o (the letter o, not the number zero) during bootup to add a couple of boot options:

systemMediaSize=min allowLegacyCPU=true

The systemMediaSize option limits the amount of space used on the boot media to 32GB (min) instead of 128GB (default). This is described more here: https://knowledge.broadcom.com/external/article/345195/boot-option-to-configure-the-size-of-esx.html. The allowLegacyCPU option allows ESXi installs to continue on unsupported CPUs. This is documented various places, including here: https://williamlam.com/2022/10/quick-tip-automating-esxi-8-0-install-using-allowlegacycputrue.html.

The install went well: I was able to select my empty local disk as the installation target, and the system booted up fine afterwards. I noticed I now had a datastore1 on this host, which was 32GB smaller than the original VMFS volume.

Configuring USB NIC Fling Driver

My USB NIC was recognized immediately as well, since I had included the driver in the custom image. I added the host to a distributed virtual switch and mapped uplinks to the appropriate physical NICs, but after a reboot the vusb0 device was no longer in use by Uplink 2.

Some of my notes mentioned that I had previously added some lines to the /etc/rc.local.d/local.sh script to handle this, although I didn’t list which commands. Thankfully, I was able to get the system to boot from the failing USB device and review the file. I’ve included the code below:

# Wait up to ~200 seconds for the vusb0 NIC to report a link
vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
count=0
while [[ $count -lt 20 && "${vusb0_status}" != "Up" ]]
do
    sleep 10
    count=$(( $count + 1 ))
    vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
done

# Re-add vusb0 as an uplink on DVPort 308 of the distributed switch, then refresh networking
esxcfg-vswitch -P vusb0 -V 308 30-Greenfield-DVS
/bin/vim-cmd internalsvc/refresh_network

The esxcfg-vswitch help states that the -P and -V options are used as follows:

 -V|--dvp=dvport             Specify a DVPort Id for the operation.
 -P|--add-dvp-uplink=uplink  Add an uplink to a DVPort on a DVSwitch.
                              Must specify DVPort Id.

The physical uplink I wanted to add was vusb0, and the DVPort Id for the operation was 308, which could be found on the distributed switch > Ports tab when filtering the ‘connectee’ column for the specific host in question, pictured below:

Now on system reboot, the vusb0 uplink correctly connects to the expected distributed switch.

Lifecycle Manager – Host is not compatible with the image

Once I had the host networking situated, I wanted to verify that vCenter Lifecycle Manager agreed that my host was up to date with the latest image. I was surprised to see the message ‘The CPU on the host is not supported by the image. Please refer to KB 82794 for more details.’

I knew these CPUs were unsupported, but had expected the less severe warning ‘The CPU on this host may not be supported in future ESXi releases,’ which is what I had observed prior to the host rebuild. After some searching, I found this thread: https://community.broadcom.com/vmware-cloud-foundation/discussion/syntax-for-an-upgrade-cmd-to-ignore-cpu-requirements, which proposed editing /bootbank/boot.cfg to append the allowLegacyCPU=true flag to the end of the kernelopt= line. This resolved my issue and allows me to keep this older system running.
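For reference, after this change the kernelopt line in /bootbank/boot.cfg looks something like the following (whatever options already exist on the line stay in place; only the flag at the end is added — the placeholder below stands in for those existing options):

kernelopt=<existing options> allowLegacyCPU=true

Taking a backup copy of boot.cfg before editing is a good idea, since a typo in this file can leave the host unbootable.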

Conclusion

This migration process highlights the challenges of maintaining older ESXi hosts while ensuring compatibility. Moving from USB-based boot devices to more durable storage is a critical step, especially as support is phased out for USB/SD boot devices. Leveraging vCenter Lifecycle Manager simplifies image management, though workarounds (such as allowLegacyCPU=true) may be needed for legacy hardware.

Posted in Lab Infrastructure, Virtualization

vCenter Server 7 to 8 Upgrade: Avoiding Pitfalls with Skyline Health Diagnostics

I recently upgraded a vCenter Server instance in a lab from version 7 to 8. In this post I’ll walk through some issues encountered and lessons learned during the process.

Pre-Checks with Skyline Health Diagnostics

In Skyline Health Diagnostics, there is an analysis option for vCenter Upgrade Pre-Check Plugins. This can be found under New Analysis by selecting either product VMware vSphere or VMware vCenter Server. In the current version of Skyline Health Diagnostics (4.0.8 as of this post), this analysis will check a total of 32 possible issues that could impact a vCenter Server upgrade.

In my case, the scan identified a warning named ‘LookupServiceCheck.’ The recommendation was to use lsdoctor to remove some duplicate service registrations. The tool’s usage is described in this KB article: https://knowledge.broadcom.com/external/article/320837/using-the-lsdoctor-tool.html. The detailed output of Skyline Health Diagnostics also identified specifically which service registrations needed to be validated.

After taking a snapshot of the VM, I followed the KB article to remove the duplicate registrations. I then re-ran the vCenter Upgrade Pre-Check Plugins scan, confirmed the issue was resolved, and verified that no new issues were found.

Upgrading vCenter Server

The upgrade from vCenter Server 7.0 to 8.0 was uneventful — which is exactly how I like to describe upgrades. A new vCenter Server Appliance was deployed, the configuration copied over, and logins to the new system, even with Active Directory users, worked without issue.

It’s difficult to know if the Skyline Health Diagnostics recommendation to run lsdoctor directly contributed to the smooth upgrade. However, the tool was easy to use, and I would definitely include it in my upgrade checklist for future upgrades.

Clearing the arp-cache

Looking in the inventory of the upgraded vCenter Server, I did notice that two test hosts were disconnected. While disconnected test hosts aren’t uncommon, I recalled both hosts being online prior to the upgrade. Interestingly I was able to ping these hosts from my jump box, but not from the vCenter Server appliance.

These hosts were connected to an older consumer-grade switch (TP-Link T1600G-28TS) in my lab. I had previously encountered an issue where devices on this switch failed to recognize IP address changes, which I resolved by clearing the ARP table. Given that the vCenter Server upgrade workflow involves assigning a temporary IP before reverting to the original, I suspected a similar issue.

I referenced my old notes, which instructed me to SSH into the switch and run a few commands. However, when I attempted to connect using: ssh admin@192.168.10.1, I was met with this error:

Unable to negotiate with 192.168.10.1 port 22: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1

A quick search helped me overcome that error. However, I encountered a new error, and then another error, until I found a working connection string:

ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 -oHostKeyAlgorithms=+ssh-dss -c aes256-cbc -m hmac-md5 admin@192.168.10.1

Once connected, I ran the following commands:

enable
configure
clear arp-cache

Immediately, the two disconnected hosts came back online. This was a reminder that aging network equipment can introduce unexpected issues. Newer operating systems and SSH implementations have stricter security requirements, and this switch hasn’t kept pace.
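To avoid retyping that long connection string next time, the same legacy options can be persisted in ~/.ssh/config. The host alias below is my own naming; the address matches the switch above:

Host tplink-t1600g
    HostName 192.168.10.1
    User admin
    KexAlgorithms +diffie-hellman-group1-sha1
    HostKeyAlgorithms +ssh-dss
    Ciphers aes256-cbc
    MACs hmac-md5

With this in place, ssh tplink-t1600g connects without any extra flags.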

Conclusion

The best way to ensure a successful vCenter Server upgrade is preparation. Skyline Health Diagnostics provides an effective way to validate the environment before upgrading and offers guidance on any required remediation steps. Additionally, this experience reinforced the importance of keeping network hardware up to date—technical debt can surface at the most unexpected times, even during routine maintenance.

Posted in Lab Infrastructure, Virtualization

Automated Deployment of Storage Server

In this post, I’ll walk through the process of automating the deployment of a storage server using Aria Automation. Specifically, I’ll show how to add an option to deploy a storage server dynamically in your lab environment and troubleshoot an iSCSI duplicate LUN ID issue.

In a recent post, I documented the creation of a storage server that could provide NFS or iSCSI storage for use in my lab. To make consuming this appliance easier, I wanted to add a check box to my Aria Automation request form to determine if a storage server was needed. If this box was checked, then a storage server would be included with the deployment.

To achieve this, I added an additional vSphere.Machine object to my Assembler template and connected it to the same network as the nested ESXi host. I then added a boolean input to the template. Finally, the ‘count’ property of the machine object is set using the following logic:

      count: ${input.deployStorageServer == true ? 1:0 }

This logic says that if the box is checked (boolean true) then the count is 1, otherwise it is 0. Here is a screenshot of the canvas after adding the machine:

I now have a toggle to deploy (or not) an external storage appliance when deploying nested ESXi hosts.
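For completeness, the boolean input that the count expression references might be declared in the template inputs like this (the title text is my own; the input name must match whatever the count expression uses):

inputs:
  deployStorageServer:
    type: boolean
    title: Deploy storage server
    default: false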

Duplicate Device ID issue

After creating a couple of test deployments and adding hosts to vCenter Server, I ran into an interesting issue. For deployment #1, I formatted LUN0 as VMFS. For deployment #2, I tried to format LUN0, but it didn’t appear in the GUI. Looking a bit more closely at the devices, I realized that the unique identifiers (naa.* values) of each LUN were the same. This makes sense, as each appliance was cloned with the devices already presented. Next, I attempted to unmap and remap the iSCSI device from the appliance. Below are the commands I used.

sudo targetcli

# Remove the iscsi LUN mapping
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
delete 0

# Remove the disk mapping & recreate
cd /backstores/fileio/
delete disk0
create disk0 /data/iscsi/disk0.img sparse=true write_back=false

# Recreate the iscsi LUN mapping
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk0

After doing this and rescanning the host, I confirmed that the unique identifier (naa ID) changed and I was able to format LUN0 as VMFS.

Longer term, I decided that I should not have the iSCSI mappings pre-created in my storage appliance. I removed the configuration (clearconfig confirm=True) from the appliance, and instead placed the following script at /data/iscsi/setup.sh:

#!/bin/sh
targetcli backstores/fileio create disk0 /data/iscsi/disk0.img sparse=true write_back=false
targetcli backstores/fileio/disk0 set attribute is_nonrot=1
targetcli backstores/fileio create disk1 /data/iscsi/disk1.img sparse=true write_back=false
targetcli backstores/fileio/disk1 set attribute is_nonrot=1
targetcli backstores/fileio create disk2 /data/iscsi/disk2.img sparse=true write_back=false
targetcli backstores/fileio/disk2 set attribute is_nonrot=1

targetcli /iscsi create iqn.2025-02.com.example.iscsi:target01 
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk0
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk1
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk2

targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls create iqn.1998-01.com.vmware:h316-vesx-64.lab.enterpriseadmins.org:394284478:65

targetcli saveconfig

Now, after deploying the storage appliance, assuming I need an iSCSI target I can run sudo /data/iscsi/setup.sh and have new identifiers generated at runtime. This eliminates the identifier duplication and gets the automation ~80% of the way there.

With the steps outlined above, I was able to automate the deployment of storage servers and resolve the issue with duplicate LUN IDs. This process saves time and ensures each deployment is consistent. Going forward, I’ll continue automating aspects of my lab environment to increase efficiency.

Posted in Lab Infrastructure, Virtualization

How to Set Up a Minimal NFS and iSCSI Storage Solution Using Ubuntu 24.04

In my lab, I often need different types of storage to test various scenarios. For example, just last week someone asked about using New-Datastore with a specific version of VMFS, and I needed to perform a quick syntax check. I’ve found that having a nested storage appliance, like OpenFiler or FreeNAS, available is helpful. However, these appliances offer far more features than I need and typically have higher resource requirements. Setting up specific storage protocols like NFS or iSCSI is often crucial for testing and development, but existing solutions can be overly complex or resource-heavy for lab environments. In this post I’ll outline how I solved this problem with a few utilities added to an existing Ubuntu 24.04 template.

Storage Protocols

With this project I wanted to have a single storage target that could provide both NFS and iSCSI storage protocols. For purposes of my lab, the ‘client’ system in nearly all of my storage testing will be ESXi, so this post will provide output/examples in that context. ESXi supports block storage (such as iSCSI) and file storage, specifically NFS 3 and NFS 4.1. Ideally, I want to provide all three of these options with this single appliance, so we’ll show examples of using the appliance in all three of those ways.

Setting Up the Test Appliance

I deployed an Ubuntu 24.04 VM, using the image/customization spec described here: https://enterpriseadmins.org/blog/scripting/ubuntu-24-04-packer-and-vcenter-server-customization-specifications/

The template VM has a single 50GB disk, so I added an additional 50GB disk to use as the backing for the storage server. We’ll format this disk as btrfs and mount it as /data. This will be covered in the following code block:

sudo mkdir /data
sudo mkfs.btrfs /dev/sdb
echo "/dev/sdb /data btrfs defaults 0 0" | sudo tee -a /etc/fstab
sudo systemctl daemon-reload
sudo mount /data

The above code block creates a folder, formats the second disk in the system, adds an entry to the fstab file so the filesystem mounts when the system boots, and finally mounts the new disk. After the above is complete, running df -h /data should return the mounted disk and its size, confirming that everything worked.

Configuring NFS on Ubuntu 24.04

I’ll start with NFS, as this is a problem I’ve previously solved using Photon OS (https://enterpriseadmins.org/blog/virtualization/vmware-workstation-lab-photon-os-container-host-and-nfs-server/). The only difference this time is that I planned to use Ubuntu 24.04, which has a slightly different package name for the NFS server components.

sudo apt install nfs-kernel-server -y
sudo mkdir /data/nfs
echo "/data/nfs *(rw,async,no_root_squash,insecure_locks,sec=sys,no_subtree_check)" | sudo tee -a /etc/exports
sudo systemctl daemon-reload
sudo systemctl reload nfs-server

The above code block installs our NFS server package, creates a subfolder to export as NFS, adds an entry to the NFS exports configuration file, then reloads the configuration to take effect. On our client system (ESXi), we can confirm that our work was successful by creating a datastore. I’ll complete that task in PowerCLI below:

New-Datastore -VMHost h316-vesx-64* -Name 'nfs326' -Nfs -NfsHost 192.168.10.26 -Path /data/nfs -Confirm:$false

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
nfs326                                  14.994          15.000

As we can see, the test was a success and returned our mount point size in the command results. The above example resulted in an NFS 3 mount of the NFS folder. I created a subfolder (test41) and executed a similar test to confirm this could work for NFS 4.1 as well.

New-Datastore -VMHost h316-vesx-64* -Name 'nfs326-41' -Nfs -FileSystemVersion 4.1 -NfsHost 192.168.10.26 -Path /data/nfs/test41 -Confirm:$false

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
nfs326-41                               14.994          15.000

As we can see in the vCenter web interface, one of these datastores is NFS 3 and the other is NFS 4.1, both showing the same capacity and free space.

This confirms that we were able to successfully connect to our NFS Server service using ESXi with both NFS 3 and NFS 4.1 connections. Next, we’ll look at setting up iSCSI storage, which requires a slightly different approach.

iSCSI

The iSCSI testing was a bit more interesting. Looking around, I found a couple of ways to create an iSCSI target and ended up using targetcli. There are plenty of tutorials around for this, including a video: https://www.youtube.com/watch?v=OIpxwX6pTIU and an Ubuntu 24.04-specific article: https://www.server-world.info/en/note?os=Ubuntu_24.04&p=iscsi&f=1, both of which were very helpful. I’ll document the steps below for completeness.

In this first code block we’ll install the service and create the folder where we’ll store some images.

sudo apt install targetcli-fb -y
sudo mkdir /data/iscsi

The targetcli command can create image files to use as backing for our disks, but in my testing the sparse=true switch did not create sparse files, so we’ll do this in two steps. You’ll note that I’m specifying one image as having a 2TB file size… but as you may have noticed in our NFS example, we only have 15GB of disk for our /data mount. This doesn’t result in some ‘magic beans’ sort of free storage: once we write 15GB of data to this disk we’ll be out of capacity and run into problems. This is only being done for illustration/simulation purposes, since sometimes you’ll want the UI to show ‘normal’ sizes that you’d see with actual datastores. One reason the /data mount was configured with btrfs instead of something like ext4 is so we can support image files larger than 16TB; this btrfs filesystem will allow files over 62TB in size (62TB being the maximum supported size for VMFS 6). In the code block output below, we’ll also use du to show that these disks are using 0 bytes on the filesystem, but have larger apparent sizes.

sudo truncate -s 10G /data/iscsi/disk0.img
sudo truncate -s 10G /data/iscsi/disk1.img
sudo truncate -s 2T /data/iscsi/disk2.img

du -h /data/iscsi/*.img
0       /data/iscsi/disk0.img
0       /data/iscsi/disk1.img
0       /data/iscsi/disk2.img

du -h /data/iscsi/*.img --apparent-size
10G     /data/iscsi/disk0.img
10G     /data/iscsi/disk1.img
2.0T    /data/iscsi/disk2.img

Once we have our files pre-staged, we can start working with targetcli.

sudo targetcli

# this should enter the targetcli shell

cd /backstores/fileio
create disk0 /data/iscsi/disk0.img sparse=true write_back=false
cd disk0
set attribute is_nonrot=1
cd ..
create disk1 /data/iscsi/disk1.img sparse=true write_back=false
cd disk1
set attribute is_nonrot=1
cd ..
create disk2 /data/iscsi/disk2.img sparse=true write_back=false
cd disk2
set attribute is_nonrot=1

The above code block creates the fileio references to each of our disks, and also sets the is_nonrot flag to tell the system that these are non-rotational (i.e., flash) devices.

Still in the targetcli shell, we’ll start our iSCSI configuration.

cd /iscsi
create iqn.2025-02.com.example.iscsi:target01 
cd iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk0
create /backstores/fileio/disk1
create /backstores/fileio/disk2

This will create LUNs for each of our disks. Finally, still in the targetcli shell we’ll create an ACL to allow a specific host to access the target. We’ll then delete it. This puts the correct syntax in our command history, so we can refer back to it in the future (I plan to use this as a template for future tests).

cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls
create iqn.1998-01.com.vmware:host.lab.enterpriseadmins.org:3:65
delete iqn.1998-01.com.vmware:host.lab.enterpriseadmins.org:3:65
exit

The exit will cause targetcli to save our changes so they’ll persist across a reboot. For testing, we’ll go back into targetcli and add a specific ACL entry to allow our test host to access the iSCSI target.

sudo targetcli
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls
create iqn.1998-01.com.vmware:h316-vesx-64.lab.enterpriseadmins.org:394284478:65
exit

On our test client system, we can add a dynamic target for the IP/name & port 3260 of our storage appliance and then rescan for storage. We should see the three disks that we created, with the sizes specified.

As another confirmation, we may want to make each of these disks a VMFS volume. We can do that using syntax similar to the below code block:

Get-ScsiLun -VmHost h316* |?{$_.Vendor -eq 'LIO-ORG'} | %{
  New-Datastore -VMHost h316* -Name "ds-vmfs-$($_.model)" -Path $_.CanonicalName -Vmfs -FileSystemVersion 6
}

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
ds-vmfs-disk1                            8.345           9.750
ds-vmfs-disk2                        2,046.312       2,047.750
ds-vmfs-disk0                            8.345           9.750

Looking in the vCenter web interface, we can see that all our storage has been presented.

Once we’ve placed filesystems on these disks, we can go back to the shell and see how much space is being used on disk.

du -h /data/iscsi/*.img
29M     /data/iscsi/disk0.img
29M     /data/iscsi/disk1.img
62M     /data/iscsi/disk2.img

We can see that the creation of a filesystem on these disks does consume some of the blocks (we are using several MB of disk, instead of the previous 0 bytes).
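This sparse-file behavior isn’t specific to btrfs or iSCSI backing files; it can be reproduced on most Linux filesystems with a throwaway file (the temp file here is illustrative, not part of the appliance):

```shell
# create a file with a large apparent size but no allocated blocks
f=$(mktemp)
truncate -s 1G "$f"
du -h --apparent-size "$f"   # apparent size: ~1G
du -h "$f"                   # allocated: 0

# write 4MB of real data; the allocated size grows, the apparent size does not
dd if=/dev/urandom of="$f" bs=1M count=4 conv=notrunc,fsync status=none
du -h "$f"
rm "$f"
```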

Adding extra LUNs to the iSCSI target is a straightforward process, requiring just a handful of commands. An example can be found in the code block below:

sudo truncate -s 10T /data/iscsi/disk3.img
sudo targetcli
# this should enter the targetcli shell

cd /backstores/fileio
create disk3 /data/iscsi/disk3.img sparse=true write_back=false

cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk3

exit

The above code block shows the creation of a 10TB disk image, entering the targetcli shell, adding the newly created disk as a ‘fileio’ option, and mapping that disk to our iSCSI target. Finally we exit, which by default will save the configuration and make it persistent. Refreshing storage on ESXi should cause the new LUN to appear. Since we didn’t set the is_nonrot attribute, this device will appear as an HDD instead of a Flash device.

Growing the /data btrfs filesystem

Our filesystem is currently backed by a 15GB disk. We’ve allocated about 12TB of that, so it is grossly oversubscribed. For a production system this would be a terrible idea, but for our lab/illustration purposes it is probably fine. At some point we may need to extend this filesystem to accommodate growth. I’ve grown ext3 and ext4 filesystems before, but wanted to document how to do the extension for the btrfs filesystem used in this example. I chose btrfs because it supports larger files, allowing us to create images as large as ESXi supports (62TB). The following code block shows how to extend the filesystem in the guest OS. This assumes we have already increased the size of the disk in the vCenter web client; for illustration purposes we’ve extended the disk from 15GB to 20GB.

df -h
# shows that the filesystem did not autogrow

echo 1 | sudo tee /sys/class/block/sdb/device/rescan
# rescans for disk changes

sudo lsblk  
# confirms disk is now seen as 20gb

sudo btrfs device usage /data
# shows that device size is 20gb

sudo btrfs filesystem resize max /data
# results in:
# Resize device id 1 (/dev/sdb) from 15.00GiB to max

df -h /data
# confirm filesystem is now 20gb

The above commands rescanned our disk to be aware of the new size, then resized the filesystem to the size we defined in the hypervisor (20GB).

To confirm this works as expected, we can refresh storage information for one of our NFS mounts. The capacity should increase from 15GB to 20GB, as seen in the following screenshot.

Conclusion

Creating this storage server to support NFS 3, NFS 4.1, and iSCSI targets is relatively straightforward. Having this pre-configured storage appliance can greatly streamline the process of testing various storage protocols, especially in virtual environments where quick deployment is key.

Posted in Lab Infrastructure, Virtualization

Extending Aria Automation with Custom Resources and Actions for IP Address Management

In my lab, I leverage Aria Automation to deploy Linux, Windows, and nested ESXi VMs. This is my primary interface for requesting new systems and covers most of the common resources I need for testing. However, I sometimes deploy one-off appliances at a scale where automation hasn’t been built. These appliances typically require an IP address and a DNS record. I had previously created a Jenkins job that accepted parameters, making these easy enough to create, but the cleanup is where I would fall down. I also wasn’t a huge fan of switching between the Aria Automation and Jenkins consoles to submit these requests.

My ideal solution to both of these problems was an Aria Automation request form that would create a deployment tracking these one-off IP requests. To not re-invent the wheel, this Aria Automation request could simply call Jenkins. When testing is complete, I’d have a deployment remaining in Aria Automation to serve as a reminder to properly clean up IPAM and DNS. This article will cover the process of creating this action, resource, and template to front end the Jenkins request with Aria Automation.

Custom Action – Create

In Aria Automation Assembler > Extensibility > Actions, we can create a new action. I named mine IPAM Next Address Create and selected only the project where my test deployments live.

For the action, I’m writing everything in PowerShell, since I already know that language and Aria Automation supports it. This code sample lacks robust error handling and could probably be cleaned up a fair amount, but it got the job done. In a production environment, adding some logic after each step to ensure the task completed would be prudent; in the event that the IPAM service is down or Jenkins isn’t responding, we’d want the request to behave in a predictable way.

The create section has the most code, as it connects to phpIPAM to get the next address and then requests a DNS record be created by Jenkins. The action obtains the IP address directly so it can be returned as part of the deployment, making the assigned IP clearly visible.

function handler($context, $inputs) {
    $subnet = $inputs.subnet
    $hostname = $inputs.name
    
    write-host "We've received a $($inputs.'__metadata'.operation) request for subnet $subnet"
 
    $ipamServer = 'ipam.apps.example.com'
    $ipamUser   = 'svc-vra'
    $ipamPass   = 'VMware1!'
    $ipamBaseURL = 'https://'+$ipamServer+'/api/'+$ipamUser+'/'

    # Login to the API with username/password provided.  Create header to be used in next requests.
    write-host "IPAM Login"
    $ipamLogin = (Invoke-RestMethod -Uri "$($ipamBaseURL)user" -Method Post -SkipCertificateCheck -Headers @{'Authorization'='Basic '+[Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($ipamUser+':'+$ipamPass))}).data.token
    $nextHeader = @{'phpipam-token'=$ipamLogin}

    # Get the subnet ID of the specified CIDR
    write-host "IPAM Get Subnet ID"
    $subnetID = (Invoke-RestMethod -URI "$($ipamBaseURL)subnets/cidr/$subnet" -SkipCertificateCheck -Headers $nextHeader).data.id

    # Make a reservation and provide name/description
    write-host "IPAM Reserve Next"
    $postBody = @{hostname="$($hostname).lab.enterpriseadmins.org"; description='Requested via Automation Extensibility'}
    $myIPrequest = (Invoke-RestMethod -URI "$($ipamBaseURL)addresses/first_free/$subnetID" -SkipCertificateCheck -Method Post -Headers $nextHeader -Body $postBody).data
    
    # Send a DNS Request to Jenkins
    write-host "Jenkins DNS Request"
    $dnsBody = @{reqtype='add'; reqhostname=$hostname; reqipaddress = $myIPrequest; reqzonename='lab.enterpriseadmins.org'} | ConvertTo-Json
    Invoke-RestMethod -URI 'http://jenkins.example.com:8080/generic-webhook-trigger/invoke?token=VRA-dnsRecord' -Method Post -Body $dnsBody -ContentType 'application/json'

    # Return detail to vRA
    $outputs = @{
        address = $myIPrequest
        resourceName = $hostname
    }
    return $outputs
}

The IP address obtained from IPAM as well as the hostname are returned when this task completes.
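As an aside, the Basic Authorization header the create action builds is just the base64 encoding of user:pass. The same value can be produced in shell, which is handy when testing the phpIPAM API with curl (the credentials below are the placeholder lab values from the script above):

```shell
user='svc-vra'
pass='VMware1!'
# printf avoids the trailing newline that echo would sneak into the encoding
token=$(printf '%s:%s' "$user" "$pass" | base64)
echo "Authorization: Basic $token"
```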

Custom Action – Read

For our custom resource, we will also need to specify an action to read / check status of our resource. For my purposes, I really don’t need anything specific to be checked, so I simply return all the input parameters. This is the default function / template loaded when creating the action.

function handler($context, $inputs) {
    return $inputs
}

Custom Action – Delete

When we are finished with our deployment and ready to delete, the custom resource needs a ‘delete’ action to call. Again this is written in PowerShell and calls Jenkins to request the actual delete. Jenkins will then connect to DNS and IPAM to process the cleanup.

function handler($context, $inputs) {
    $ipAddress = $inputs.address
    $hostname = $inputs.name
    
    write-host "We've received a $($inputs.'__metadata'.operation) request for IP address $ipAddress and hostname $hostname"
     
    $removeBody = @{reqzonename='lab.enterpriseadmins.org'; operationType='remove'; reqhostname=$hostname; subnetOrIp = $ipAddress} | ConvertTo-Json
    Invoke-RestMethod -URI 'http://jenkins.example.com:8080/generic-webhook-trigger/invoke?token=RequestIpAndDnsRecord' -Method Post -Body $removeBody -ContentType 'application/json'
}

This code could easily have contacted IPAM and DNS as separate requests, but since the Jenkins job already existed with webhook support, I chose to follow that path for simplicity.

Create Custom Resource

In Aria Automation Assembler > Design > Custom Resources we can create a new resource which will run our above actions. I named my resource IPAM Next Address, set the resource type to Custom.IPAM.Request, and based the resource on an ABX user-defined schema. For lifecycle actions, I selected the actions described above for all three required types: create, read, and destroy. For starters I set the scope to only be available for my test project, and finally toggled the ‘activate’ switch to make the resource available in blueprints.

Create Template

In Aria Automation Assembler > Design > Custom Template, the design for this request is super simple. There are three inputs: issueNumber, Name, and Subnet. The issue number is used for tracking and becomes part of the host name. The name is the unique part of the hostname, and the subnet is which network to use when finding the next address. My hostname ends up being h<issue-number-padded-3-digits>-<name-entered> (h is a prefix I use for test systems in my homelab). The subnet is a drop-down list with the networks I typically use for testing, defaulting to the selection I use most often.

formatVersion: 1
inputs:
  issueNumber:
    type: integer
    title: Issue Number
  Name:
    type: string
    minLength: 1
    maxLength: 25
    default: ip-01
  Subnet:
    type: string
    title: Subnet
    default: 192.168.10.0/24
    enum:
      - 192.168.10.0/24
      - 192.168.40.0/24
resources:
  IPAddress:
    type: Custom.IPAM.Request
    properties:
      name: h${format("%03d",input.issueNumber)}-${input.Name}
      subnet: ${input.Subnet}
      address: ''
      git-issue-number: ${input.issueNumber}
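The format expression above pads the issue number to three digits. The same naming convention, expressed in shell as a quick sanity check of the pattern:

```shell
# hostname = 'h' + zero-padded issue number + '-' + name
issue=7
name='ip-01'
printf 'h%03d-%s\n' "$issue" "$name"   # prints h007-ip-01
```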

Deploy

Once I published a version of this design, I could make a request from the Service Broker catalog. My request form only has a few required fields:

I added some functionality into the ‘create’ action to post a comment to my issue tracker letting me know that a new resource has been created. It is created with a task list check box, so that I can see there is an open item to review with this issue, as well as a link to the deployment.

When I look at the deployment, I can see when it was created, if it expires, and can use the actions drop down to delete the deployment. This delete action calls the Jenkins job mentioned above to remove the DNS record and release the IP address from IPAM.

Conclusion

Aria Automation can provide an interface to leverage existing workflows. This example shows how to create a deployment to track the lifecycle of a created resource, while leveraging an existing system to handle the actual task. This solves my cleanup/tracking issue for one-off IP requests and gets all requests submitted from a single console. Hopefully you can use pieces of this workflow in your own environment.

Posted in Lab Infrastructure, Virtualization