Automated Deployment of Storage Server

In this post, I’ll walk through the process of automating the deployment of a storage server using Aria Automation. Specifically, I’ll show how to add an option to deploy a storage server dynamically in your lab environment and troubleshoot an iSCSI duplicate LUN ID issue.

In a recent post, I documented the creation of a storage server that could provide NFS or iSCSI storage for use in my lab. To make consuming this appliance easier, I wanted to add a check box to my Aria Automation request form to determine if a storage server was needed. If this box was checked, then a storage server would be included with the deployment.

To achieve this, I added an additional vSphere.Machine object to my Assembler template and connected it to the same network as the nested ESXi host. I then added a boolean input to the template. Finally, I set the ‘count’ property of the machine object using the following logic:

      count: ${input.deployStorageServer == true ? 1 : 0}

This logic says that if the box is checked (boolean true) then the count is 1, otherwise it is 0. Here is a screenshot of the canvas after adding the machine:

I now have a toggle to deploy (or not) an external storage appliance when deploying nested ESXi hosts.
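Putting those pieces together, the relevant parts of my Assembler template look roughly like the sketch below. The resource name, image, and network reference are placeholders standing in for my lab values; only the boolean input and the count expression come directly from the steps above.

```yaml
inputs:
  deployStorageServer:
    type: boolean
    title: Deploy storage server?
    default: false
resources:
  Storage_Server:
    type: Cloud.vSphere.Machine
    properties:
      # a count of 0 skips this machine entirely when the box is unchecked
      count: ${input.deployStorageServer == true ? 1 : 0}
      image: storage-appliance
      networks:
        - network: ${resource.ESXi_Network.id}
```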

Duplicate Device ID issue

After creating a couple of test deployments and adding hosts to vCenter Server, I ran into an interesting issue. For deployment #1, I formatted LUN0 as VMFS. For deployment #2, I tried to format LUN0 but it didn’t appear in the GUI. Looking a bit more closely at the devices, I realized that the unique identifier (naa.* value) of each LUN was the same. This makes sense, as each appliance was cloned with the iSCSI devices already presented. Next, I unmapped and remapped the iSCSI device from the appliance, using the commands below.

sudo targetcli

# Remove the iscsi LUN mapping
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
delete 0

# Remove the disk mapping & recreate
cd /backstores/fileio/
delete disk0
create disk0 /data/iscsi/disk0.img sparse=true write_back=false

# Recreate the iscsi LUN mapping
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk0

After doing this and rescanning the host, I confirmed that the unique identifier (naa ID) changed and I was able to format LUN0 as VMFS.

Longer term, I decided that I should not have the iSCSI mappings pre-created in my storage appliance. I removed the configuration (clearconfig confirm=True) from the appliance, and instead placed the following script at /data/iscsi/setup.sh:

targetcli backstores/fileio create disk0 /data/iscsi/disk0.img sparse=true write_back=false
targetcli backstores/fileio/disk0 set attribute is_nonrot=1
targetcli backstores/fileio create disk1 /data/iscsi/disk1.img sparse=true write_back=false
targetcli backstores/fileio/disk1 set attribute is_nonrot=1
targetcli backstores/fileio create disk2 /data/iscsi/disk2.img sparse=true write_back=false
targetcli backstores/fileio/disk2 set attribute is_nonrot=1

targetcli /iscsi create iqn.2025-02.com.example.iscsi:target01 
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk0
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk1
targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns create /backstores/fileio/disk2

targetcli /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls create iqn.1998-01.com.vmware:h316-vesx-64.lab.enterpriseadmins.org:394284478:65

targetcli saveconfig

Now, after deploying the storage appliance, if I need an iSCSI target I can run sudo /data/iscsi/setup.sh and have new identifiers generated at runtime. This eliminates the identifier duplication and gets the automation ~80% of the way there.

With the steps outlined above, I was able to automate the deployment of storage servers and resolve the issue with duplicate LUN IDs. This process saves time and ensures each deployment is consistent. Going forward, I’ll continue automating aspects of my lab environment to increase efficiency.

Posted in Lab Infrastructure, Virtualization

How to Set Up a Minimal NFS and iSCSI Storage Solution Using Ubuntu 24.04

In my lab, I often need different types of storage to test various scenarios. For example, just last week someone asked about using New-Datastore with a specific version of VMFS and I needed to quickly perform a syntax check. I’ve found that having a nested storage appliance, like an Openfiler or FreeNAS, available is helpful. However, these appliances offer way more features than I need and typically have higher resource requirements. Setting up specific storage protocols like NFS or iSCSI is often crucial for testing and development, but existing solutions can be overly complex or resource-heavy for lab environments. In this post I’ll outline how I solved this problem with a few utilities added to an existing Ubuntu 24.04 template.

Storage Protocols

With this project I wanted a single storage target that could provide both NFS and iSCSI. In my lab, the ‘client’ system for nearly all storage testing will be ESXi, so this post provides output/examples in that context. ESXi supports block storage (such as iSCSI) and file storage, specifically NFS 3 and NFS 4.1. Ideally, I want to provide all three of these options with this single appliance, so we’ll show examples of using the appliance in all three of those ways.

Setting Up the Test Appliance

I deployed an Ubuntu 24.04 VM, using the image/customization spec described here: https://enterpriseadmins.org/blog/scripting/ubuntu-24-04-packer-and-vcenter-server-customization-specifications/

The template VM has a single 50GB disk, so I added an additional 15GB disk to use as the backing for the storage server. We’ll format this disk as btrfs and mount it at /data, as shown in the following code block:

sudo mkdir /data
sudo mkfs.btrfs /dev/sdb
echo "/dev/sdb /data btrfs defaults 0 0" | sudo tee -a /etc/fstab
sudo systemctl daemon-reload
sudo mount /data

The above code block creates a folder, formats the second disk in the system, adds an entry to the fstab file so the filesystem mounts when the system boots, and finally mounts the new disk. After the above is complete, running df -h /data should return the mounted disk and its size, confirming that everything worked successfully.

Configuring NFS on Ubuntu 24.04

I’ll start with NFS, as this is a problem I’ve previously solved using Photon OS (https://enterpriseadmins.org/blog/virtualization/vmware-workstation-lab-photon-os-container-host-and-nfs-server/). The only difference this time is that Ubuntu 24.04 has a slightly different package name for the NFS server components.

sudo apt install nfs-kernel-server -y
sudo mkdir /data/nfs
echo "/data/nfs *(rw,async,no_root_squash,insecure_locks,sec=sys,no_subtree_check)" | sudo tee -a /etc/exports
sudo systemctl daemon-reload
sudo systemctl reload nfs-server

The above code block installs the NFS server package, creates a subfolder to export over NFS, adds an entry to the NFS exports configuration file (the no_root_squash option matters here, since ESXi mounts NFS as root), then reloads the configuration so it takes effect. On our client system (ESXi), we can confirm our work was successful by creating a datastore. I’ll complete that task in PowerCLI below:

New-Datastore -VMHost h316-vesx-64* -Name 'nfs326' -Nfs -NfsHost 192.168.10.26 -Path /data/nfs -Confirm:$false

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
nfs326                                  14.994          15.000

As we can see, the test was a success and returned our mount point size in the command results. The above example resulted in an NFS 3 mount of the NFS folder. I created a subfolder (test41) and executed a similar test to confirm this could work for NFS 4.1 as well.

New-Datastore -VMHost h316-vesx-64* -Name 'nfs326-41' -Nfs -FileSystemVersion 4.1 -NfsHost 192.168.10.26 -Path /data/nfs/test41 -Confirm:$false

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
nfs326-41                               14.994          15.000

As we can see in the vCenter web interface, one of these datastores is NFS 3 and the other is NFS 4.1, both showing the same capacity and free space.

This confirms that we were able to successfully connect to our NFS Server service using ESXi with both NFS 3 and NFS 4.1 connections. Next, we’ll look at setting up iSCSI storage, which requires a slightly different approach.

iSCSI

The iSCSI testing was a bit more interesting. Looking around, I found a couple of ways to create an iSCSI target and ended up using targetcli. There are plenty of tutorials available for this, including a video: https://www.youtube.com/watch?v=OIpxwX6pTIU and an Ubuntu 24.04-specific article: https://www.server-world.info/en/note?os=Ubuntu_24.04&p=iscsi&f=1, both of which were very helpful. I’ll document the steps below for completeness.

In this first code block we’ll install the service and create the folder where we’ll store some disk images.

sudo apt install targetcli-fb -y
sudo mkdir /data/iscsi

The targetcli command can create image files to use as backing for our disks, but in my testing the sparse=true switch did not create sparse files, so we’ll do this in two steps. You’ll note that I’m specifying one image as having a 2T file size, but as you may have noticed in the NFS example, we only have 15GB of disk for our /data mount. This doesn’t result in some ‘magic beans’ sort of free storage: once we write 15GB of data to this disk we’ll be out of capacity and run into problems. It is only being done for illustration/simulation purposes, since sometimes you’ll want the UI to show ‘normal’ sizes that you’d see with actual datastores.

One reason the /data mount was configured with btrfs instead of something like ext4 is so we can support image files larger than 16T (the maximum file size on ext4). This btrfs filesystem will allow files over 62TB in size (62TB being the maximum supported virtual disk size for VMFS 6). In the code block output below, we’ll also use du to show that these disks are using 0 bytes on the filesystem but have larger apparent sizes.

sudo truncate -s 10G /data/iscsi/disk0.img
sudo truncate -s 10G /data/iscsi/disk1.img
sudo truncate -s 2T /data/iscsi/disk2.img

du -h /data/iscsi/*.img
0       /data/iscsi/disk0.img
0       /data/iscsi/disk1.img
0       /data/iscsi/disk2.img

du -h /data/iscsi/*.img --apparent-size
10G     /data/iscsi/disk0.img
10G     /data/iscsi/disk1.img
2.0T    /data/iscsi/disk2.img
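The same sparse behavior can be confirmed with stat, which reports both the apparent size and the number of blocks actually allocated. This is a throwaway illustration against a temp file, not part of the appliance build:

```shell
# Create a 1GiB sparse file, then compare apparent size vs. allocated blocks
truncate -s 1G /tmp/sparse-demo.img
stat -c 'apparent=%s bytes, allocated=%b blocks' /tmp/sparse-demo.img
```

A fully sparse file shows the full apparent size (1073741824 bytes) while allocating few, typically zero, blocks on disk.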

Once we have our files pre-staged, we can start working with targetcli.

sudo targetcli

# this should enter the targetcli shell

cd /backstores/fileio
create disk0 /data/iscsi/disk0.img sparse=true write_back=false
cd disk0
set attribute is_nonrot=1
cd ..
create disk1 /data/iscsi/disk1.img sparse=true write_back=false
cd disk1
set attribute is_nonrot=1
cd ..
create disk2 /data/iscsi/disk2.img sparse=true write_back=false
cd disk2
set attribute is_nonrot=1

The above code block creates the fileio references to each of our disks and sets the is_nonrot flag to tell the system these are non-rotational (i.e., flash) devices.

Still in the targetcli shell, we’ll start our iSCSI configuration.

cd /iscsi
create iqn.2025-02.com.example.iscsi:target01 
cd iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk0
create /backstores/fileio/disk1
create /backstores/fileio/disk2

This will create LUNs for each of our disks. Finally, still in the targetcli shell, we’ll create an ACL to allow a specific host to access the target, then delete it. This puts the correct syntax in our command history so we can refer back to it in the future (I plan to use this as a template for future tests).

cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls
create iqn.1998-01.com.vmware:host.lab.enterpriseadmins.org:3:65
delete iqn.1998-01.com.vmware:host.lab.enterpriseadmins.org:3:65
exit

The exit will cause targetcli to save our changes so they’ll persist across a reboot. For testing, we’ll go back into targetcli and add a specific entry to allow our test host to access the iSCSI target.

sudo targetcli
cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/acls
create iqn.1998-01.com.vmware:h316-vesx-64.lab.enterpriseadmins.org:394284478:65
exit

On our test client system, we can add a dynamic target for the IP/name & port 3260 of our storage appliance and then rescan for storage. We should see the three disks that we created, with the sizes specified.

As another confirmation, we may want to make each of these disks a VMFS volume. We can do that using syntax similar to the below code block:

Get-ScsiLun -VmHost h316* | ?{$_.Vendor -eq 'LIO-ORG'} | %{
  New-Datastore -VMHost h316* -Name "ds-vmfs-$($_.Model)" -Path $_.CanonicalName -Vmfs -FileSystemVersion 6
}

Name                               FreeSpaceGB      CapacityGB
----                               -----------      ----------
ds-vmfs-disk1                            8.345           9.750
ds-vmfs-disk2                        2,046.312       2,047.750
ds-vmfs-disk0                            8.345           9.750

Looking in the vCenter web interface, we can see that all our storage has been presented.

Once we’ve placed filesystems on these disks, we can go back to the shell and see how much space is being used on disk.

du -h /data/iscsi/*.img
29M     /data/iscsi/disk0.img
29M     /data/iscsi/disk1.img
62M     /data/iscsi/disk2.img

We can see that the creation of a filesystem on these disks does consume some of the blocks (we are using several MB of disk, instead of the previous 0 bytes).

Adding extra LUNs to the iSCSI target is a straightforward process, requiring just a handful of commands. An example can be found in the code block below:

sudo truncate -s 10T /data/iscsi/disk3.img
sudo targetcli
# this should enter the targetcli shell

cd /backstores/fileio
create disk3 /data/iscsi/disk3.img sparse=true write_back=false

cd /iscsi/iqn.2025-02.com.example.iscsi:target01/tpg1/luns
create /backstores/fileio/disk3

exit

The above code block shows the creation of a 10TB disk image, entering the targetcli shell, adding the newly created disk as a ‘fileio’ option, and mapping that disk to our iSCSI target. Finally we exit, which by default will save the configuration and make it persistent. Refreshing storage on ESXi should cause the new LUN to appear. Since we didn’t set the is_nonrot attribute, this device will appear as an HDD instead of a Flash device.

Growing the /data btrfs filesystem

Our filesystem is currently backed by a 15GB disk. We’ve allocated about 12TB of that, so it is grossly oversubscribed. For a production system this would be a terrible idea, but for our lab/illustration purposes it is probably fine. At some point we may need to extend this filesystem to accommodate growth. I’ve grown ext3 and ext4 filesystems before, but wanted to document how to do the extension for the btrfs filesystem used in this example. I chose btrfs because it supports larger files, allowing us to create images as large as ESXi supports (62TB). The following code block shows how to extend this filesystem in the guest OS. It assumes we have already increased the size of the disk in the vCenter web client; for illustration purposes, we’ve extended the disk from 15GB to 20GB.

df -h
# shows that the filesystem did not autogrow

echo 1 | sudo tee /sys/class/block/sdb/device/rescan
# rescans for disk changes

sudo lsblk  
# confirms disk is now seen as 20gb

sudo btrfs device usage /data
# shows that device size is 20gb

sudo btrfs filesystem resize max /data
# results in:
# Resize device id 1 (/dev/sdb) from 15.00GiB to max

df -h /data
# confirm filesystem is now 20gb

The above commands rescanned our disk to be aware of the new size, then resized the filesystem to the size we defined in the hypervisor (20GB).

To confirm this works as expected, we can refresh storage information for one of our NFS mounts. The capacity should increase from 15GB to 20GB, as seen in the following screenshot.

Conclusion

Creating this storage server to support NFS 3, NFS 4.1, and iSCSI targets is relatively straightforward. Having this pre-configured storage appliance can greatly streamline the process of testing various storage protocols, especially in virtual environments where quick deployment is key.

Posted in Lab Infrastructure, Virtualization

Extending Aria Automation with Custom Resources and Actions for IP Address Management

In my lab, I leverage Aria Automation to deploy Linux, Windows, and nested ESXi VMs. This is my primary interface for requesting new systems and covers most of the common resources I need for testing. However, I sometimes deploy one-off appliances and such, at a scale where automation hasn’t been built. These appliances typically require an IP address and DNS record. I had previously created a Jenkins job that accepted parameters, making these easy enough to create, but the cleanup is where I would fall down. I also wasn’t a huge fan of switching between the Aria Automation and Jenkins consoles to submit these requests.

My ideal solution to both of these problems was an Aria Automation request form that would create a deployment tracking these one-off IP requests. To not re-invent the wheel, this Aria Automation request could simply call Jenkins. When testing is complete, I’d have a deployment remaining in Aria Automation to serve as a reminder to properly clean up IPAM and DNS. This article will cover the process of creating this action, resource, and template to front end the Jenkins request with Aria Automation.

Custom Action – Create

In Aria Automation Assembler > Extensibility > Actions, we can create a new action. I named mine IPAM Next Address Create and selected only the project where my test deployments live.

For the action, I’m writing everything in PowerShell, since I already know that language and Aria Automation supports it. This code sample lacks robust error handling and could probably be cleaned up a fair amount, but it got the job done. In a production environment, adding some logic after each step to ensure the task completed would be prudent. In the event that the IPAM service is down or Jenkins isn’t responding, we’d want the request to behave in a predictable way.

The create section has more code, as it connects to phpIPAM to get the next address and then requests a DNS record be created by Jenkins. I obtain the IP address directly so that it can be returned as part of the deployment, where it is clearly visible.

function handler($context, $inputs) {
    $subnet = $inputs.subnet
    $hostname = $inputs.name
    
    write-host "We've received a $($inputs.'__metadata'.operation) request for subnet $subnet"
 
    $ipamServer = 'ipam.apps.example.com'
    $ipamUser   = 'svc-vra'
    $ipamPass   = 'VMware1!'
    $ipamBaseURL = 'https://'+$ipamServer+'/api/'+$ipamUser+'/'

    # Login to the API with username/password provided.  Create header to be used in next requests.
    write-host "IPAM Login"
    $ipamLogin = (Invoke-RestMethod -Uri "$($ipamBaseURL)user" -Method Post -SkipCertificateCheck -Headers @{'Authorization'='Basic '+[Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($ipamUser+':'+$ipamPass))}).data.token
    $nextHeader = @{'phpipam-token'=$ipamLogin}

    # Get the subnet ID of the specified CIDR
    write-host "IPAM Get Subnet ID"
    $subnetID = (Invoke-RestMethod -URI "$($ipamBaseURL)subnets/cidr/$subnet" -SkipCertificateCheck -Headers $nextHeader).data.id

    # Make a reservation and provide name/description
    write-host "IPAM Reserve Next"
    $postBody = @{hostname="$($hostname).lab.enterpriseadmins.org"; description='Requested via Automation Extensibility'}
    $myIPrequest = (Invoke-RestMethod -URI "$($ipamBaseURL)addresses/first_free/$subnetID" -SkipCertificateCheck -Method Post -Headers $nextHeader -Body $postBody).data
    
    # Send a DNS Request to Jenkins
    write-host "Jenkins DNS Request"
    $dnsBody = @{reqtype='add'; reqhostname=$hostname; reqipaddress = $myIPrequest; reqzonename='lab.enterpriseadmins.org'} | ConvertTo-Json
    Invoke-RestMethod -URI 'http://jenkins.example.com:8080/generic-webhook-trigger/invoke?token=VRA-dnsRecord' -Method Post -Body $dnsBody -ContentType 'application/json'

    # Return detail to vRA
    $outputs = @{
        address = $myIPrequest
        resourceName = $hostname
    }
    return $outputs
}

The IP address obtained from IPAM as well as the hostname are returned when this task completes.
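For reference, the DNS request body produced by the ConvertTo-Json call above has the following shape (the hostname and IP address shown here are made-up example values):

```json
{
  "reqtype": "add",
  "reqhostname": "h007-ip-01",
  "reqipaddress": "192.168.10.57",
  "reqzonename": "lab.enterpriseadmins.org"
}
```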

Custom Action – Read

For our custom resource, we will also need to specify an action to read / check status of our resource. For my purposes, I really don’t need anything specific to be checked, so I simply return all the input parameters. This is the default function / template loaded when creating the action.

function handler($context, $inputs) {
    return $inputs
}

Custom Action – Delete

When we are finished with our deployment and ready to delete, the custom resource needs a ‘delete’ action to call. Again this is written in PowerShell and calls Jenkins to request the actual delete. Jenkins will then connect to DNS and IPAM to process the cleanup.

function handler($context, $inputs) {
    $ipAddress = $inputs.address
    $hostname = $inputs.name
    
    write-host "We've received a $($inputs.'__metadata'.operation) request for IP address $ipAddress and hostname $hostname"
     
    $removeBody = @{reqzonename='lab.enterpriseadmins.org'; operationType='remove'; reqhostname=$hostname; subnetOrIp = $ipAddress} | ConvertTo-Json
    Invoke-RestMethod -URI 'http://jenkins.example.com:8080/generic-webhook-trigger/invoke?token=RequestIpAndDnsRecord' -Method Post -Body $removeBody -ContentType 'application/json'
}

This code could easily have contacted IPAM and DNS as separate requests, but since the Jenkins job already existed with webhook support, I chose to follow that path for simplicity.

Create Custom Resource

In Aria Automation Assembler > Design > Custom Resources we can create a new resource which will run our above actions. I named my resource IPAM Next Address, set the resource type to Custom.IPAM.Request, and based the resource on an ABX user-defined schema. For lifecycle actions, I selected the actions described above for the three required types: create, read, and destroy. For starters I set the scope to only be available for my test project, and finally toggled the ‘activate’ switch to make the resource available in blueprints.

Create Template

In Aria Automation Assembler > Design > Custom Template, the design for this request is super simple. There are three inputs: issueNumber, Name, and Subnet. The issue number is used for tracking and becomes part of the host name. The name is the unique part of the hostname, and the subnet is which network to use when finding the next address. My hostname ends up being h<issue-number-padded-3-digits>-<name-entered> (h is a prefix I use for test systems in my homelab). The subnet is a drop-down list with the networks I typically use for testing, defaulting to the selection I use most often.
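The format("%03d", …) expression pads the issue number with leading zeros, printf-style. The same behavior can be illustrated from any shell (this is just an analogy for the Aria expression, not something the template itself runs):

```shell
# Issue number 7 plus name 'ip-01' yields the padded hostname
printf 'h%03d-%s\n' 7 ip-01   # prints: h007-ip-01
```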

formatVersion: 1
inputs:
  issueNumber:
    type: integer
    title: Issue Number
  Name:
    type: string
    minLength: 1
    maxLength: 25
    default: ip-01
  Subnet:
    type: string
    title: Subnet
    default: 192.168.10.0/24
    enum:
      - 192.168.10.0/24
      - 192.168.40.0/24
resources:
  IPAddress:
    type: Custom.IPAM.Request
    properties:
      name: h${format("%03d",input.issueNumber)}-${input.Name}
      subnet: ${input.Subnet}
      address: ''
      git-issue-number: ${input.issueNumber}

Deploy

Once I published a version of this design, I can now make a request from the service broker catalog. My request form only has a few required fields:

I added some functionality into the ‘create’ action to post a comment to my issue tracker letting me know that a new resource has been created. It is created with a task list check box, so that I can see there is an open item to review with this issue, as well as a link to the deployment.

When I look at the deployment, I can see when it was created, if it expires, and can use the actions drop down to delete the deployment. This delete action calls the Jenkins job mentioned above to remove the DNS record and release the IP address from IPAM.

Conclusion

Aria Automation can provide an interface to leverage existing workflows. This example shows how to create a deployment to track the lifecycle of a created resource, while leveraging an existing system to handle the actual task. This solves my cleanup / tracking issue for one off IP requests as well as getting all the requests submitted from a single console. Hopefully you can use pieces of this workflow in your own environment.

Posted in Lab Infrastructure, Virtualization

Automating SSL Certificate Replacement with the Aria Suite Lifecycle API

Someone recently asked me if there was an API to replace the Aria Operations for Logs SSL certificate programmatically. In this case, Aria Suite Lifecycle was already deployed and used to manage multiple Aria Operations for Logs clusters, primarily used in regional data centers to forward events to a centralized instance. This meant that our ideal solution would leverage Aria Suite Lifecycle as well, adding the certificate to the locker prior to replacing the certificate in product. A colleague of mine recently published a blog post showing how to rotate Aria Suite Local Account Passwords using APIs and PowerShell: https://stephanmctighe.com/2024/12/20/rotating-aria-suite-local-account-passwords-using-apis-powershell/, so I used the style/splatting method he used for consistency in this post.

Due to the varied nature of requesting/approving certificates, I did not cover the process of creating a certificate signing request using APIs for this example. However, it is possible to do this via API as well. The ‘Create CSR and Key Using POST’ can be called with a POST operation to /lcm/locker/api/v2/certificates/csr as described here: https://developer.broadcom.com/xapis/vmware-aria-suite-lifecycle-rest-api/8.14//lcm-15-186.eng.vmware.com/lcm/locker/api/v2/certificates/csr/post/.

Workflow

I first worked through each of these steps by creating a new collection in Bruno and stepping through each API to understand the inputs/outputs and how everything worked together. Once complete, I looked through each of the requests from Bruno and converted them to a single PowerShell script, to have the end-to-end workflow in a single document for reference. In the sections below, I’ll step through each chunk of the script and add some additional context on why each section exists and what it does.

Setting up the script

For readability and usability, I decided to have a block of variables and paths at the very start of the script. In this section, you can see Aria Suite Lifecycle hostname/credentials, and basic auth string being defined. There are then a handful of filename/paths related to the certificate, root certificate, and key needed for the certificate I created from a Windows Certificate Services deployment. We then list the name of the Aria Suite Lifecycle environment containing the product we need to update. For demonstration purposes, I created an environment named h308-logs, which only contained a single product (Aria Operations for Logs).

# LCM connection detail
$lcmHost = 'cm-lifecycle-02.lab.enterpriseadmins.org'
$username = 'admin@local'
$password = 'VMware1!'
$authorization = "Basic $([System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes("$($username):$($password)")))"

# Certificate/environment detail
$newCertificateAlias = 'h308-logs-01.lab.enterpriseadmins.org_2025-01-14'
$newCertificateFolder = 'C:\Users\bwuchner\Downloads'
$newCertificateCSR  = 'CSR_h308-logs-01.lab.enterpriseadmins.org_Test.pem'
$newCertificateFile = 'CERT_h308-logs-01.cer'
$newCertificateRoot = 'CERT_rootca.cer'
$environmentName = 'h308-logs'
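As a quick sanity check, the Basic authorization value built by the script can be reproduced from any shell with base64, using the same lab credentials shown above:

```shell
# Same user:pass string the PowerShell code base64-encodes
printf '%s' 'admin@local:VMware1!' | base64   # prints: YWRtaW5AbG9jYWw6Vk13YXJlMSE=
```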

Reading the certificate files

Our certificate consists of multiple files:

  1. The private key, which is at the end of the certificate signing request (CSR) file that was generated by the Aria Suite Lifecycle GUI.
  2. The certificate file, which was obtained from our certificate authority and contains subject alternative names for our Aria Operations for Logs hostname and IP address.
  3. The root certificate from our certificate authority. In this lab, there are no intermediate certificates required; if there were, they could be added to the $cert variable below.

When using Get-Content, by default PowerShell will read one line of the file at a time. In the examples below, we join each new line with a new line character (`n) so that the API will understand our request. Failure to do so might result in an error like parsing issue: malformed PEM data encountered, LCM_CERTIFICATE_API_ERROR0000, or Unknown Certificate error.

# When we generated a CSR in the UI, before sending it to our CA, the private key is at the end of the
# CSR file.  We'll read that file, loop through and find the start/end of the private key, then format
# it to send in our JSON body
$key = Get-Content "$newCertificateFolder\$newCertificateCSR"
$keyCounter = 0
$key | %{if($_ -eq '-----BEGIN PRIVATE KEY-----'){$keyStartLine=$keyCounter};  if($_ -eq '-----END PRIVATE KEY-----'){$keyEndLine=$keyCounter}; $keyCounter++ }
$key = ($key[$keyStartLine..$keyEndLine] -join "`n") 

# We'll also read in our cert and concatenate each line with a new line character.
# If we have intermediate certs they can be joined in a similar way
$cert = ((Get-Content "$newCertificateFolder\$newCertificateFile") -join "`n") + "`n"
$cert += ((Get-Content "$newCertificateFolder\$newCertificateRoot") -join "`n") + "`n"
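The private key extraction above can also be done from a shell with sed, which prints everything between the BEGIN/END markers inclusive. The demo below runs against a stand-in file with dummy contents, since the real CSR isn’t shown here:

```shell
# Build a stand-in CSR file containing a dummy private key section
cat > /tmp/demo-csr.pem <<'EOF'
-----BEGIN CERTIFICATE REQUEST-----
MIICdummyCSRdata
-----END CERTIFICATE REQUEST-----
-----BEGIN PRIVATE KEY-----
MIIEdummyKEYdata
-----END PRIVATE KEY-----
EOF

# Print only the private key block, newlines intact
sed -n '/-----BEGIN PRIVATE KEY-----/,/-----END PRIVATE KEY-----/p' /tmp/demo-csr.pem
```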

Adding the certificate to the locker

We can POST our new certificate/key combo to the /lcm/locker/api/v2/certificates/import API. It will return details on the certificate, such as the alias provided, the validity, and sha256/sha1 hashes. It does not return the ID of the certificate in the locker, which we’ll need in a future step. Therefore, now seemed like a good time to get the certificate by filtering for the Alias name we used in our original request.

$Splat = @{
    "URI"     = "https://$lcmHost/lcm/locker/api/v2/certificates/import"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Body"    = @{
        'alias'         = $newCertificateAlias
        'certificateChain' = $cert
        'privateKey'    = $key
    } | ConvertTo-JSON
    "Method"  = "POST"
}
$NewCertPost = Invoke-RestMethod @Splat
# the newcertpost variable will have detail on our certificate, its validity, and san fields.
# we will need cert ID, so we'll make a query for it.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/locker/api/v2/certificates"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
$lockerCertId = ((Invoke-RestMethod @Splat).Certificates | ?{$_.alias -eq $newCertificateAlias}).vmid

Depending on what parts of this process we want to automate, it would also be possible to just get the ID of the certificate from the locker in the GUI. When we view the specific certificate, the ID is the GUID we see in the address bar, right after /lcm/locker/certificate:

Finding the environment ID

To replace the product certificate, we’ll need to know which environment ID needs to be updated. We can find this information from the API or the GUI. We’ll start by doing a GET operation for all environments, then filtering by the environment name variable declared at the beginning of the script.

# now that we have our new cert in the locker, we can apply it to the product
# Get Environment ID
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments?status=COMPLETED"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
$Environments = Invoke-RestMethod @Splat

# find our specific environment ID
$environmentId = ($Environments |?{$_.environmentName -eq $environmentName}).environmentId

When we are looking at our specific environment in the GUI, the ID can be found in the address bar right after /lcm/lcops/environments:

Finding the product ID

The product ID is also needed for the certificate replacement request. After running the code block above that creates the $Environments variable, we can list product IDs using the code below. It again filters the list and selects all applicable products in our specific environment:

# We also need to know the product ID. We can get a list of product IDs for the
# above environment using the example below. In this case we only have Ops for Logs, aka 'vrli'.
($Environments | ?{$_.environmentName -eq $environmentName}).products.id
# returns: vrli

I didn’t find a clear way to see this product ID in the GUI. However, if you are looking at a specific product and select … > Export Configuration > Simple, the resulting file name should contain the product ID (example: h308-logs-vrli.json).

To make this more like a multiple-choice question, the values that I currently have across all products in my lab are listed below:

  • vidm
  • vra
  • vrli
  • vrni
  • vrops
  • vssc

Validating the certificate

In the section below, we POST to the pre-validate API to make sure our certificate will work. This API only returns the request ID of the task that is created. We can view progress of the request in the GUI, using a URL like https://cm-lifecycle-02.lab.enterpriseadmins.org/lcm/lcops/requests/acd529f9-e8af-4c61-9d6d-14ee15730c9d, where the GUID at the end of the URL is the value of $prevalidateRequest. In the code block, however, we wait 30 seconds and then GET the status of our request from the API. This needs to return COMPLETED before we move on to the next step. This sample code block does not include error checking/handling, as it is primarily an example of calling the APIs.

# Now that we know all the relevant IDs, we can verify our new cert will work.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments/$environmentId/products/vrli/certificates/$lockerCertId/pre-validate"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "POST"
}
$prevalidateRequest = (Invoke-RestMethod @Splat).requestId

# Let's confirm that our validation completed.
# We may need to wait/recheck here.
Start-Sleep -Seconds 30

# Ask the requests API if our task is complete.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/request/api/v2/requests/$prevalidateRequest"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
(Invoke-RestMethod @Splat).state  # We want this to return 'COMPLETED'. If it didn't, we should recheck and not continue.
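Rather than a fixed 30-second sleep, the wait/recheck could be wrapped in a small polling loop. The helper below is only a sketch: the function name Wait-LcmRequest is my own invention (not part of any module), it reuses the $lcmHost and $authorization variables defined earlier in the script, and it assumes a FAILED state indicates a terminal error.

```powershell
# Hypothetical helper: poll the LCM requests API until the request finishes or we time out.
function Wait-LcmRequest {
    param(
        [string]$RequestId,
        [int]$TimeoutSeconds = 600,
        [int]$PollSeconds    = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        # Same requests endpoint used elsewhere in this post
        $state = (Invoke-RestMethod -Method GET `
            -Uri "https://$lcmHost/lcm/request/api/v2/requests/$RequestId" `
            -Headers @{ 'Accept' = '*/*'; 'Authorization' = $authorization }).state
        if ($state -eq 'COMPLETED') { return $true }
        if ($state -eq 'FAILED')    { throw "Request $RequestId failed" }
        Start-Sleep -Seconds $PollSeconds
    } while ((Get-Date) -lt $deadline)
    throw "Timed out waiting for request $RequestId (last state: $state)"
}

# Example usage:
# Wait-LcmRequest -RequestId $prevalidateRequest
```

The same helper works for any request ID returned by the lcops APIs, so it could also be used later for the replacement request.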

Replacing the certificate

Assuming our pre-validate request above completed, we can move on to the certificate replacement. We do that with a PUT to the product's certificates endpoint, providing the ID of the certificate in the locker. The PUT only returns the request ID of our task.

# Assuming the above completed, let's keep moving and actually replace the cert.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments/$environmentId/products/vrli/certificates/$lockerCertId"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "PUT"
}
$replacementRequest = (Invoke-RestMethod @Splat).requestId

Checking request status

As mentioned in the certificate validation section above, we can query the request status from the API as well. This is the same code block used in the earlier section, changing only the request ID variable at the end of the URI.

# Once we start the replacement we should wait a bit of time and then see if it is complete
Start-Sleep -Seconds 30
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/request/api/v2/requests/$replacementRequest"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
(Invoke-RestMethod @Splat).state  # We want this to return 'COMPLETED'. If it returns 'INPROGRESS', wait and recheck until it does.

As mentioned before, we can view the status of our request in the GUI as well. The URL would be https://cm-lifecycle-02.lab.enterpriseadmins.org/lcm/lcops/requests/acd529f9-e8af-4c61-9d6d-14ee15730c9d, where the GUID at the end of the URL is the value of $replacementRequest. Alternatively, we could look in the Requests tab for a request named VRLI in Environment h308-logs - Replace Certificate.

Follow up tasks

After replacing a certificate, it is always a good idea to verify that the new certificate is trusted by various other products. For example, if you are using CFAPI to forward logs to this Aria Operations for Logs instance, you should check the source systems to make sure they trust the new certificate. In addition, Aria Operations and Aria Operations for Logs can be integrated; from the Aria Operations integration, confirm that Aria Operations for Logs is still trusted after completing this change. This is not specific to the API, just a reminder to ensure new certificates are trusted, whether they are replaced in the GUI or via the API.

Conclusion

In this post, we’ve explored how to automate the replacement of an SSL certificate in Aria Operations for Logs using the Aria Suite Lifecycle API. By leveraging PowerShell and the API’s various endpoints, we can streamline the process of managing certificates across Aria Suite environments, ensuring better security and consistency.

Remember, while the steps outlined here focus on certificate replacement, this workflow can also be adapted for other automation tasks within Aria Suite Lifecycle. As with any automation effort, it’s important to test thoroughly in a controlled environment and validate that all systems are properly configured and trust the updated certificates.

Whether you’re managing a single Aria Operations for Logs instance or multiple clusters, automating tasks like certificate replacement can significantly reduce manual effort and minimize downtime. Please continue to explore further API capabilities to enhance your operational efficiency and security posture!

Posted in Lab Infrastructure, Scripting, Virtualization

Unlocking the Power of Metric-Based Search in Aria Operations

When managing a large, virtualized environment, finding objects in Aria Operations can be challenging, especially when you don’t know the object name. Metric-based search, a feature introduced in Aria Operations 8.12, allows you to search for objects based on their metrics or properties—empowering you to quickly identify issues, even without specific names.

I recently posted about replacing some CPUs in my primary homelab system (https://enterpriseadmins.org/blog/virtualization/how-i-doubled-my-homelab-cpu-capacity-for-200-xeon-gold-6230-upgrade/). Prior to making this change, I knew I had a couple of VMs with rather high CPU Ready values, and I expected CPU Ready to decrease given the additional cores. I had an idea of which VMs were likely affected, but wanted to leverage metric-based search to make sure I wasn’t missing any.

What Is Metric-Based Search?

Metric-based search was introduced in Aria Operations 8.12 almost two years ago (https://blogs.vmware.com/management/2023/04/metric-based-search.html). It allows us to use metrics and properties in our search queries. Instead of typing a VM name, we can type a query for all VMs with high CPU Ready or Usage, like this:

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20

We start by typing ‘Metric’, telling the search box we want to search using a metric; we then specify the object type of virtual machine, and finally use a where clause to provide the additional metrics we wish to look at. The search bar helps auto-complete the entries and shows a green check once the syntax is correct.

In this case the query only returns one VM… my Aria Automation VM, which currently has >20% CPU usage. I’m not able to use the ‘transformation’ selection because the environment has 225 VMs, which exceeds the maximum scope of 200 called out in the tooltip below:

Using the ‘ChildOf’ Clause to Narrow Down Results

To refine my search results, I use the ‘childOf’ clause, which allows me to narrow down the query to a specific ESXi host. This is especially useful when I know the VMs I’m looking for are on the same host but don’t know their names.

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20 childOf core-esxi-34.example.com 

This unlocked the ‘transformation’ filter drop-down list, and I can now look at maximum values instead of current values. I could have used a different object in my childOf query, like a vSphere folder, distributed port group, datacenter, or custom datacenter: really any object that is a parent of a virtual machine in the inventory hierarchy. We can see that more VMs now match our criteria. Each of these VMs had CPU Ready above 2% prior to installing the new CPUs; after the upgrade, the values are much lower.

Understanding the Impact of CPU Speed on Performance Metrics

Interestingly, in the above images we can see that while CPU Ready has decreased substantially, CPU Usage has actually increased. I believe this is due to the clock speed of the CPU cores: previously the cores ran at 3.8 GHz, but they now run at 2.1 GHz. To do the same amount of work, the slower-clocked CPUs must run at a higher utilization percentage.
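As a rough sanity check (my own back-of-the-envelope math, assuming CPU usage scales inversely with clock speed for the same workload):

```powershell
# Same work on slower cores should need roughly (old clock / new clock) times the CPU %.
$oldGhz = 3.8
$newGhz = 2.1
$scale  = $oldGhz / $newGhz   # roughly 1.8x
"A VM at 20% usage before would land near {0:N0}% now" -f (20 * $scale)
```

Real workloads won't track this ratio exactly (per-core IPC, turbo behavior, and scheduling all differ), but it explains the direction of the change.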

Other Use Cases for Metric-Based Search

The side-by-side comparison of metrics in metric-based search is really helpful. It included the CPU Ready and CPU Usage values because those were the first two metrics in my query. If I adjust my query to include three metrics, such as:

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20 or Memory|Usage % > 5 childOf core-esxi-34.example.com 

I can select which metric is displayed in the left or right column using the column selector in the bottom left of the screen:

In the above examples, we are looking specifically at metrics of VMs. However, we can query properties the same way, as well as different object types. Here are a few examples:

VMs that have more than 5 VMDKs (property): Metric: Virtual Machine where Configuration|Number of VMDKs > 5

ESXi hosts that have less than 16 CPU cores (metric): Metric: Host System where Hardware|CPU Information|Number of CPU Cores < 16

Datastores with reclaimable orphaned disks (metric) and type (property): Metric: Datastore where Reclaimable|Orphaned Disks|Disk Space GB > 1 and Summary|Type equals 'NFS'

Conclusion: The Power of Metric-Based Search in Aria Operations

Metric-based search in Aria Operations is a powerful tool that helps you find the right objects even when you don’t know their names. By leveraging metrics like CPU usage or memory usage, you can quickly identify performance bottlenecks and optimize your virtualized infrastructure.

Posted in Lab Infrastructure, Virtualization