Automating SSL Certificate Replacement with the Aria Suite Lifecycle API

Someone recently asked me if there was an API to replace the Aria Operations for Logs SSL certificate programmatically. In this case, Aria Suite Lifecycle was already deployed and used to manage multiple Aria Operations for Logs clusters, primarily in regional data centers forwarding events to a centralized instance. This meant our ideal solution would leverage Aria Suite Lifecycle as well, adding the certificate to the locker prior to replacing it on the product. A colleague of mine recently published a blog post showing how to rotate Aria Suite local account passwords using APIs and PowerShell (https://stephanmctighe.com/2024/12/20/rotating-aria-suite-local-account-passwords-using-apis-powershell/), so for consistency I used the same splatting style in this post.

Due to the varied nature of requesting/approving certificates, I did not cover creating a certificate signing request via API in this example. However, it is possible: the 'Create CSR and Key Using POST' operation can be called with a POST to /lcm/locker/api/v2/certificates/csr, as described here: https://developer.broadcom.com/xapis/vmware-aria-suite-lifecycle-rest-api/8.14//lcm-15-186.eng.vmware.com/lcm/locker/api/v2/certificates/csr/post/.
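While I won't walk through CSR creation in this post, a minimal skeleton of that call might look like the following. The endpoint and method come from the documentation linked above; the body is left as a placeholder since the subject fields should be taken from those docs, and the sketch reuses the $lcmHost and $authorization variables defined in the 'Setting up the script' section below:

$Splat = @{
    "URI"     = "https://$lcmHost/lcm/locker/api/v2/certificates/csr"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Body"    = @{
        # CSR subject fields (common name, organization, key size, etc.) go here;
        # see the linked API documentation for the exact property names.
    } | ConvertTo-JSON
    "Method"  = "POST"
}
$csrResponse = Invoke-RestMethod @Splat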

Workflow

I first worked through each of these steps by creating a new collection in Bruno and stepping through each API to understand the inputs/outputs and how everything worked together. Once complete, I reviewed each of the requests in Bruno and converted them to a single PowerShell script, so the end-to-end workflow exists in a single document for reference. In the sections below, I'll step through each chunk of the script and add some context on why each section exists and what it does.

Setting up the script

For readability and usability, I decided to have a block of variables and paths at the very start of the script. In this section, you can see the Aria Suite Lifecycle hostname, credentials, and basic auth string being defined. There are then a handful of file names/paths for the certificate, root certificate, and key, which I created from a Windows Certificate Services deployment. We then list the name of the Aria Suite Lifecycle environment containing the product we need to update. For demonstration purposes, I created an environment named h308-logs, which contains a single product (Aria Operations for Logs).

# LCM connection detail
$lcmHost = 'cm-lifecycle-02.lab.enterpriseadmins.org'
$username = 'admin@local'
$password = 'VMware1!'
$authorization = "Basic $([System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes("$($username):$($password)")))"

# Certificate/environment detail
$newCertificateAlias = 'h308-logs-01.lab.enterpriseadmins.org_2025-01-14'
$newCertificateFolder = 'C:\Users\bwuchner\Downloads'
$newCertificateCSR  = 'CSR_h308-logs-01.lab.enterpriseadmins.org_Test.pem'
$newCertificateFile = 'CERT_h308-logs-01.cer'
$newCertificateRoot = 'CERT_rootca.cer'
$environmentName = 'h308-logs'

Reading the certificate files

Our certificate consists of multiple files:

  1. The private key, which is at the end of the certificate signing request (CSR) file that was generated by the Aria Suite Lifecycle GUI.
  2. The certificate file, which was obtained from our certificate authority and contains subject alternative names for our Aria Operations for Logs hostname and IP address.
  3. The root certificate from our certificate authority. In this lab, no intermediate certificates are required; if there were any, they could be appended to the $cert variable below.

When using Get-Content, PowerShell reads the file one line at a time by default, returning an array of lines. In the examples below, we join the lines with a newline character (`n) so that the API will understand our request. Failure to do so might result in errors like "parsing issue: malformed PEM data encountered", "LCM_CERTIFICATE_API_ERROR0000", or "Unknown Certificate".

# When we generated a CSR in the UI, before sending it to our CA, the private key is at the end of the
# CSR file.  We'll read that file, loop through and find the start/end of the private key, then format
# it to send in our JSON body
$key = Get-Content "$newCertificateFolder\$newCertificateCSR"
$keyCounter = 0
$key | %{if($_ -eq '-----BEGIN PRIVATE KEY-----'){$keyStartLine=$keyCounter};  if($_ -eq '-----END PRIVATE KEY-----'){$keyEndLine=$keyCounter}; $keyCounter++ }
$key = ($key[$keyStartLine..$keyEndLine] -join "`n") 

# We'll also read in our cert and concatenate each line with a new line character.
# If we have intermediate certs they can be joined in a similar way
$cert = ((Get-Content "$newCertificateFolder\$newCertificateFile") -join "`n") + "`n"
$cert += ((Get-Content "$newCertificateFolder\$newCertificateRoot") -join "`n") + "`n"

Adding the certificate to the locker

We can POST our new certificate/key combo to the /lcm/locker/api/v2/certificates/import API. It will return details about the certificate, such as the alias provided, the validity dates, and the SHA256/SHA1 hashes. It does not return the ID of the certificate in the locker, which we'll need in a later step. Therefore, now seemed like a good time to query the certificate list and filter for the alias we used in our import request.

$Splat = @{
    "URI"     = "https://$lcmHost/lcm/locker/api/v2/certificates/import"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Body"    = @{
        'alias'         = $newCertificateAlias
        'certificateChain' = $cert
        'privateKey'    = $key
    } | ConvertTo-JSON
    "Method"  = "POST"
}
$NewCertPost = Invoke-RestMethod @Splat
# the newcertpost variable will have detail on our certificate, its validity, and san fields.
# we will need cert ID, so we'll make a query for it.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/locker/api/v2/certificates"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
$lockerCertId = ((Invoke-RestMethod @Splat).Certificates | ?{$_.alias -eq $newCertificateAlias}).vmid

Depending on what parts of this process we want to automate, it would also be possible to simply get the ID of the certificate from the locker in the GUI. When we view the specific certificate, the ID is the GUID in the address bar, right after /lcm/locker/certificate.

Finding the environment ID

To replace the product certificate, we’ll need to know which environment ID needs to be updated. We can find this information from the API or the GUI. We’ll start by doing a GET operation for all environments, then filtering by the environment name variable declared at the beginning of the script.

# now that we have our new cert in the locker, we can apply it to the product
# Get Environment ID
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments?status=COMPLETED"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
$Environments = Invoke-RestMethod @Splat

# find our specific environment ID
$environmentId = ($Environments |?{$_.environmentName -eq $environmentName}).environmentId

When we are looking at our specific environment in the GUI, the ID can be found in the address bar right after /lcm/lcops/environments.

Finding the product ID

The product ID is also needed for the certificate replacement request. After running the code block above that creates the $Environments variable, we can see a list of product IDs using the code below. It again filters the list and selects all applicable products in our specific environment:

# we also need to know the product ID.  We can get a list of product IDs for the above
# environment using the example below.  In this case we only have Ops for Logs, aka vrli
($Environments |?{$_.environmentName -eq $environmentName}).products.id
# returns: vrli

I didn't find a clear way to see this product ID in the GUI. However, if you are looking at a specific product and select … > Export Configuration > Simple, the resulting file name should contain the product ID (example: h308-logs-vrli.json).

To make this more like a multiple-choice question, the values that I currently have across all products in my lab are listed below:

  • vidm
  • vra
  • vrli
  • vrni
  • vrops
  • vssc
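Rather than maintaining that list by hand, the same values can be pulled from the API. As a small sketch reusing the $Environments variable we retrieved earlier, this one-liner enumerates every product ID across all environments:

# list unique product IDs across every environment in Aria Suite Lifecycle
$Environments.products.id | Sort-Object -Unique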

Validating the certificate

In the section below, we POST to the pre-validate API to make sure our certificate will work. This API only returns the request ID of the task it creates. We can view the progress of the request in the GUI using a URL like https://cm-lifecycle-02.lab.enterpriseadmins.org/lcm/lcops/requests/acd529f9-e8af-4c61-9d6d-14ee15730c9d, where the GUID at the end is the value of $prevalidateRequest. In the code block below, however, we wait 30 seconds and then GET the status of our request from the API. We need this to return COMPLETED prior to moving on to the next step. This sample code block does not include error checking/handling, as it is primarily an example of calling the APIs.

# Now that we know all the relevant IDs, we can verify our new cert will work.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments/$environmentId/products/vrli/certificates/$lockerCertId/pre-validate"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "POST"
}
$prevalidateRequest = (Invoke-RestMethod @Splat).requestId

# let's confirm that our validation completed.
# we may need to wait/recheck here
Start-Sleep -Seconds 30

# Let's ask the requests API if our task is complete.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/request/api/v2/requests/$prevalidateRequest"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
(Invoke-RestMethod @Splat).state  # we want this to return 'COMPLETED'.  If it didn't we should recheck/fail/not continue.
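Since the fixed 30-second sleep is only for demonstration, a small polling helper can make this more robust. Below is a minimal sketch; Wait-LcmRequest is a name I made up for this example, and it reuses the $lcmHost and $authorization variables from the start of the script:

function Wait-LcmRequest {
    param([string]$RequestId, [int]$TimeoutSeconds = 600)
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        Start-Sleep -Seconds 15
        # poll the requests API for the current state of the task
        $Splat = @{
            "URI"     = "https://$lcmHost/lcm/request/api/v2/requests/$RequestId"
            "Headers" = @{
                'Accept'        = "*/*"
                "Authorization" = $authorization
            }
            "Method"  = "GET"
        }
        $state = (Invoke-RestMethod @Splat).state
    } while ($state -ne 'COMPLETED' -and $state -ne 'FAILED' -and (Get-Date) -lt $deadline)
    if ($state -ne 'COMPLETED') { throw "Request $RequestId did not complete (last state: $state)" }
    $state
}

# example: stop the script here if pre-validation did not succeed
Wait-LcmRequest -RequestId $prevalidateRequest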

Replacing the certificate

Assuming our pre-validate request above completed, we can move on to the certificate replacement. We do that with a PUT to the product certificates endpoint, providing the ID of the locker certificate at the end of the URI. The PUT only returns the request ID of our task.

# Assuming the above completed, lets keep moving and actually replace the cert.
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/lcops/api/v2/environments/$environmentId/products/vrli/certificates/$lockerCertId"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "PUT"
}
$replacementRequest = (Invoke-RestMethod @Splat).requestId

Checking request status

As mentioned in the certificate validation section above, we can query the request status from the API as well. This is the same code block used in that section; only the variable at the end of the request URI changes.

# Once we start the replacement we should wait a bit of time and then see if it is complete
Start-Sleep -Seconds 30
$Splat = @{
    "URI"     = "https://$lcmHost/lcm/request/api/v2/requests/$replacementRequest"
    "Headers" = @{
        'Accept'        = "*/*"
        'Content-Type'  = "application/json"
        "Authorization" = $authorization
    }
    "Method"  = "GET"
}
(Invoke-RestMethod @Splat).state  # we want this to return 'COMPLETED'.  If it returns 'INPROGRESS' we may want to wait/recheck until 'COMPLETED'.
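If you adopted the Wait-LcmRequest sketch from the pre-validation step, the same helper works here as well:

Wait-LcmRequest -RequestId $replacementRequest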

As mentioned before, we can view the status of our request in the GUI as well. The URL would be https://cm-lifecycle-02.lab.enterpriseadmins.org/lcm/lcops/requests/acd529f9-e8af-4c61-9d6d-14ee15730c9d, where the GUID at the end is the value of $replacementRequest. Alternatively, we could look in the Requests tab for the request named VRLI in Environment h308-logs - Replace Certificate.

Follow up tasks

After replacing a certificate, it is always a good idea to verify that the new certificate is trusted by other products. For example, if you are using CFAPI to forward logs to this Aria Operations for Logs instance, you should check the source systems to make sure they trust the new certificate. In addition, Aria Operations and Aria Operations for Logs can be integrated; from the Aria Operations integration, confirm that Aria Operations for Logs is still trusted after completing this change. This is not specific to the API, just a reminder to ensure that new certificates are trusted, whether they are replaced in the GUI or using the API.

Conclusion

In this post, we’ve explored how to automate the replacement of an SSL certificate in Aria Operations for Logs using the Aria Suite Lifecycle API. By leveraging PowerShell and the API’s various endpoints, we can streamline the process of managing certificates across Aria Suite environments, ensuring better security and consistency.

Remember, while the steps outlined here focus on certificate replacement, this workflow can also be adapted for other automation tasks within Aria Suite Lifecycle. As with any automation effort, it’s important to test thoroughly in a controlled environment and validate that all systems are properly configured and trust the updated certificates.

Whether you’re managing a single Aria Operations for Logs instance or multiple clusters, automating tasks like certificate replacement can significantly reduce manual effort and minimize downtime. Please continue to explore further API capabilities to enhance your operational efficiency and security posture!


Unlocking the Power of Metric-Based Search in Aria Operations

When managing a large, virtualized environment, finding objects in Aria Operations can be challenging, especially when you don’t know the object name. Metric-based search, a feature introduced in Aria Operations 8.12, allows you to search for objects based on their metrics or properties—empowering you to quickly identify issues, even without specific names.

I recently posted about replacing some CPUs in my primary homelab system (https://enterpriseadmins.org/blog/virtualization/how-i-doubled-my-homelab-cpu-capacity-for-200-xeon-gold-6230-upgrade/). Prior to making this change, I knew I had a couple of VMs with rather high CPU Ready values, and I suspected that CPU Ready would decrease given the additional cores. I had an idea of which VMs were likely affected but wanted to leverage metric-based search to make sure I wasn't missing any.

What Is Metric-Based Search?

Metric-Based search was introduced in Aria Operations 8.12 almost two years ago (https://blogs.vmware.com/management/2023/04/metric-based-search.html). It allows us to use metrics and properties in our search queries. Instead of typing a VM name, we can type a query for all VMs with high CPU Ready or Usage, like this:

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20

We start out by typing 'Metric', telling the search box we want to search using a metric. We then specify the object type of Virtual Machine and finally use a where clause to provide the additional metrics we want to evaluate. The search bar helps auto-complete the entries and shows a green check once the syntax is correct.

In this case the query only returns one VM… my Aria Automation VM, which currently has >20% CPU usage. I'm not able to use the 'transformation' selection because the environment has 225 VMs, which is larger than the maximum scope of 200 called out in the tooltip.

Using the ‘ChildOf’ Clause to Narrow Down Results

To refine my search results, I use the ‘childOf’ clause, which allows me to narrow down the query to a specific ESXi host. This is especially useful when I know the VMs I’m looking for are on the same host but don’t know their names.

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20 childOf core-esxi-34.example.com 

This unlocked the filter 'transformation' drop-down list, and I can now look at maximum values instead of current values. I could have used a different object in my childOf query, like a vSphere folder, distributed port group, datacenter, or custom datacenter: really any object that is a parent of virtual machines in the inventory hierarchy. We can see that more VMs now match our criteria. Each of these VMs had CPU Ready above 2% prior to installing the new CPUs; after installing the new CPUs, the values are much lower.

Understanding the Impact of CPU Speed on Performance Metrics

Interestingly, in the above images we can see that while CPU Ready has decreased substantially, CPU Usage has actually increased. I believe this is due to the clock speed of the CPU cores. Previously the cores ran at 3.8 GHz, but they now run at 2.1 GHz. To do the same amount of work, the slower cores must run at a higher percentage: a workload demanding 1 GHz consumes roughly 26% of a 3.8 GHz core but about 48% of a 2.1 GHz core.

Other Use Cases for Metric-Based Search

The side-by-side comparison of metrics in the metric-based search is really helpful. It included the CPU Ready and CPU Usage values because those were the first two metrics in my query. If I adjust my query to include three metrics, such as:

Metric: Virtual Machine where CPU|Ready % > 2 or CPU|Usage % > 20 or Memory|Usage % > 5 childOf core-esxi-34.example.com 

I can select which metric is displayed in the left or right column using the column selector in the bottom left of the screen.

In the above examples, we are looking specifically at metrics of VMs. However, we can query properties the same way as well, and also query for different object types. Here are a few examples:

VMs that have more than 5 VMDKs (property): Metric: Virtual Machine where Configuration|Number of VMDKs > 5

ESXi hosts that have fewer than 16 CPU cores (metric): Metric: Host System where Hardware|CPU Information|Number of CPU Cores < 16

Datastores with reclaimable orphaned disks (metric) and type (property): Metric: Datastore where Reclaimable|Orphaned Disks|Disk Space GB > 1 and Summary|Type equals 'NFS'

Conclusion: The Power of Metric-Based Search in Aria Operations

Metric-based search in Aria Operations is a powerful tool that helps you find the right objects even when you don’t know their names. By leveraging metrics like CPU usage or memory usage, you can quickly identify performance bottlenecks and optimize your virtualized infrastructure.


How I Doubled My Homelab CPU Capacity for $200: Xeon Gold 6230 Upgrade

In this post, I’ll walk you through how I solved a growing CPU bottleneck issue in my homelab by upgrading the CPUs in my Dell Precision 7920. I’ll share the process, challenges, and cost-effective solution that allowed me to double my system’s CPU capacity.

The primary system in my homelab is a Dell Precision 7920 tower. I purchased it on eBay about two years ago with 2x Xeon Gold 5222 CPUs and 512GB of RAM, replacing a pair of older HP DL360 Gen8 rack mount systems. The older HP systems each had a pair of E5-2450L CPUs, 8 cores each at 1.8 GHz, for a total of 28.8 GHz per system… but those systems were primarily constrained by RAM, not CPU. Based on some rough math, I made the decision to go from a total of 32 cores at 1.8 GHz to just 8 cores at 3.8 GHz.

In the first ~6 months, everything was great: neither CPU nor RAM was a bottleneck, and everything was running well. However, as I added more and more nested environments (including nested VCF), I started running into CPU contention. By early 2024, I knew that this cluster's CPU usage was high; I could see from Aria Operations that CPU demand was well above the usable capacity most of the time.

CPU Demand of 30-Greenfield cluster, taken in early 2024

Around that time I looked into replacement CPUs for this system. I attempted to drop in some Xeon Gold 6138 CPUs (1st Gen Scalable), as they were very inexpensive (around $50/pair). Unfortunately, these CPUs were not compatible with the RAM in this system. The memory configuration is 8x 64GB 2933 MHz DIMMs, which limited my CPU choices to those that support 2933 MHz memory (based on the table on page 91 of the owner's manual). 2nd Gen Scalable CPUs were preferred, as they are not expected to be deprecated in the next major vSphere release (per https://knowledge.broadcom.com/external/article/318697/cpu-support-deprecation-and-discontinuat.html). I decided the best two options were the Xeon Gold 6238 (22 cores/socket at 2.1 GHz) or 6230 (20 cores/socket at 2.1 GHz). At the time, these CPUs were running about $500/ea (6238) or $350/ea (6230) from various eBay sellers. I decided to hold off on the replacement and instead turn certain environments off and on as needed rather than running them all the time.

A few weeks ago, when running most of my nested environments concurrently again, I was seeing high CPU use. I did a bit more research and confirmed that the 6238 and 6230 CPUs were still solid options for what I needed, but the price had now fallen to about $350/ea (6238) or $95/ea (6230). The 6238 CPUs would provide a total of 92 GHz of capacity, while the 6230s would deliver 84 GHz. Given that the demand for the cluster is only around 45 GHz, the lower-cost 6230s, at roughly 2x the capacity I needed, looked like a solid option. I decided to pick up a pair of the 6230s and get them switched out. In the chart below, you can see that a few days prior to the "Now" line, the usable capacity of this cluster more than doubled. Aria Operations now shows >1 year of time remaining until CPU capacity runs out.

CPU Demand of 30-Greenfield cluster, taken in Jan 2025 after replacing CPUs

Conclusion

I knew that CPU usage was high and that the most obvious solution was to add capacity. Even after memory constraints narrowed the options down to just two, having specific capacity history helped me make the most cost-effective decision. Instead of spending $700 on a pair of 6238 CPUs, I was able to solve the issue with just $200 for a pair of 6230s. After making the change, reviewing the same chart confirmed that the issue is in fact solved.


Getting Started with Bruno: A Beginner’s Guide to Simplifying API Interactions

I've recently been working with APIs more than ever. A colleague introduced me to Bruno, an offline, open-source API client. For the most part, I had been interacting with APIs using in-product Swagger UIs or PowerShell's Invoke-RestMethod. This can be challenging: remembering complex URLs, managing headers, and handling authentication. Bruno provides a standalone GUI to help streamline these tasks. As I was getting up to speed with the interface, I revisited several prior posts to connect to APIs I was familiar with. The following notes are what I learned while getting started with Bruno.

Example 1: APC BackUPS

In a previous article we explored creating our own custom 'API' using a PowerShell HTTP listener. It was a very basic example, requiring no authentication or special headers. Since this API is so simple, it's an easy first example for Bruno.

  • Create new collection
  • Create new request
    • Name the request (getStats)
    • Specify the URL (http://servername.example.com:6545/getStats)
    • Run (the right arrow at the end of the URL).

The response should show the JSON body that we crafted in our script. Now if I need to run this again, I don't have to remember the hostname/port number for my service; I can just hit go and get the current response.

Example 1b: Using a Bruno ‘environment’

In the top right of a Bruno collection, there is a drop-down list that says 'No Environment'. If we select that drop-down and choose 'Configure', we can create a new environment. An environment is a place to store variables, like server names, ports, and credentials. For my example, I'm going to create an environment named 'server room'. In the 'server room' environment, I'll define a variable named apcBackupsHost with the value servername.example.com. With this environment variable defined, I can edit my URL to use the variable name, enclosed in a pair of curly braces as shown below:
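The screenshot isn't reproduced here, but with the variable substituted, the URL from Example 1 would look like this:

http://{{apcBackupsHost}}:6545/getStats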

If I had multiple hosts running this API, I could create a different environment for each. That way I can toggle between them using the environment drop-down list without needing to update any of my API calls. This environment functionality can save time when working with different environments (e.g., production vs. staging) and can help prevent errors when managing credentials or server names.

Example 2: VI/JSON

The next example comes from a prior post as well: Using VI/JSON with PowerShell. VI/JSON was introduced in vSphere 8.0U1 as a way of accessing the vSphere Web Services SDK via a REST interface. To get started with this in Bruno, we'll make a new collection with a POST request to login. We'll also make an environment for this collection that has four variables:

  • VC = vCenter Server name or IP
  • vcVersion = the version used in our request (8.0.3.0 in my case)
  • username = the username used to connect to vCenter Server
  • password = the password used to connect to vCenter Server.

I've named my request 'Login' and set a few properties. First, the URL is https://{{VC}}/sdk/vim25/{{vcVersion}}/SessionManager/SessionManager/Login, which contains two of the variables from my environment. The body of the login contains the other two variables, as pictured below:
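Since the screenshot isn't included here, this is a minimal sketch of that login body, assuming the standard vim25 Login parameter names (userName and password):

{
  "userName": "{{username}}",
  "password": "{{password}}"
}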

In addition to the Body, I've made two other tweaks to this request. You can see where tweaks have been made in the screenshot: any tab with a change has an indicator count. I've outlined the specific changes below:

  • Headers:
    • Name: Content-Type
    • Value: application/json
  • Vars > Post Response:
    • Name: vmware-api-session-id
    • Expr: res.headers['vmware-api-session-id']

The post response variable says to take the vmware-api-session-id response header value and save it in a variable for future use, like in our next request.

My second request, named 'Get VM', is a GET of https://{{VC}}/sdk/vim25/{{vcVersion}}/VirtualMachine/vm-31/config, where vm-31 is the managed object reference ID of a specific VM. For this request, I've set two headers: Content-Type=application/json and vmware-api-session-id={{vmware-api-session-id}}, the latter using the variable we retrieved from the login request.

With these two headers defined, we can send our request, and it will retrieve the configuration details of our specific VM.

If there is another request we need to make in this same collection, we can right-click the name of our request (Get VM in this case) and clone it. This makes a new request with the same customized values already populated, allowing us to simply change the URL and submit a different request. For example, if I want details about all my license keys, I can change the URL to https://{{VC}}/sdk/vim25/{{vcVersion}}/LicenseManager/LicenseManager/licenses. The headers are already populated, so I can send the request (CTRL + Enter is the default key binding for this task) and get back a JSON body showing all of our license keys.

Example 3: Aria Ops Casa API

Finally, in another previous post we looked at logging into the Aria Operations Casa API using an LDAP account. This is a bit more involved, as we need to base64-encode a username:password string and pass it in a header for authentication. Let's see if we can do the same in Bruno.

  • Create new collection Aria Ops Casa
  • Create new request Casa Login
  • Create new environment lab with three variables: vropsServer, vropsCasaLdapUser, vropsCasaLdapPass and enter appropriate values. For the password I checked the ‘secret’ checkbox.
  • For the request type we’ll select POST and for our URL we will enter https://{{vropsServer}}/casa/authorize
  • On the script tab, we’ll build a ‘pre request’ to do some of the heavy lifting for authentication. Specifically, we’ll use a built-in function to do base64 encoding of our username/password string and then set our request Authorization header using that string. Sample code below:
const btoa = require("btoa");
var b64login = "vrops-ldap " + btoa(bru.getEnvVar("vropsCasaLdapUser")+":"+bru.getEnvVar("vropsCasaLdapPass"));
req.setHeader("Authorization", b64login );
  • On the Vars tab we’ll update the post response section to create a new variable named accessToken and use the expression res.body.accessToken to get the accessToken property from the body of the response.

Running the above request should get our server name, username, and password variables and use them to connect to the API. We’ll then create a new variable with the token we need for future requests.

To check the Aria Operations cluster status, we'll create a second API request. This request must run after the one above, which populates the accessToken variable.
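As a rough sketch of that second request; the exact CASA status path and token header scheme are assumptions on my part, so verify them against the earlier post or the CASA API docs for your version:

GET https://{{vropsServer}}/casa/sysadmin/cluster/online_state
Header: Authorization = Bearer {{accessToken}}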

We now have this collection saved so we can easily access it in the future. If we have additional Aria Operations instances, we can copy the environments (so that all the variable names come over) and then update the variable values accordingly. This gives us a quick drop down to select which Aria Operations environment to query so we don’t need to re-enter username & passwords every time.

Conclusion

Bruno makes quick work of firing off a simple API call. The collections and environments are useful, especially when we have many endpoints we may want to query. I can see including this application as part of my API toolkit and you should consider it too. More information about Bruno can be found in the official docs at https://docs.usebruno.com/introduction/what-is-bruno.


Are my ESXi hosts sending syslog to Aria Operations for Logs?

I was recently working on an issue where a query in Aria Operations for Logs was not returning an event that I fully expected to be present. After a bit of troubleshooting, I found that the ESXi host was sending the logs to syslog, but a firewall was preventing the logs from being received. Reflecting on this, I realized that there were many possible failure scenarios where a host could be properly configured but something in the path could be causing problems. You can see some of the possible failure points in the image below; anywhere the log message has to traverse a firewall or forwarder is a suspect.

As we can see above, some syslog topologies can be complex, and that introduces the possibility of failure. ESXi host firewalls, physical firewalls, and any log forwarding device can be a place where events are lost. I wanted to create a script to help identify some of these gaps which we’ll outline below.

Part 1 – Sending a Test Message

For this test, I wanted to use the esxcli system syslog mark command to send a message. To make this message easy to find in Aria Operations for Logs, I generated a GUID to include in the message so I can search for it later. Any unique string would work, but a GUID is easy to generate for each test. Also, in larger environments where good configuration management is happening, I may not need to test every host, so I added a bit of logic to the script to only test a percentage of available hosts.

$newGuid = [guid]::NewGuid().guid
$message = @{'message'="$newGUID - Test Message"}

$percent = Read-Host -Prompt "What percentage of Hosts should we review? " 

# For each random host, send a syslog message with esxcli
$sendResults = @()
$hosts = get-vmhost -State:Connected
$hostCount = [math]::Ceiling(( $hosts | Measure-Object).Count * ($percent / 100))
$hosts | Get-Random -Count $hostCount | Sort-Object Name | %{
  $esxcli2 = $_ | Get-EsxCli -V2
  
  $sendResults += $_ | Select-Object Name, @{N='SyslogServer';E={($_ | Get-AdvancedSetting -Name Syslog.global.logHost).Value}},
           @{N='SyslogMarkSent';E={$esxcli2.system.syslog.mark.Invoke($message)}}
}

The above code builds an array, $sendResults, containing an entry for each host where a test syslog message was sent. In the next section we'll see which of those events made it to our Aria Operations for Logs instance.

Part 2 – Query the Aria Operations for Logs events API

To make sure our syslog ‘mark’ messages made it from ESXi to our centralized Aria Operations for Logs instance, we’ll use the API to query for logs containing the $newGuid value we sent from part 1.

The first couple of lines of this script take care of logging into the API. We then send an event query and build a hashtable of hostname and timestamp strings, which allows us to index into our results to see when Aria Operations for Logs received each event. Finally, we loop through all the hosts we sent a test message to in part 1 and look up the event timestamp in our hashtable.

$loginBody = @{username='admin'; password='VMware1!'; provider='Local'} | Convertto-Json
$loginToken = (Invoke-RestMethod -uri 'https://syslog.example.com:9543/api/v2/sessions' -Method 'POST' -body $loginBody).sessionId
$myEvents = Invoke-RestMethod -uri "https://syslog.example.com:9543/api/v2/events/text/CONTAINS%20$($newGuid)?limit=1000&timeout=30000&view=SIMPLE&order-by-direction=DESC" -Headers @{Authorization="Bearer $loginToken"} 
$queryHt = $myEvents.results | Select-Object hostname, timestampString | Group-Object -Property hostname -AsHashTable

$finalResults = @()
foreach ($check in $sendResults) {
  $finalResults += $check | Select *, @{N='FoundInLogs';E={ $queryHt[$_.name].timestampString }}
}
$finalresults

If all goes as expected, we should see text in every column for each of our test hosts, with the 'FoundInLogs' column showing a fairly current timestamp. Instead, we found this in our lab:

Name                        SyslogServer                 SyslogMarkSent FoundInLogs
----                        ------------                 -------------- -----------
h259-vesx-43.example.com    udp://192.168.45.73:514      true           2024-11-17 20
h259-vesx-44.example.com    udp://192.168.45.73:514      true
h259-vsanwit-01.example.com                              true
test-vesx-71.example.com    udp://syslog.example.com:514 true           2024-11-17 20

Above we observe two hosts without a value in 'FoundInLogs', one of which doesn't even have a syslog destination configured. The first host does have syslog configured, but our test message was never received. Investigating this host specifically, we find that the host firewall rule allowing outbound syslog was not enabled, as seen in the screenshot below (where we'd expect the check box to be selected).

This was caused by me unchecking that box so the test would fail, just to validate the script logic. The other host (a vSAN witness) does not have a syslog destination defined at all. This happened to be a gap in how configurations were applied in this environment: the host exists outside of a cluster, and we are managing this setting at the cluster level. It's an oversight that is easily corrected, but without testing we may not have uncovered these issues.

Conclusion

Automation can help ensure not only that settings are consistently configured across an environment, but also that the end-to-end flow is actually working. Hopefully this can help identify logging problems before those logs are needed.
