Are my ESXi hosts sending syslog to Aria Operations for Logs?

I was recently working on an issue where a query in Aria Operations for Logs was not returning an event that I really expected to be present. After a bit of troubleshooting, I found that the ESXi host was sending the logs to syslog, but a firewall was preventing the logs from being received. Reflecting on this, I realized that there were many possible failure scenarios where a host could be properly configured, but something in the path could be causing problems. You can see some of the possible failure points in the image below; anywhere the log message has to traverse a firewall or forwarder is a potential point of failure.

As we can see above, some syslog topologies can be complex, and that complexity introduces the possibility of failure. ESXi host firewalls, physical firewalls, and any log forwarding device can be a place where events are lost. I wanted to create a script to help identify some of these gaps, which we'll outline below.

Part 1 – Sending a Test Message

For this test, I wanted to use the esxcli system syslog mark command to send a message. To make this message easy to find in Aria Operations for Logs, I generated a GUID to send in the message and will be able to look for it later. Any unique string will work, but this is something easy enough to generate with each test. Also, in larger environments where good configuration management is happening, I may not need to test every host. I decided to add a bit of logic in the script to only test a percentage of available hosts.

# Generate a unique string to include in the test message
$newGuid = [guid]::NewGuid().Guid
$message = @{'message'="$newGuid - Test Message"}

$percent = Read-Host -Prompt "What percentage of Hosts should we review? "

# For each randomly selected host, send a syslog message with esxcli
$sendResults = @()
$hosts = Get-VMHost -State:Connected
$hostCount = [math]::Ceiling(($hosts | Measure-Object).Count * ($percent / 100))
$hosts | Get-Random -Count $hostCount | Sort-Object Name | ForEach-Object {
  $esxcli2 = $_ | Get-EsxCli -V2

  $sendResults += $_ | Select-Object Name, @{N='SyslogServer';E={($_ | Get-AdvancedSetting -Name Syslog.global.logHost).Value}},
           @{N='SyslogMarkSent';E={$esxcli2.system.syslog.mark.Invoke($message)}}
}

The above code builds $sendResults, an array of custom objects containing every host where the test syslog message was sent. In the next section we'll see which of those events made it to our Aria Operations for Logs instance.

Part 2 – Query the Aria Operations for Logs events API

To make sure our syslog ‘mark’ messages made it from ESXi to our centralized Aria Operations for Logs instance, we’ll use the API to query for logs containing the $newGuid value we sent from part 1.

The first couple of lines of this script take care of logging into the API. We then query the events endpoint for our unique string and build a hashtable keyed on hostname, with the timestamp string as the value. This lets us index into the results to see when Aria Operations for Logs received each event. Finally, we loop through all the hosts that were sent a test message in part 1 and look up the event timestamp in our hashtable.

# Log in to the API and capture the session token
$loginBody = @{username='admin'; password='VMware1!'; provider='Local'} | ConvertTo-Json
$loginToken = (Invoke-RestMethod -Uri 'https://syslog.example.com:9543/api/v2/sessions' -Method 'POST' -Body $loginBody).sessionId

# Query for events containing our unique GUID, then build a hostname -> timestamp hashtable
$myEvents = Invoke-RestMethod -Uri "https://syslog.example.com:9543/api/v2/events/text/CONTAINS%20$($newGuid)?limit=1000&timeout=30000&view=SIMPLE&order-by-direction=DESC" -Headers @{Authorization="Bearer $loginToken"}
$queryHt = $myEvents.results | Select-Object hostname, timestampString | Group-Object -Property hostname -AsHashTable

$finalResults = @()
foreach ($check in $sendResults) {
  $finalResults += $check | Select-Object *, @{N='FoundInLogs';E={ $queryHt[$_.Name].timestampString }}
}
$finalResults
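
One note on the login: the admin password is hardcoded above for brevity. Assuming you'd rather be prompted at runtime, a small sketch like this should build the same login body:

# Prompt for credentials instead of hardcoding them in the script
$cred = Get-Credential -Message 'Aria Operations for Logs credentials'
$loginBody = @{username=$cred.UserName; password=$cred.GetNetworkCredential().Password; provider='Local'} | ConvertTo-Json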

If all goes as expected, we should see text in every column for each of our test hosts, with the 'FoundInLogs' column showing a fairly current timestamp. Instead, we found this in our lab:

Name                        SyslogServer                 SyslogMarkSent FoundInLogs
----                        ------------                 -------------- -----------
h259-vesx-43.example.com    udp://192.168.45.73:514      true           2024-11-17 20
h259-vesx-44.example.com    udp://192.168.45.73:514      true
h259-vsanwit-01.example.com                              true
test-vesx-71.example.com    udp://syslog.example.com:514 true           2024-11-17 20

Above we observe two hosts without a value in ‘FoundInLogs’ and one that doesn’t even have a syslog destination configured. The first host does have syslog configured, but our test message was not received. Investigating this host specifically, we find that the host firewall rule allowing outbound syslog was not enabled, as seen in the screenshot below (where we’d expect the check box to be selected):

This was caused by deliberately unchecking that box so the test would fail, just to validate our script logic. The other host (a vSAN witness host) does not have a syslog destination defined at all. This happened to be a gap in how configurations were applied in this environment: the host exists outside of a cluster, and we are managing this setting at the cluster level. It's an oversight that is easily corrected, but without testing we may not have uncovered these issues.
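
As an aside, this particular fix doesn't require the UI. A small PowerCLI sketch along these lines should re-enable the built-in 'syslog' ruleset on the affected host (hostname reused from the output above):

# Enable the outbound syslog firewall rule if it is currently disabled
Get-VMHost h259-vesx-44.example.com |
  Get-VMHostFirewallException -Name 'syslog' |
  Where-Object { -not $_.Enabled } |
  Set-VMHostFirewallException -Enabled:$true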

Conclusion

Automation can help ensure not only that settings are consistently configured across an environment, but also prove that the end-to-end flow is working. Hopefully this can help identify logging problems before those logs are needed.

Posted in Lab Infrastructure, Scripting

Leveraging VMware Aria Operations for Power Consumption Tracking

I’ve been looking for a good reason to try out the Aria Operations Management Pack Builder, ever since a peer of mine built one for Pi-Hole back in early 2023. One thing that I thought would be of particular interest was tracking the power consumption of my lab. This article will outline how I achieved this goal.

Getting the data

The first task for this project was finding a way to get the data on my power consumption. The majority of my lab gear plugs into a small APC BackUPS UPS, which provides enough power to handle the occasional blip, but not to run the lab for any meaningful amount of time. This UPS came with a serial/USB cable that could be managed with some software available for Windows. I don't have a physical Windows system running 24×7 in my lab, but I do have a domain controller virtual machine that resides on the local disk of a single host. Since the VM isn't moving around with DRS, I was able to pass a USB device through to the VM. I added a new USB controller, then added a new USB device, and then my VM settings had the following entries:

With this USB device passed through to the VM, I was able to install the PowerChute Serial Shutdown application. In this application, there is a Logging > Data Log configuration where you can set how frequently the service should record data; I've set this to 5 minutes. With this configuration enabled, a text file is updated in the folder C:\Program Files\APC\PowerChute Serial Shutdown\agent\energylog that includes details on the relative load of the UPS (in percent) as well as the calculated load in watts. This is a great start: we now have a source of data on our local filesystem.
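
Before building anything on top of this file, we can sanity-check the newest entry from PowerShell (same path as above):

# Grab the last line of the most recently written energylog file
Get-ChildItem 'C:\Program Files\APC\PowerChute Serial Shutdown\agent\energylog\*.log' |
  Sort-Object LastWriteTime -Descending | Select-Object -First 1 |
  ForEach-Object { Get-Content $_.FullName -Tail 1 }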

Formatting the data as JSON

The Aria Operations Management Pack Builder can point at an online JSON or XML API and return fields for use in our custom management pack. I had considered enabling a web server (like IIS) to serve up a dynamic page that reads the energylog and re-formats the file as JSON. That seemed like overkill just to serve a single page. I then remembered seeing some code to have PowerShell listen for HTTP requests. After a bit of searching, I put the code sample below together. It listens for HTTP requests; when one is received, PowerShell looks for the latest energylog file, finds the last row with the newest data, then returns that data as a JSON object.

$httpListener = New-Object System.Net.HttpListener
$httpListener.Prefixes.Add('http://*:6545/')
$httpListener.Start()

while($true) {
    $context = $httpListener.GetContext()
    $context.Request.HttpMethod
    $context.Request.Url
    $context.Request.Headers.ToString() # pretty printing with .ToString()

    # use a StreamReader to read the HTTP body as a string
    $requestBodyReader = New-Object System.IO.StreamReader $context.Request.InputStream
    $requestBodyReader.ReadToEnd()

    # Find the most recently written log file (descending sort puts the newest first)
    Get-ChildItem 'C:\Program Files\APC\PowerChute Serial Shutdown\agent\energylog\*.log' | Where-Object {$_.LastWriteTime -gt (Get-Date).AddMinutes(-10)} | Sort-Object LastWriteTime -Descending | Select-Object -First 1 | Foreach-Object {

        # once we know the latest file, lets read the last line and split on the delimiter so we can assign the various parts to descriptive variables
        $contentParts = (Get-Content $_.FullName -Tail 1).Split(';')
        $responseJson = [pscustomobject][ordered]@{
            'HostName'               = $env:computername
            'ModelName'              = (Get-Content $_.FullName -TotalCount 10 | Where-Object {$_ -match 'modelname'}).Split('=')[1]
            'FormattedDate'          = (Get-Date).ToString('yyyy-MM-dd HH:mm')
            '2010Date'               = [int]$contentParts[0]
            'relativeLoadPercentage' = [int]$contentParts[2]
            'calculatedLoadWatts'    = [float]$contentParts[3]
        } | ConvertTo-Json
    } # end file loop

    $context.Response.StatusCode = 200
    $context.Response.ContentType = 'application/json'

    $responseBytes = [System.Text.Encoding]::UTF8.GetBytes($responseJson)
    $context.Response.OutputStream.Write($responseBytes, 0, $responseBytes.Length)

    $context.Response.Close() # end the response
} # end while loop

There is likely some room for improvement in that code, but as a proof of concept it gets the job done. I did need to open a Windows firewall port to allow incoming TCP 6545 requests.
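
Assuming you'd rather create that rule from PowerShell than the Windows Firewall UI, a one-time command like this sketch should do it:

# Allow inbound TCP 6545 for the listener script
New-NetFirewallRule -DisplayName 'UPS JSON Listener' -Direction Inbound -Protocol TCP -LocalPort 6545 -Action Allow

With the port open, we can test our 'API' from a remote machine easily enough, again using PowerShell: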

Invoke-RestMethod -Uri 'http://servername.example.com:6545'

This should return the JSON object we created above. I created a scheduled task to start the above script when the system starts so that it is always running.
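
The scheduled task itself can also be created from PowerShell. Here is a minimal sketch, assuming the listener script was saved as C:\Scripts\ups-listener.ps1 (the path and task name are placeholders, not from the original setup):

# Run the hypothetical listener script at system startup as SYSTEM
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\ups-listener.ps1'
$trigger = New-ScheduledTaskTrigger -AtStartup
Register-ScheduledTask -TaskName 'UPS JSON Listener' -Action $action -Trigger $trigger -User 'SYSTEM' -RunLevel Highest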

Management Pack Builder

For this part of the process, we’ll download the latest version of the Management Pack Builder appliance from https://marketplace.cloud.vmware.com/services/details/draft-vmware-aria-operation-management-pack-builder-1-1?slug=true and deploy it into our environment.

Our 'API' is super simple: any request to the IP:port returns our JSON. When building the management pack, we don't need to do much with authentication, special headers/request query strings, or object relationships. In fact, there are only three tabs in Management Pack Builder where we need to fill in details, listed below.

Source

  • Hostname: the name of our test system with the USB connected UPS & script running.
  • Port: 6545
  • SSL Configuration: No SSL
  • Authentication: Custom, no details required since our ‘service’ doesn’t have authentication.
  • Global Request Settings (optional): no configuration required.
  • Test Connection Request: default GET, no configuration required.
  • Test Connection Request Advanced: no configuration required.
  • Test: the Submit 'Request' button should return the JSON object from the script

Requests

I set the path to getStats just to have something listed; since our API is very simple, we don't require specific paths or headers. By default, the request name will take the same value as the path. Using the test 'Request' button should again return our expected JSON payload. We can then save our requests.

Objects

Next we’ll create an object using the ‘Add New Object’ button.

  • Object Type: APC BackUPS
  • Change Object Icon: pick an icon; there is one that sort of looks like a battery, so I picked that one.
  • Attributes from the API Request > expand 'getStats' > select all the attributes returned (except 2010Date, which I didn't need)
  • I left Host Name, Model Name, and Formatted Date as string properties, and Relative Load Percentage and Calculated Load Watts as decimal metrics.
  • Select object instance name: ‘Model Name’
  • Select object identifiers: ‘Model Name’ + ‘Host Name’

That's it! We now have enough of the Management Pack Builder fields populated to build our PAK file. From the Build tab we select 'perform collection', then 'build', and use the Pack File download link to get our file, which should be about 20MB.

Installing and Configuring our Management Pack in Aria Operations

From there we can install our PAK file in Aria Operations. Instead of setting up the 'VMware Aria Operations Connections' feature inside of the Management Pack Builder, I just switched over to Operations, selected Administration > Integrations > Repository > Add, and browsed to my recently downloaded PAK file.

After our integration is installed, we should see an 'Add Account' button. Selecting it will take us to the 'Add Cloud Account' page where we can enter the name and hostname for our connection. Here I've entered "Server Room UPS" for the name and "dr-control-21.lab.enterpriseadmins.org" as the hostname. Since no username/password are required for our 'API', these are the only required fields.

After a few minutes, we should start seeing data flow into the metrics of our object. I took this screenshot after a few weeks of collection. We can check this out on the Metrics tab of the object and watch our calculated load over time as shown below:

Conclusion

Just because a device doesn’t provide an API doesn’t mean we can’t make our own. Using a bit of custom code + Management Pack Builder allows us to report on almost anything.

Posted in Lab Infrastructure, Scripting

Exploring VM Security: How to Identify Encrypted Virtual Disks in vSphere

I was recently looking at some virtual machines in a lab and trying to determine which had encrypted virtual disks vs. encrypted configuration folders only. This data is visible in the vSphere UI. From the VM list view we can select the ‘pick columns’ icon in the lower left near the export button (in vCenter Server 8 this is called Manage Columns) and select the checkbox for Encryption.

With this column selected, we can see that four VMs all show as encrypted.

However, if we dig a little deeper, we can see that one VM has both the configuration files and its only hard disk encrypted, as shown below:

Another VM only has the first hard disk encrypted (note that Hard disk 2 does not show the word ‘Encrypted’ below the disk size).

And yet another VM only has encrypted configuration files and the hard disk is not encrypted at all.

This makes sense, as the virtual machine list view does not show each virtual disk, only the VM configuration. We can encrypt only the configuration, but we can't encrypt only a hard disk without also encrypting the configuration. This view shows that there is something going on with encryption, but for what I was looking for we'll need to dig a bit deeper.

Since I wanted to check each VMDK of each VM, and that's not easily viewable in the UI without lots of clicking, I switched over to PowerCLI. I found an older blog post (https://blogs.vmware.com/vsphere/2016/12/powercli-for-vm-encryption.html) which mentioned a community PowerShell module (https://github.com/vmware/PowerCLI-Example-Scripts/tree/master/Modules/VMware.VMEncryption) for reporting on encryption. Browsing through the code, I saw a 'KeyId' property that is present on VMs and hard disks where encryption is enabled. I created a quick script to loop through all the VMs looking for either of these properties. I could have used the published module, but for this simple exercise it was easy enough to pick and choose the fields I needed.

$myResults = @()
foreach ($thisVM in Get-VM) {
  foreach ($thisVMDK in ($thisVM | Get-HardDisk) ) {
    $myResults += $thisVMDK | Select-Object @{N='VM';E={$thisVM.Name}}, @{N='ConfigEncrypted';E={ if($thisVM.ExtensionData.Config.KeyId.KeyId){'True'} }},
                @{N='VMDK Encrypted';E={ if($_.ExtensionData.Backing.KeyId.KeyId){'True'} }}, @{N='Hard Disk';E={$_.Name}},
                @{N='vTPM';E={ if($thisVM.ExtensionData.Config.Hardware.Device | Where-Object {$_.Key -eq 11000}){'True'} }} # device key 11000 is where the vTPM appears
  } # end foreach VMDK
} # end foreach VM

$myResults | Sort-Object VM | Format-Table -AutoSize

Our $myResults variable now contains a row for each virtual hard disk, showing the VM name, whether the VM 'Home' configuration is encrypted, whether the VMDK is encrypted, the hard disk name, and whether the VM has a vTPM. By default, the output sorts the VMs by name and lists all of the properties. However, if I needed a list of all the VMs that have one or more encrypted VMDKs, I could use the following Where-Object filter.

$myResults | Where-Object {$_.'VMDK Encrypted' -eq 'True'} | Select-Object VM -Unique

This results in a list of VM names, showing only two interesting VMs, even though the earlier UI screenshot showed four VMs with encrypted configs.
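
The filter can also be flipped around. As a quick sketch, this should list VMs whose configuration is encrypted but which still have at least one unencrypted disk:

$myResults | Where-Object {$_.ConfigEncrypted -eq 'True' -and $_.'VMDK Encrypted' -ne 'True'} | Select-Object VM -Unique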

Hopefully this will be helpful if you are looking for encrypted VMs in an environment.

Posted in Scripting, Virtualization

Unlocking the Power of VMware.vSphere.SsoAdmin: Automated Reporting and Management

I’ve recently had a couple of questions around automated reporting or changes to the vCenter Server SSO Domain. I’ve seen mention of the VMware.vSphere.SsoAdmin PowerCLI module, but haven’t had a need to dig into it. This post will explore a couple of things that can be achieved with this module.

Installing the Module & Connecting to an SSO Server

The module is available in the PowerShell Gallery as well as in the PowerCLI-Example-Scripts Repo (https://github.com/vmware/PowerCLI-Example-Scripts/tree/master/Modules/VMware.vSphere.SsoAdmin). You can install it with the following syntax:

Install-Module VMware.vSphere.SsoAdmin -Scope:CurrentUser

Once the module is installed we can connect to an SSO server (this is my vCenter Server Appliance).

Connect-SsoAdminServer -Server lab-vcsa-12.example.org -User brian -Password VMware1! -SkipCertificateCheck

A successful connection should return some details about the name/Uri/user that is connected. The following few examples all depend on a successful connection.

Reporting on Group Membership

The first reporting task I was asked about was seeing which users were members of the vsphere.local Administrators group. We can do this by finding the group, then piping that to another cmdlet provided by this module.

Get-SsoGroup -name Administrators -Domain vsphere.local | Get-SsoPersonUser

Here is a sample output:

Name          Domain        Locked Disabled PasswordExpirationRemainingDays
----          ------        ------ -------- -------------------------------
Administrator vsphere.local  False    False                              -1
test1         localos        False    False                              -1
brian         example.org    False    False                              35
lop           localos        False    False                              -1

Changing the administrator@vsphere.local password

One request I received was around the ability to programmatically change the password for the administrator@vsphere.local account. We can do this with a single line of code:

Get-SsoPersonUser -Name administrator -Domain vsphere.local | Set-SsoPersonUser -NewPassword VMware1!VMware1!

In the above example, we are finding a specific user (with Get-SsoPersonUser) then we pipe that output to Set-SsoPersonUser and specify our NewPassword value.

Once the password is changed, we can log in to the UI or with Connect-VIServer to validate that our credentials were successfully updated.
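
For example, a quick validation from PowerCLI might look like this (using the new password value from above):

Connect-VIServer -Server lab-vcsa-12.example.org -User administrator@vsphere.local -Password 'VMware1!VMware1!'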

Updating the Active Directory over LDAP Identity Source password

From time to time it may be necessary to update the username/password used to bind to an Active Directory domain in the vCenter identity sources list. If we have a small number of vCenter Servers, we could probably do this in the GUI as shown in the screenshot below:

However, for a large number of vCenter Servers, or frequent password rotation, automation may be helpful. Fortunately this module can help update this identity source as well.

Get-IdentitySource -External | ?{$_.name -eq 'example.org'} | 
Set-LDAPIdentitySource -Username 'EXAMPLE\svc-ldapbind-a' -Password 'VMware1!'

In the above example we get external identity sources only, use Where-Object to filter to a specific identity source (this environment has multiple LDAPS directories which require different bind users), then update both the username and password values on that identity source. This is actually better than the GUI! When we make the same change in the GUI we also need to provide the certificate; with this module we can update only the necessary values and leave the existing certificate in place. (Note: the module is also capable of updating the certificate if needed.)
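
Scaled out for password rotation, a hedged sketch might look like the following; the server list is hypothetical, and this assumes the same identity source name exists on each vCenter Server:

# Hypothetical list of SSO servers; adjust names and the identity source filter for your environment
$ssoServers = 'lab-vcsa-12.example.org', 'lab-vcsa-13.example.org'
$newPassword = Read-Host -Prompt 'New bind password'

foreach ($server in $ssoServers) {
  $conn = Connect-SsoAdminServer -Server $server -User brian -Password VMware1! -SkipCertificateCheck
  Get-IdentitySource -External | Where-Object {$_.Name -eq 'example.org'} |
    Set-LDAPIdentitySource -Username 'EXAMPLE\svc-ldapbind-a' -Password $newPassword
  Disconnect-SsoAdminServer -Server $conn
}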

Conclusion

The VMware.vSphere.SsoAdmin module is very powerful and worth a closer look.

Posted in Scripting, Virtualization

Step-by-Step: Installing Ubuntu 24.04 on a Raspberry Pi for DNS and NTP

In my home network, I have a Raspberry Pi 4 which provides DNS (Pi-hole) and NTP (chrony). It's a device that I don't touch often and it provides a 'production' type service: in my lab I don't mind blowing up or breaking things, but this device needs to be stable. If DNS goes offline the family can't stream shows, and it's a real production-down sort of situation. Systems in my lab consume NTP from this device, and regular devices in my home network rely on it for DNS (for ad blocking as well as conditional forwarding of lab domains to DNS servers in the lab). A few days ago, I noticed that this system was down: it wasn't answering DNS requests and SSH/VNC wasn't working. After power cycling the system, I was also no longer able to ping the device. After a bit of troubleshooting, I realized that the SD card used as boot media had failed. The system had been running 24×7 for ~5 years, logging DNS requests and such, probably more write IO than anyone should expect from a consumer SD card.

To resolve the issue I ordered a new SD card… but I realized that this system had accumulated about 5 years of various configuration changes. I'm going to attempt to document the configuration (at least what I remember about it) below.

OS Installation

The previous Raspberry Pi used the Raspbian OS with a GUI. However, I never really used the GUI and primarily access this system remotely. Since most other systems I manage use Ubuntu (specifically 24.04), I decided to install that OS using the server instructions from here: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-your-raspberry-pi#1-overview.

I used the Raspberry Pi Imager for Windows, which allowed me to customize the username/password, hostname, etc. of the OS so that it booted up and I could connect via SSH.

Once I was logged into the system, the first thing I did was make sure it was up to date using sudo apt update && sudo apt upgrade. This installed a bunch of updates, so I rebooted for good measure.

Lab Certificate

In rare cases, I'll access something in my lab from the Raspberry Pi. To make this work without certificate warnings, I installed the lab CA certificate. This is just two commands: one to download the file and another to update the certs.

sudo wget http://www.example.com/build/rootca-example-com.crt -P /usr/local/share/ca-certificates
sudo update-ca-certificates

Install extra packages

I had a handful of extra packages that I installed. I’ll discuss each of these later, but for now we’ll install them all in one pass.

sudo apt install sssd-ad sssd-tools realmd adcli chrony tinyproxy

Proxy Server

For some occasional testing, I’ll use a proxy server in my lab. This was running in a dedicated VM, but while I’m revisiting things, I decided to co-locate it on this appliance.

# configure proxy
sudo nano /etc/tinyproxy/tinyproxy.conf

# change LogLevel from Info to Warning
# Allow 192.168.0.0/16 by removing comment

sudo systemctl reload tinyproxy

NTP (chrony)

I prefer having NTP servers running on physical devices. Since I don’t have many of those in the lab, I use the Raspberry Pi as a locally accessible NTP server. I’m using the chrony service to do this and allow anything in the lab to query this device for time.

# configure NTP
sudo nano /etc/chrony/chrony.conf
# append the following comment / allow lines to the file
# Define the subnets that can use this host as an NTP server
allow 192.168.0.0/16

sudo systemctl restart chrony.service

Pi-Hole

The reason I first purchased this Raspberry Pi was to block ads on my home network using pi-hole.

curl -sSL https://install.pi-hole.net | bash

# create a custom config file for various forward/reverse domain forwarding:
sudo nano /etc/dnsmasq.d/05-custom.conf

# contents of above new file
server=/lab.enterpriseadmins.org/192.168.127.30
server=/lab.enterpriseadmins.org/192.168.32.30
server=/example.com/192.168.127.30
server=/example.com/192.168.32.30
server=/168.192.in-addr.arpa/192.168.127.30
server=/168.192.in-addr.arpa/192.168.32.30

# from web UI, restart resolver.
# Update pihole settings > DNS, change from recommended allow only local requests to 'permit all origins' so that all lab subnets can resolve names.

# enable php for non-pihole /admin locations
sudo lighttpd-enable-mod fastcgi fastcgi-php
sudo service lighttpd reload

# Create redirect page for / to /admin
echo '<head>  <meta http-equiv="Refresh" content="0; URL=/admin" /> </head>' | sudo tee /var/www/html/index.html

# Create 'get-hostname.php' file in /var/www/html as well, this is for Aria Ops management pack.  The contents of the file should be:
<?php echo '{"hostname":"' . gethostname() . '"}'; ?>

Active Directory Join

Most Ubuntu boxes in my lab are joined to Active Directory for common logins. I configured the same for the Raspberry Pi, although it is not really required.

# configure AD
echo '%lab\ linux\ sudoers ALL=(ALL) NOPASSWD:ALL' | sudo tee -a /etc/sudoers
sudo /usr/sbin/pam-auth-update --enable mkhomedir

sudo /usr/sbin/realm join lab.enterpriseadmins.org -U svc-windowsjoin --computer-ou "ou=services,ou=lab servers,dc=lab,dc=enterpriseadmins,dc=org"
sudo sed -i -e 's/^#\?use_fully_qualified_names.*/use_fully_qualified_names = False/g' /etc/sssd/sssd.conf
sudo systemctl restart sssd.service

Static IP

Once everything was configured and ready, I decided to put the device in service by changing the IP from the DHCP address originally obtained to the static IP address I have configured on most devices, by editing the netplan configuration (a YAML file under /etc/netplan/):

network:
  version: 2
  ethernets:
    eth0:
      match:
        macaddress: "dc:a6:32:aa:aa:aa"
      dhcp4: no
      addresses: [192.168.127.53/24]
      routes:
        - to: default
          via: 192.168.127.254
      nameservers:
        addresses: [192.168.127.53,192.168.32.53]

To make the new network settings active, we must apply those file changes with sudo netplan apply.

Cleanup

The Raspberry Pi Imager utility used cloud-init to do some customizations. This was running at each startup and left a few messages on the system console. Since we no longer need cloud-init after the system is online, we’ll just remove the package with:

sudo apt purge cloud-init

Conclusion

The Raspberry Pi in my lab has been running for about 5 years with little to no maintenance. Other than this one failed SD card, things have been very reliable. The steps here are mostly notes for future reference if I need to rebuild the device again. Hopefully you'll find them helpful.

Posted in Lab Infrastructure