Monitoring a Raspberry Pi with Telegraf and Aria Operations

I recently set out to configure the open-source Telegraf agent on a physical system in my lab, with the goal of sending telemetry data to Aria Operations. The process for setting this up is documented here: https://techdocs.broadcom.com/us/en/vmware-cis/aria/aria-operations/8-18/vmware-aria-operations-configuration-guide-8-18/connect-to-data-sources/monitoring-applications-and-os-using-open-source-telegraf/monitoring-applications-using-open-source-telegraf/monitoring-applications-using-open-source-telegraf-on-a-linux-platform-saas-onprem.html. Since most of my lab systems are virtualized, the only physical candidate available was a Raspberry Pi running Ubuntu 24.04, and with its ARM-based CPU, I wasn’t sure if it would be supported.

Installing Telegraf on ARM (Ubuntu 24.04)

The first step was to install the telegraf package from the InfluxData repository.

sudo curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key -o /etc/apt/keyrings/influxdata-archive_compat.key
echo "deb [signed-by=/etc/apt/keyrings/influxdata-archive_compat.key] https://repos.influxdata.com/ubuntu stable main" | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt -y install telegraf
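
Because this is an ARM system, I also wanted to confirm what apt actually installed. This is a hedged check using standard dpkg and telegraf commands (on a Raspberry Pi the architecture should report arm64):

# Confirm the architecture and version of the installed telegraf package
dpkg -s telegraf | grep -E '^(Version|Architecture)'
telegraf --version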

I then needed to download the utility script that helps configure Telegraf to send data to Aria Operations.

wget --no-check-certificate https://cm-opscp-01.lab.enterpriseadmins.org/downloads/salt/telegraf-utils.sh
chmod +x telegraf-utils.sh

The telegraf-utils.sh script requires an auth token. I accessed the Swagger UI at https://ops.example.com/suite-api and used the /auth/token/acquire endpoint to generate the token. Here is the body I submitted:

{
  "username" : "svc-physvr",
  "password" : "VMware1!"
}
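
For reference, the same token can also be acquired from the command line. The following is a hedged curl sketch against the standard suite-api token endpoint, reusing the example hostname and account shown above; the response JSON includes the token value:

# Sketch only: acquire an Aria Operations API token outside of the Swagger UI
curl -sk -X POST 'https://ops.example.com/suite-api/api/auth/token/acquire' \
  -H 'Content-Type: application/json' -H 'Accept: application/json' \
  -d '{"username":"svc-physvr","password":"VMware1!"}'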

In this case, svc-physvr is a user account created in the Aria Operations UI with limited access. The response body included the necessary token value, which I used when invoking the helper script:

sudo ./telegraf-utils.sh opensource -c 192.168.45.73 -t 24c884f0-2558-40fa-9626-61f577487ea5::7d209766-11f2-456a-a2d9-2a40b4459920 -v 192.168.45.73 -d /etc/telegraf/telegraf.d -e /usr/bin/telegraf

The parameters used in this script are explained in the product documentation.

Finally, I restarted the telegraf service.

sudo systemctl restart telegraf

Unfortunately, that was met with the following error:

Job for telegraf.service failed because the control process exited with error code.
See "systemctl status telegraf.service" and "journalctl -xeu telegraf.service" for details.

Looking at the logs, I could see that a certificate could not be read:

sudo journalctl --no-pager -u telegraf

[...]
Jun  12 19:53:09 rpi-extdns-01 telegraf[1292]: 2025-06-12T18:53:09Z E! loading config file /etc/telegraf/telegraf.d/cloudproxy-http.conf failed: error parsing http array, could not load certificate "/etc/telegraf/telegraf.d/cert.pem": open /etc/telegraf/telegraf.d/cert.pem: permission denied
Jun  12 19:53:09 rpi-extdns-01 systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
[...]

I checked permissions on the cert.pem file and confirmed it was owned by root for both user and group. The same was true for key.pem. I adjusted ownership of both files and tried again:

sudo chown telegraf:telegraf /etc/telegraf/telegraf.d/cert.pem
sudo chown telegraf:telegraf /etc/telegraf/telegraf.d/key.pem
sudo systemctl restart telegraf

This time no errors occurred. In short, the Telegraf service was unable to read its TLS certificate files because they were owned by root, but the service runs as the telegraf user. Fixing ownership resolved the issue.
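
To confirm the service stayed healthy after the ownership change, the standard systemd checks can be used:

systemctl is-active telegraf
sudo journalctl --no-pager -u telegraf --since "10 minutes ago"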

Validating Success in Aria Operations

After waiting some time, I could see data in Aria Operations for this physical server. I first searched for the server name and found an object called “Linux OS on rpi-extdns-01” (rpi-extdns-01 being the server’s hostname).

Clicking on that object allowed me to view the metrics and properties that were collected. For example, the screenshot below shows disk usage over time for the root file system.

More details on this system could be found in the dashboard “Linux OS discovered by Telegraf.”

Conclusion

It’s great to have full visibility into this physical server using the same Aria Operations dashboards and alerts I already rely on for virtual systems. The setup was straightforward, and with a few tweaks for file permissions, the integration worked well even on a low-cost Raspberry Pi with an ARM processor.

Posted in Lab Infrastructure, Virtualization

Cleaning Up Orphaned Tag Associations in vCenter

I was recently made aware of a KB article titled “Tag associations are not removed from vCenter Server database when associated objects are removed or deleted” (https://knowledge.broadcom.com/external/article?articleNumber=344960). The article includes a script that removes orphaned tag assignments left behind in the vCenter Server database after object deletion.

Investigating the Issue

After reviewing this article, I checked the vpxd.log file on one of my lab vCenter Server instances and noticed frequent entries like the following:

2025-06-07T17:33:50.765Z error vpxd[06442] [Originator@6876 sub=Authorize opID=4be68587-f898-41b0-bbd4-2764f0941eaa Authz-7c] MoRef: vim.Datastore:datastore-4936 not found. Error: N5Vmomi5Fault21ManagedObjectNotFound9ExceptionE(Fault cause: vmodl.fault.ManagedObjectNotFound

To quantify this, I ran:

cat /var/log/vmware/vpxd/vpxd.log | grep -i vmodl.fault.ManagedObjectNotFound | wc -l
13738

cat /var/log/vmware/vpxd/vpxd.log | wc -l
210258

This showed that roughly 6.5% of the log entries were related to this specific fault, which strongly suggested lingering tag associations.

Reproducing the Issue

To test further, I moved to a clean vCenter environment with no history of tag usage. I created and tagged 10 virtual machines:

$newCat = New-TagCategory -Name 'h378-category' -Cardinality:Multiple -EntityType:VirtualMachine

0..9 | %{ New-Tag -Name "h378-tag$_" -Category $newCat }

0..9 | %{ New-VM -VMHost test-vesx-71* -Name "h378-vm$_" -Datastore vc3-test03-sdrs -Template template-tinycore-160-cli-cc }

New-TagAssignment -Tag (Get-Tag "h378*") -Entity (Get-VM "h378*")

Get-VM "h378*" | Remove-VM -DeletePermanently:$true -Confirm:$false

After deletion, there were no log entries related to orphaned tags. I queried the database using a modified version of the cleanup script in read-only mode and confirmed that no orphaned tag rows existed. This led me to revisit the KB and note that:

In vSphere 7 and 8, tag associations are automatically removed for Virtual Machines and Hosts when the associated object is deleted.

Confirming with Cluster Objects

I then repeated the test using cluster objects, which are not automatically cleaned up:

$newCat = New-TagCategory -Name 'h378-category-Cluster' -Cardinality:Multiple -EntityType:ClusterComputeResource
0..9 | %{ New-Tag -Name "h378-cluster-tag$_" -Category $newCat }

0..9 | %{ New-Cluster -Name "h378-cluster-$_" -Location (Get-Datacenter h378-test) }

New-TagAssignment -Tag (Get-Tag "h378-cluster*") -Entity (Get-Cluster "h378*")

Get-Cluster "h378*" | Remove-Cluster -Confirm:$false

Shortly after deletion, the vpxd.log showed ManagedObjectNotFound errors. I verified the orphaned rows using the following SQL query:

${VMWARE_POSTGRES_BIN}/psql -U postgres VCDB -h /var/run/vpostgres <<EOF
select * from cis_kv_keyvalue where kv_provider like 'tagging:%'
and
kv_key like 'tag_association urn:vmomi:ClusterComputeResource:%'
and
regexp_replace(kv_key, 'tag_association urn:vmomi:ClusterComputeResource:domain-c([0-9]+).*', '\1')::bigint
not in (select id from vpx_entity where type_id=3);
EOF

This confirmed 100 orphaned tag associations, which I then cleaned up using the provided tags_delete_job_all.sh script.
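
To double-check the cleanup, the same read-only query can be re-run as a count, which should return zero once the orphaned rows are gone:

${VMWARE_POSTGRES_BIN}/psql -U postgres VCDB -h /var/run/vpostgres <<EOF
select count(*) from cis_kv_keyvalue where kv_provider like 'tagging:%'
and kv_key like 'tag_association urn:vmomi:ClusterComputeResource:%'
and regexp_replace(kv_key, 'tag_association urn:vmomi:ClusterComputeResource:domain-c([0-9]+).*', '\1')::bigint
not in (select id from vpx_entity where type_id=3);
EOF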

Cleanup Results

Back on the original vCenter Server, where roughly 6% of vpxd.log entries came from this issue, I took a snapshot and ran the cleanup script. It only removed around 30 orphaned associations, but the new ManagedObjectNotFound messages that had been appearing every few seconds have stopped.

This reduction is easy to monitor in Aria Operations for Logs, especially across multiple vCenter environments.

Conclusion

In my environments, VM and Host deletions are the most common, and these objects now clean up their tag associations automatically in recent vSphere versions. However, orphaned associations from cluster or other object types may remain, especially in environments upgraded over time.

By reviewing your vpxd.log and using the methods shown here, you can identify and remediate these issues efficiently.

Posted in Scripting, Virtualization

Centralized Startup Scripting for Automated VM Load Testing

When building or troubleshooting infrastructure, it is often useful to simulate high CPU or memory usage without deploying full production workloads. To assist with this, I previously created a few purpose-built scripts, like cpubusy.sh and memfill.sh.

Historically, I created multiple Tiny Core Linux templates, each designed for a specific purpose: a generic one for troubleshooting, one that would fill memory, and another to load up the CPU. I’d call each template’s designated script from /opt/bootlocal.sh so it ran automatically at startup, and then control load by simply powering VMs on or off.

The Problem

That setup works fine for simple use cases, but it doesn’t scale well. What if, all from the same base image, I want:

  • One VM to simulate CPU load,
  • Another to test download speeds,
  • A third to run a custom test script?

The Common Control Script

The idea is simple: deploy a single generic VM template with a control script that runs at boot, checks for instructions, and decides what to do based on one of the following:

  • Metadata (via guestinfo)
  • Network identity (IP, MAC, hostname)
  • Shared config (via GuestStore or a web server)

This common control script can be found here: code-snips/cc.sh.

Where It Looks for Instructions

When a VM boots, the script checks for commands in the following order (a simplified sketch of this logic follows the list):

  1. Web server:
    • http://<web>/<macaddress>.txt
    • http://<web>/<ipaddress>.txt
    • http://<web>/<hostname>.txt
    • http://<web>/all.txt
  2. VMware Tools / GuestStore:
    • a per-VM script named by guestinfo.ccScript (specified via an advanced VM setting)
    • /custom/cc/cc-all.txt (the shared default in the GuestStore repository)
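
To make the order concrete, here is a simplified sketch of that lookup logic. The real script is code-snips/cc.sh; the web server address, temp file names, and exact guestinfo/GuestStore handling below are placeholders and may differ from the published version:

#!/bin/sh
# Simplified sketch of the lookup order only; see code-snips/cc.sh for the real script.
WEB="http://192.168.1.10"                  # placeholder web server address
MAC=$(cat /sys/class/net/eth0/address)     # exact file naming on the web server may differ
IP=$(hostname -i 2>/dev/null)
HOST=$(hostname)

# 1. Try the web server, most specific match first
for candidate in "$MAC" "$IP" "$HOST" "all"; do
  if wget -q -O /tmp/cc-run.sh "$WEB/$candidate.txt"; then
    sh /tmp/cc-run.sh
    exit 0
  fi
done

# 2. Fall back to VMware Tools: a per-VM script named in guestinfo.ccScript,
#    otherwise the shared default from the GuestStore repository
NAME=$(vmtoolsd --cmd "info-get guestinfo.ccScript" 2>/dev/null)
if [ -n "$NAME" ]; then
  vmware-toolbox-cmd gueststore getcontent "/custom/cc/$NAME" /tmp/cc-run.sh
else
  vmware-toolbox-cmd gueststore getcontent /custom/cc/cc-all.txt /tmp/cc-run.sh
fi
sh /tmp/cc-run.sh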

This layered approach gives flexibility:

  • Set a global script via all.txt
  • Override per host via metadata or identifiers
  • Or push custom scripts directly via GuestInfo

Example: Setting the GuestInfo Property

Using PowerCLI, we can set the script filename per VM like this:

Get-VM h045-tc16-02 | New-AdvancedSetting -Name 'guestinfo.ccScript' -Value 'memfill.sh' -confirm:$false

We can also modify this via the vSphere Web Client.
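
Inside the guest, the value can be read back through VMware Tools, which is a quick way to confirm the setting reached the VM:

vmtoolsd --cmd "info-get guestinfo.ccScript"    # on TinyCore the full path may be /usr/local/bin/vmtoolsd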

Demonstration

Here’s a test case from my lab:

  • I set guestinfo.ccScript to memfill.sh
  • The all.txt file includes a simple command to print system time

Upon boot, the VM fills 90% of available RAM using a memory-backed filesystem and prints the time, confirming that both script sources are active.

Later, I removed the guestinfo.ccScript setting and added a <hostname>.txt script to download a file repeatedly from a test web server. After a reboot, the VM behaved differently, now acting as a network test client, with no changes to the template required.

Sample Scripts

Here are a few lightweight test scripts used in the demo:

  • cpubusy.sh – uses sha1sum to keep all the configured CPU cores busy
  • download.sh – uses wget to download the same file from a web server a configurable number of times, sending the output to /dev/null
  • memfill.sh – creates a memory backed filesystem using 90% of RAM, then uses dd to fill it
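
To give a sense of scale, here are simplified sketches of the cpubusy.sh and memfill.sh behavior described above; these are hedged illustrations rather than the exact files from the repo:

#!/bin/sh
# cpubusy.sh sketch: start one sha1sum loop per CPU core
CORES=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo)
for i in $(seq 1 "$CORES"); do
  sha1sum /dev/zero &    # hashing /dev/zero never completes, keeping one core busy
done
wait

#!/bin/sh
# memfill.sh sketch: mount a tmpfs sized at ~90% of RAM and fill it with dd
MEM_KB=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
FILL_KB=$((MEM_KB * 90 / 100))
mkdir -p /mnt/memfill
mount -t tmpfs -o size=${FILL_KB}k tmpfs /mnt/memfill
dd if=/dev/zero of=/mnt/memfill/fill bs=1M    # stops once the tmpfs is full

Both are intentionally tiny, so they can live in a web-served .txt file or in the GuestStore repository.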

Conclusion

This ‘common control script’ approach enables template reuse, easier script management, and dynamic testing control, all without modifying the template itself.

Whether testing CPU, memory, or network load across dozens of VMs, the common control script simplifies the process and reduces maintenance overhead.

In future iterations, this setup could be extended to include conditional logic (based on boot time, VM tags, or other metadata), or integration with CI pipelines for even more powerful automation.

Posted in Scripting, Virtualization

Using GuestStore to Deliver Content to Network-Isolated VMs in vSphere

When working with VMs that lack network connectivity, transferring files can be tricky. I recently explored GuestStore, a built-in vSphere feature that solves this challenge by allowing file delivery via VMware Tools, even without a network. This post walks through how I used GuestStore to push a script into a TinyCore Linux VM with no network, CD drive, or external storage options.

The vSphere documentation does a good job explaining what GuestStore is and how it can be used: Distributing Content with GuestStore – https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere/8-0/vsphere-virtual-machine-administration-guide-8-0/managing-virtual-machinesvsphere-vm-admin/distributing-content-with-gueststorevsphere-vm-admin.html

Configuring ESXi Hosts for GuestStore

To begin, I configured all of the hosts in a cluster to use the same NFS datastore as a GuestStore repository. The official docs linked above show how to do this per host with esxcli, so I used that example to write a PowerCLI equivalent. This example uses one host as a reference to create the arguments, populates the URL value, and then sets the value for each host in cluster NestedCluster03.

$setRepo = (Get-VMHost test-vesx-71* | Get-EsxCli -v2).system.settings.gueststore.repository.set.CreateArgs()
$setRepo.url = 'ds:///vmfs/volumes/ebb8ed5e-48fb2f0b/h045-gueststore'

foreach ($thisHost in (Get-Cluster NestedCluster03 | Get-VMHost | Sort-Object Name )) {
  ($thisHost | Get-EsxCli -v2).system.settings.gueststore.repository.set.Invoke($setRepo)
}
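
To confirm the value took effect, the setting can be read back on each host; this uses the get call from the same esxcli namespace as the set call above:

esxcli system settings gueststore repository get    # run via SSH on the host, or through Get-EsxCli as shown above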

Preparing the Content / Sample Script

This sets a datastore folder named h045-gueststore as the base folder for my custom content. I then created a sample script to demonstrate delivering a file to the VM. The full path of the test file is [nfs-datastore-a] h045-gueststore/custom/myscript.sh. The script writes the current date/time, dot-sources the os-release file, and then writes the ‘pretty name’ of the OS to the screen:

#!/bin/sh
echo "The current system time is $(date)"
. /etc/os-release
echo "This system is running ${PRETTY_NAME}."

This is a basic shell script and just a sample; the same method could be used to distribute any script or binary file that is 512MB or less.

Guest VM Retrieval with VMware Tools

I then deployed a TinyCore Linux VM, which is super small (my OVA is less than 30MB, including open-vm-tools) and perfect for this type of testing since it deploys very quickly. In this example, the TinyCore VM has no network adapter, CD/DVD drive, or floppy drive that could be used to transfer script files or other binaries. It does, however, run in cluster NestedCluster03, which has GuestStore configured.

Inside the guest OS we’ll run the following command to retrieve the file:

/usr/local/bin/vmware-toolbox-cmd gueststore getcontent /custom/myscript.sh /tmp/myscript.sh

Since this file is only a few KB in size, we should see a complete progress bar almost immediately, with confirmation that ‘getcontent’ succeeded, as pictured below.

Running the Script

From here we can make our script executable (chmod +x /tmp/myscript.sh) and then run it (/tmp/myscript.sh).

This script produces very basic output, but it’s just a starting point. The real key here isn’t the script, it’s the process of getting the script to a virtual machine that has no alternate method of file transfer.

Conclusion

Configuring GuestStore on ESXi hosts wasn’t difficult. Using VMware Tools to get these files into the guest OS was also straightforward. While there are many ways to get files into a virtual machine, this worked well in this specific case where the VM didn’t have a CD drive or functioning network connectivity.

Posted in Scripting, Virtualization

Comparing Installed Packages on Photon OS Using PowerShell and SSH

When debugging inconsistencies between Photon OS systems, say one is failing and another is stable, it’s useful to compare their installed package versions. In one recent case, I needed a quick way to do just that from my admin workstation. Here’s how I solved it using PowerShell and the Posh-SSH module.

In this test case, both hosts have a user account with the same name/password, so only one credential was created.

# Requires the Posh-SSH module (Install-Module -Name Posh-SSH)
# Prompt for SSH credentials
$creds = Get-Credential

# Connect to Host 1 and get package list as JSON
$host1        = '192.168.10.135'
$host1session = New-SSHSession -ComputerName $host1 -Credential $creds -AcceptKey
$host1json    = (Invoke-SSHCommand -Command 'tdnf list installed -json' -SessionId $host1session.SessionId).Output | ConvertFrom-Json

# Connect to Host 2 and get package list as JSON
$host2        = '192.168.127.174'
$host2session = New-SSHSession -ComputerName $host2 -Credential $creds -AcceptKey
$host2json    = (Invoke-SSHCommand -Command 'tdnf list installed -json' -SessionId $host2session.SessionId).Output | ConvertFrom-Json

# Compare the resulting package lists
$compared = Compare-Object -ReferenceObject $host1json -DifferenceObject $host2json -Property Name, Evr

# Group the results by package name and build tabular results for side-by-side compare
foreach ($thisPackage in ($compared | Group-Object -Property Name)) {
  [pscustomobject][ordered]@{
    Name = $thisPackage.Name
    $host1 = ($thisPackage.Group | ?{$_.SideIndicator -eq '<='}).Evr
    $host2 = ($thisPackage.Group | ?{$_.SideIndicator -eq '=>'}).Evr
  }
}

The script gets a list of all installed packages from each host as JSON (using tdnf list installed -json) and converts the output to PowerShell objects. The two lists of installed packages are then compared using Compare-Object. Finally, we loop through each unique package name and create a new object to show the versions side by side.

I’ve included the first 10 rows of output below for reference.

Name                           192.168.10.135       192.168.127.174
----                           --------------       ---------------
cloud-init                     24.3.1-1.ph4         25.1-1.ph4
curl                           8.7.1-4.ph4          8.12.0-1.ph4
curl-libs                      8.7.1-4.ph4          8.12.0-1.ph4
elfutils                       0.181-7.ph4          0.181-8.ph4
elfutils-libelf                0.181-7.ph4          0.181-8.ph4
expat                          2.4.9-3.ph4          2.4.9-4.ph4
expat-libs                     2.4.9-3.ph4          2.4.9-4.ph4
gettext                        0.21-4.ph4           0.21-5.ph4
glib                           2.68.4-2.ph4         2.68.4-4.ph4
glibc                          2.32-19.ph4          2.32-20.ph4

Looking at this output, we can see which packages are different between our two hosts.

Conclusion

Comparing installed packages across Photon OS systems can be an invaluable troubleshooting and auditing tool – especially when dealing with configuration drift, unexpected behavior, or undocumented changes. By using PowerShell and the Posh-SSH module, you can quickly automate the comparison process without needing to log in to each system manually. Hopefully, this gives you a solid starting point for your own comparisons and debugging tasks.

Posted in Scripting