Troubleshooting Aria Automation Orchestrator Deployment

While deploying VMware Aria Automation Orchestrator 8.18.1 in standalone mode through Aria Suite Lifecycle Manager (LCM), I encountered the following error during configuration:

Error Code: LCMVROVACONFIG100034
Failed to set VMware Aria Automation as authentication provider in VMware Aria Automation Orchestrator.

The deployment itself completed successfully. I was able to log in to Orchestrator and it looked healthy, but the request was in a failed state and the product was not added to the environment in LCM.

Initial Investigation

A search led me to the following Broadcom Knowledge Base article: https://knowledge.broadcom.com/external/article/427647/aria-automation-orchestrator-integration.html

The article references reviewing the following log file on the Lifecycle Manager appliance:

/var/log/vrlcm/vmware_vrlcm.log

While monitoring this log during a retry, I noticed LCM executing the following command on the Orchestrator appliance:

Command: vracli vro authentication

The command itself appeared to complete successfully:

exit-status: 0
Command executed successfully

However, immediately afterward, the following error appeared in the LCM logs:

com.fasterxml.jackson.core.JsonParseException:
Unexpected character ('-' (code 45)):
Expected space separating root-level values

This suggested that LCM was attempting to parse the command output as JSON and failing.

Looking Closer at the Command Output

The log contained the full response returned by the command. At first glance, the output looked like valid JSON. However, there was something interesting before the JSON payload:

2026-06-16T18:56:56.062818368Z main INFO Starting configuration...
2026-06-16T18:56:56.065478499Z main INFO Start watching for changes...
2026-06-16T18:56:56.075515092Z main INFO Configuration started...

Only after these INFO level log messages did the JSON object begin.

To confirm this behavior, I ran the command directly on the Aria Automation Orchestrator appliance:

vracli vro authentication

The output showed several Log4j informational messages before the JSON configuration data:

2026-06-16T19:04:13.657392968Z main INFO Starting configuration...
2026-06-16T19:04:13.659806004Z main INFO Start watching for changes...
2026-06-16T19:04:13.668725416Z main INFO Configuration started...
{
  ...
}

From a human perspective, this output is readable. From LCM’s perspective, however, it is invalid JSON because the response does not begin with a {.

This explained the parsing exception perfectly.
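
To make the failure mode concrete, here is a minimal shell sketch of what a more tolerant consumer could do with this output: discard everything before the first line that starts with {. The function name strip_log_preamble and the shortened sample payload are assumptions for illustration. They are not part of vracli or LCM, and LCM does not do anything like this, which is why its parser fails.

```shell
#!/bin/sh
# Hypothetical helper: print stdin starting at the first line that begins with "{".
# Assumes the JSON object starts on its own line, as in the output shown above.
strip_log_preamble() {
    sed -n '/^{/,$p'
}

# Sample input modelled on the command output above (payload shortened for illustration).
sample='2026-06-16T19:04:13.657392968Z main INFO Starting configuration...
2026-06-16T19:04:13.659806004Z main INFO Configuration started...
{
  "ch.dunes.authentication.provider": "vsphere"
}'

cleaned=$(printf '%s\n' "$sample" | strip_log_preamble)
printf '%s\n' "$cleaned"
```

Running the real command through the same filter (vracli vro authentication | strip_log_preamble) is a quick way to see what a strict JSON parser would have needed to receive.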

Finding the Source

The log messages referenced the following configuration file:

/usr/lib/thin-cfg-cli/conf/log4j2.xml

That path did not exist directly on my appliance. Using the following command:

find / | grep -i log4j2.xml

I located the file within a Docker overlay filesystem.

Important: Editing files directly inside Docker overlay storage is generally not recommended. Container updates, restarts, or image replacements can overwrite these changes. The following modification was performed only as a troubleshooting test to validate the root cause.

I temporarily changed the log4j status logging level from INFO to WARN:

sed -i 's/status="INFO"/status="WARN"/i' \
/data/docker/overlay2/782e95577448c9191f0d0f2ac4744f55fc0537ff96f2ccdff55a201adb0ac377/diff/usr/lib/thin-cfg-cli/conf/log4j2.xml
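
Because this is an in-place edit, it can help to keep a copy of the original and confirm that the attribute really changed. The sketch below shows that pattern against a throwaway file; the sample XML line and the temporary path are assumptions for illustration and do not reflect the real contents of log4j2.xml on the appliance.

```shell
#!/bin/sh
# Illustration only: operate on a temporary stand-in file, not the appliance's real log4j2.xml.
cfg=$(mktemp)
printf '%s\n' '<Configuration status="INFO">' '</Configuration>' > "$cfg"

# Keep a copy of the original, then write the modified version from that copy.
cp "$cfg" "$cfg.bak"
sed 's/status="INFO"/status="WARN"/' "$cfg.bak" > "$cfg"

# Compare the status attribute before and after the change.
before=$(grep -o 'status="[A-Za-z]*"' "$cfg.bak")
after=$(grep -o 'status="[A-Za-z]*"' "$cfg")
echo "before: $before"
echo "after:  $after"

rm -f "$cfg" "$cfg.bak"
```

Keeping the .bak copy next to the file also makes it simple to revert once the test is finished.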

Validation

After making the change, I reran the command from the Orchestrator appliance:

vracli vro authentication

This time the output contained only JSON:

{
  "ch.dunes.authentication.provider": "vsphere",
  ...
}

No log4j informational messages appeared before the JSON payload.

With that result, I retried the failed task in Lifecycle Manager.

The retry completed successfully and the Aria Automation Orchestrator deployment finished without error.

Root Cause

Lifecycle Manager executes:

vracli vro authentication

and expects the response to be valid JSON.

On my deployment, the command emitted log4j initialization messages before the JSON payload. Although the command itself completed successfully with an exit code of zero, the additional logging caused JSON parsing to fail, resulting in:

LCMVROVACONFIG100034

Suppressing the log4j status messages allowed the command to return valid JSON and enabled LCM to complete the authentication provider configuration.

Conclusion

This issue serves as a reminder that a successful command execution does not always mean an automated workflow will succeed. In this case, the actual authentication configuration was valid, and the command returned the expected data. The failure occurred because additional logging output contaminated what Lifecycle Manager expected to be a machine-readable JSON response.

If you encounter LCMVROVACONFIG100034 during a standalone Aria Automation Orchestrator deployment, it may be worth checking the output of vracli vro authentication directly on the appliance. If informational log4j messages appear before the JSON payload, Lifecycle Manager may be failing during JSON parsing rather than during the authentication configuration itself.

While directly modifying files inside Docker overlay storage should not be considered a permanent solution, this troubleshooting exercise helped isolate the root cause and provided a path toward a successful deployment. Hopefully this saves someone else a few hours of digging through logs and chasing what initially appears to be an authentication problem.

Posted in Lab Infrastructure, Virtualization

Building a TinyCore Linux OVA with Custom OVF Properties

I’ve recently been interested in creating a TinyCore Linux virtual appliance. This OVA would allow for some customization, like hostname, IP address, default gateway, and DNS settings.

I’ve posted about TinyCore Linux before, most recently 2 years ago: https://enterpriseadmins.org/blog/scripting/tinycore-15-virtual-machine-very-small-vm-for-testing/. I really enjoy this very lightweight VM as it works perfectly for demos. In a very small (~28MB in this example) package, we can have a running virtual machine with VMware Tools.

I’ve recently started storing virtual machines that I need to deploy, like the Nested ESXi Fling, as OVAs in a content library. These OVAs are typically customized with OVF properties. I’ve never created my own OVA with custom properties, but found a great guide here: https://williamlam.com/2019/02/building-your-own-virtual-appliances-using-ovf-properties-part-1.html.

I wanted to take these steps and apply them to a TinyCore Linux appliance.

At a high level, this appliance works by exposing VMware OVF properties as guestinfo.* values. During boot, a startup script running inside TinyCore Linux reads those values through VMware Tools and applies the requested network and hostname configuration automatically.
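
As a rough sketch of the reading side, the helper below pulls one property value out of an OVF environment document. It assumes the properties arrive as Property elements in the XML that VMware Tools returns for guestinfo.ovfEnv, which is a common way for the com.vmware.guestInfo transport to deliver them; the actual startup script may read them differently. The function name ovf_property and the sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the actual startup script.
# Reads OVF environment XML on stdin and prints the oe:value of the property named in $1.
# Assumes each <Property> element sits on one line with oe:key directly before oe:value.
ovf_property() {
    sed -n "s/.*oe:key=\"$1\" oe:value=\"\([^\"]*\)\".*/\1/p"
}

# Sample fragment modelled on the OVF environment format (values are made up).
sample_env='<PropertySection>
  <Property oe:key="guestinfo.hostname" oe:value="tc-demo"/>
  <Property oe:key="guestinfo.ipaddress" oe:value=""/>
</PropertySection>'

hostname_value=$(printf '%s\n' "$sample_env" | ovf_property guestinfo.hostname)
ip_value=$(printf '%s\n' "$sample_env" | ovf_property guestinfo.ipaddress)
echo "hostname=$hostname_value ip=${ip_value:-<unset, use DHCP>}"

# On a real guest the XML would come from VMware Tools, for example:
# vmtoolsd --cmd "info-get guestinfo.ovfEnv" | ovf_property guestinfo.hostname
```

An empty value is treated the same as a missing one here, which matches the idea that every property is optional.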

Creating Virtual Machine

I started by creating a minimal virtual machine with hardware compatibility set to ESXi 7.0 U2 and later (vmx-19), guest OS type Other 5.x or later Linux (32-bit), 1 vCPU, and 1 GB RAM, configured to boot using BIOS instead of EFI. I typically choose BIOS as I’ve had issues with VMs booting from the CD for the initial install with EFI.

In the VM, I installed TinyCore (command line only) to the entire disk. I then installed a few dependencies, copied a script from a webserver, and set that script to run when the system boots (by appending to the built-in /opt/bootlocal.sh script). Note: VMware Tools (open-vm-tools) is required for the mechanisms later in this post to function.

tce-load -wi curl pcre open-vm-tools
sudo wget http://www.example.com/build/tc-ova.txt -O /opt/tc-ova.sh
sudo chmod +x /opt/tc-ova.sh
echo "/opt/tc-ova.sh > /tmp/tc-ova-boot.log 2>&1" | sudo tee -a /opt/bootlocal.sh
echo y | backup

The tc-ova.txt file on my webserver (www.example.com) can be found on GitHub here: https://github.com/bwuch/code-snips/blob/master/build/tc-ova.txt. The file has a .txt extension only so that I don’t need to define a MIME type for .sh on my web server; the wget command saves it with a .sh extension. It is a generic shell script that retrieves OVF properties using VMware Tools and the vmtoolsd --cmd "info-get guestinfo.*" interface, which lets the guest operating system read values provided during deployment without requiring cloud-init or additional provisioning frameworks. If any of those values are found, the script applies them to set the IP address and subnet mask, default gateway, DNS servers, and hostname. Keeping all OVF properties optional keeps the appliance flexible: a deployment can use DHCP with minimal input, or fully specify static networking when needed.
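
To illustrate the "apply" half, here is a small sketch of one such step: turning a space- or comma-separated DNS server list into resolv.conf entries. This is not the code from tc-ova.txt; the function name dns_to_resolv_lines is hypothetical, and the sample addresses are the guestinfo.dns default used later in this post.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the author's tc-ova.sh script.
# Turns a space- or comma-separated list of DNS servers into resolv.conf "nameserver" lines.
dns_to_resolv_lines() {
    printf '%s\n' "$1" | tr ', ' '\n\n' | sed -e '/^$/d' -e 's/^/nameserver /'
}

# Space-separated input.
dns_to_resolv_lines '192.168.127.30 192.168.32.30'
# Mixed separators are handled the same way.
dns_to_resolv_lines '192.168.127.30, 192.168.32.30'

# On the appliance the result would be written out, for example:
# dns_to_resolv_lines "$dns" | sudo tee /etc/resolv.conf > /dev/null
```

Accepting both separators means the value can be typed naturally in the deployment wizard without the script failing on a stray comma.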

Creating OVF Properties

The guide I used for initial setup (https://williamlam.com/2019/02/building-your-own-virtual-appliances-using-ovf-properties-part-1.html) shows how to create these OVF properties in the UI. I needed to create a handful of properties, with specific names, and was interested in setting some default values. While I could have done this in the UI, I decided to automate the creation of OVF properties with PowerCLI. The script below documents my property names in a CSV file embedded into the script, loops through them to apply them to the VM, and finally exports the VM as an OVA.

$applianceVersion = 'TinyCore_17.0_Appliance'
$vmName = 'h461-tinycore-01'

$ovfProperties = @"
Key,Label,Type,Description,DefaultValue
guestinfo.hostname,Hostname,string,Optional: Short hostname,
guestinfo.domain,DNS Domain,string,Optional: Will be appended to Hostname to set FQDN.,lab.enterpriseadmins.org
guestinfo.dns,DNS Server,string,Optional: Space or comma separated list of DNS servers,192.168.127.30 192.168.32.30
guestinfo.ipaddress,IP Address,string,Optional: IPv4 address to assign to VM
guestinfo.netmask,Netmask,string,"Optional: IPv4 Netmask, please specify if IP Address has been set."
guestinfo.gateway,Default Gateway,string,Optional: IPv4 default gateway
"@ | ConvertFrom-Csv


$spec = New-Object VMware.Vim.VirtualMachineConfigSpec         # Main VM config spec
$spec.vAppConfig = New-Object VMware.Vim.VmConfigSpec          # vApp config container

$propertySpecs = @()

$keyId = 0
foreach ($prop in $ovfProperties) {

    # Create property info object
    $propertyInfo = New-Object VMware.Vim.VAppPropertyInfo
    $propertyInfo.Key = $keyId
    $propertyInfo.Id = $prop.Key
    $propertyInfo.Category = "Guestinfo"
    $propertyInfo.Label = $prop.Label
    $propertyInfo.Type = $prop.Type
    $propertyInfo.DefaultValue = $prop.DefaultValue
    $propertyInfo.UserConfigurable = $true
    $propertyInfo.Description = $prop.Description

    # Create property spec wrapper
    $propertySpec = New-Object VMware.Vim.VAppPropertySpec
    $propertySpec.Operation = "add"
    $propertySpec.Info = $propertyInfo

    # Add to array
    $propertySpecs += $propertySpec
    $keyId++
}

# Attach property specs to vApp config
$spec.vAppConfig.Property = $propertySpecs

$spec.VAppConfig.Product = New-Object VMware.Vim.VAppProductSpec[] (1)
$spec.VAppConfig.Product[0] = New-Object VMware.Vim.VAppProductSpec
$spec.VAppConfig.Product[0].Operation = 'add'
$spec.VAppConfig.Product[0].Info = New-Object VMware.Vim.VAppProductInfo
$spec.VAppConfig.Product[0].Info.VendorUrl = 'http://tinycorelinux.net'
$spec.VAppConfig.Product[0].Info.Vendor = 'TinyCoreLinux'
$spec.VAppConfig.Product[0].Info.Name = $applianceVersion
$spec.VAppConfig.Product[0].Info.ProductUrl = 'http://tinycorelinux.net'
$spec.VAppConfig.Product[0].Info.Key = -1
$spec.VAppConfig.OvfEnvironmentTransport = New-Object String[] (1)
$spec.VAppConfig.OvfEnvironmentTransport[0] = 'com.vmware.guestInfo'

(Get-VM $vmName).ExtensionData.ReconfigVM($spec)  # synchronous form of ReconfigVM_Task; returns only after the reconfigure completes, so no sleep is needed
Get-VM $vmName | Export-VApp -Destination D:\tmp -Name $applianceVersion -Format:Ova -Description $applianceVersion

The resulting OVA file was very small, approximately 28MB on disk.

Testing the Deployment

When deploying this appliance through the UI, all properties have valid values by default, since all fields are optional. I’ve confirmed that this works as expected and the VM gets its IP from DHCP and the host name is set to the default value (box).

For another test, I deployed the OVA using PowerCLI. I’ll include that script below as well.

$file="D:\tmp\TinyCore_17.0_Appliance.ova"
$vmName=   'h461-tinycore-03'

# Get OVF Config
$ovfConfig = Get-OvfConfiguration -Ovf $file

# Set OVF Properties
$ovfConfig.NetworkMapping.dvportgroup_34861.Value = '192.168.10.0'
$ovfConfig.common.guestinfo.hostname.value  = $vmName
$ovfConfig.common.guestinfo.ipaddress.value = '192.168.10.222'
$ovfConfig.common.guestinfo.netmask.value   = '255.255.255.0'
$ovfConfig.common.guestinfo.gateway.value   = '192.168.10.1'

$newVmSettings = @{
  Source            = $file
  OvfConfiguration  = $ovfConfig
  Name              = $vmName
  VMHost            = 'core-esxi-34.lab.enterpriseadmins.org'
  Location          = '30-Greenfield'
  Datastore         = (Get-Datastore core-tier1-nfs1)
  InventoryLocation = 'Testing'
  DiskStorageFormat = 'thin'
  Confirm           = $false
  Force             = $true
}
$newVM = Import-VApp @newVmSettings
$newVM | Start-VM

You may notice that we did not set the dns or domain properties, as those already had default values. After powering on the VM, we can confirm that the settings were updated and networking is functioning as expected.

Changing the deployment

Once our settings have been set, we can browse to the VM > Configure > vApp Options tab (while the VM is powered off) and adjust our values with the SET VALUE button. When the virtual machine is powered on, the script will automatically run at startup, read the updated OVF properties, and set the values as desired.

Conclusion

I originally started this project because I wanted an extremely small appliance that could be stored in a Content Library and deployed quickly whenever I needed a Linux VM for testing or demos. The result is a TinyCore Linux appliance that occupies only about 28MB on disk while still supporting deployment-time customization through standard OVF properties.

This approach has already proven useful in my lab, and I expect it will become my default “utility VM” going forward. The same techniques could easily be expanded to support additional configuration options or application-specific appliances, making TinyCore Linux a surprisingly capable foundation for custom VMware virtual appliances.

Posted in Lab Infrastructure, Scripting, Virtualization

Telegraf Open Agent Updates and VCF Operations 9.1

About a year ago, I published a post covering how to monitor a Raspberry Pi using the open source Telegraf agent and VMware Aria Operations: https://enterpriseadmins.org/blog/virtualization/monitoring-a-raspberry-pi-with-telegraf-and-aria-operations

Recently, while rebuilding this configuration in a VCF 9.1 environment, I encountered a couple of changes that required updates to the original process:

  • Updated InfluxData repository signing and package installation requirements
  • Authentication workflow changes when using VCF 9.1 API tokens with the Telegraf Open Agent integration

The good news is that the remainder of the original workflow still functioned as expected after making these updates.

Although the original article focused on Raspberry Pi monitoring, these updates apply more broadly to Linux-based Telegraf Open Agent deployments, including x64 virtual machines and other supported systems.

Updated Telegraf Repository Configuration

When attempting to install or update Telegraf, apt update now produces the following error:

W: GPG error: https://repos.influxdata.com/ubuntu stable InRelease:
The following signatures couldn't be verified because the public key is not available:
NO_PUBKEY DA61C26A0585BD3B

E: The repository 'https://repos.influxdata.com/ubuntu stable InRelease' is not signed.

This occurs because older repository signing methods commonly used in previous installation examples have been deprecated.

The updated installation process now uses a dedicated keyring file under /etc/apt/keyrings.

Cleanup / Removal Steps

Before adding the new repository, we may need to clean up the stale entries left behind (assuming we started with the old post). The fix is straightforward; we just need to delete two files:

sudo rm /etc/apt/keyrings/influxdata-archive_compat.key
sudo rm /etc/apt/sources.list.d/influxdata.list

With the optional cleanup complete, we can proceed to the updated installation steps.

Updated Installation Steps

The following commands successfully configured the repository and installed Telegraf in my testing:

curl --silent --location -O https://repos.influxdata.com/influxdata-archive.key

gpg --show-keys --with-fingerprint --with-colons ./influxdata-archive.key 2>&1 \
| grep -q '^fpr:\+24C975CBA61A024EE1B631787C3D57159FC2F927:$' \
&& cat influxdata-archive.key \
| gpg --dearmor \
| sudo tee /etc/apt/keyrings/influxdata-archive.gpg > /dev/null

echo 'deb [signed-by=/etc/apt/keyrings/influxdata-archive.gpg] https://repos.influxdata.com/debian stable main' \
| sudo tee /etc/apt/sources.list.d/influxdata.list
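
The fingerprint check in the middle of those commands is the important safety step, so here it is isolated as a small sketch. The function name has_fingerprint is hypothetical, and the sample input is an abbreviated stand-in for real gpg --with-colons output; the only assumption about gpg itself is that fpr records carry the fingerprint in the tenth colon-separated field.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if gpg's colon-format output (on stdin)
# contains an "fpr" record whose tenth field exactly equals the expected fingerprint in $1.
has_fingerprint() {
    awk -F: -v want="$1" '$1 == "fpr" && $10 == want { found = 1 } END { exit !found }'
}

# Abbreviated sample of colon-format output (only the record that matters here).
sample='fpr:::::::::24C975CBA61A024EE1B631787C3D57159FC2F927:'

if printf '%s\n' "$sample" | has_fingerprint 24C975CBA61A024EE1B631787C3D57159FC2F927; then
    echo "fingerprint matches"
else
    echo "fingerprint mismatch, do not install this key"
fi

# With a real key file the input would come from gpg, for example:
# gpg --show-keys --with-fingerprint --with-colons ./influxdata-archive.key | has_fingerprint <expected>
```

Comparing a whole field rather than searching for a substring avoids accepting a key whose fingerprint merely contains the expected value.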

After configuring the repository, the following commands worked as expected:

sudo apt-get update
sudo apt-get install telegraf

Why This Changed

Modern Debian and Ubuntu-based distributions are moving away from the legacy apt-key approach for repository trust management. Instead, repository signing keys are now commonly stored individually under /etc/apt/keyrings/ and referenced from each repository entry with the signed-by option. A key is then trusted only for the repository that names it, rather than for every configured source.

Using VCF 9.1 API Tokens with the Telegraf Open Agent

With VCF 9.1, I also wanted to test using an API key-based authentication workflow instead of relying on a previously obtained Aria Operations token.

This process works as follows:

  • Generate an API token from VMware Identity Broker (VIDB)
  • POST the API token to VIDB
  • Receive a Bearer token
  • Present the Bearer token to VCF Operations

We can generate the API token in Manage > Identity & Access > VCF SSO > select identity broker instance > API Access tab, or by creating a personal access token under our profile > Generate API token > Generate.

Once we have an API Token we need to present it to VIDB. We can do this with the following POST command:

vidbExtraLongToken=vidb_ZmVlMzM5ZGYtYWZkYS00OTkzLTkxMW<redacted>

curl --request POST \
  --url https://vcf479-vidb-01.lab.enterpriseadmins.org/acs/t/CUSTOMER/token \
  --header 'content-type: application/x-www-form-urlencoded' \
  --data grant_type=urn:custom:vcf:params:oauth:grant-type:api-token \
  --data "api_token=$vidbExtraLongToken" \
  --insecure
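
The response to this request is where the Bearer token comes from. As a hedged sketch, the helper below extracts it from a JSON body, assuming a standard OAuth 2.0 style response with an access_token field. The exact fields VIDB returns are not shown in this post, so the field name, the function name extract_access_token, and the sample response are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: pull the "access_token" string out of a JSON token response on stdin.
# Assumes a standard OAuth 2.0 style response; a real JSON parser such as jq is a
# better choice where it is available.
extract_access_token() {
    tr -d '\n' | sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up response for illustration only.
sample_response='{"access_token":"eyJ.example.token","token_type":"Bearer","expires_in":3600}'
bearer=$(printf '%s' "$sample_response" | extract_access_token)
echo "$bearer"

# In practice the curl command above would be piped straight into the helper:
# bearer=$(curl --silent ... | extract_access_token)
```

Capturing the token in a variable avoids copying a very long string by hand into the later telegraf-utils.sh command.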

During testing, I discovered that the token parameter handling within telegraf-utils.sh expected a traditional Aria Operations token format directly.

In the script, I could see (around line 378) an entry that showed:

377-    #set Authorization header for on-prem
378-    AUTHORIZATION_HEADER="Authorization: OpsToken $VROPS_TOKEN"

(Line numbers added for reference)

Proof of Concept Modification

As a proof of concept, I modified the authorization header handling to expect and present a standard Bearer token instead.

Example modification:

AUTHORIZATION_HEADER="Authorization: Bearer $VROPS_TOKEN"

Disclaimer: This modification should be considered a proof of concept only. Directly modifying bundled or vendor-provided scripts is generally not recommended. Use at your own risk.
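
If you do experiment with this change, one cautious way to apply it is to keep the original file and review the difference before using the modified copy. The sketch below demonstrates that on a throwaway file containing only the line of interest; it is an illustration of the pattern, not a supported procedure, and it does not touch the real telegraf-utils.sh.

```shell
#!/bin/sh
# Illustration only: work on a throwaway file that contains the one line of interest,
# not on the real telegraf-utils.sh.
script=$(mktemp)
printf '%s\n' '    AUTHORIZATION_HEADER="Authorization: OpsToken $VROPS_TOKEN"' > "$script"

# Keep the original next to the modified copy so the change is easy to revert.
cp "$script" "$script.orig"
sed 's/Authorization: OpsToken /Authorization: Bearer /' "$script.orig" > "$script"

# Show exactly what changed (diff exits 1 when the files differ, which is expected here).
diff "$script.orig" "$script" || true

changed_line=$(cat "$script")
rm -f "$script" "$script.orig"
```

Reviewing the diff confirms that only the authorization header line was altered.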

After making this change, the remainder of the workflow from the original article functioned as expected in my VCF 9.1 environment.

Installing Telegraf

This is the command line I used to install telegraf:

sudo ./telegraf-utils.sh opensource -c 192.168.10.21 -t "<crazy_long_bearer_token_from_prior_curl_command>" -v 192.168.10.21 -d /etc/telegraf/telegraf.d -e /usr/bin/telegraf -k 1

Where 192.168.10.21 was the IP address of my Operations Collector / Cloud Proxy appliance. As before, I needed to change permissions of files in the /etc/telegraf/telegraf.d directory to be owned by telegraf:telegraf and restart the service with systemctl restart telegraf.

Final Thoughts

Outside of the repository signing changes and API token handling updates, the remainder of the original integration process still worked well in my testing.

If you previously implemented the Telegraf Open Agent integration and encounter:

  • repository signing errors
  • NO_PUBKEY DA61C26A0585BD3B
  • or authentication issues with VCF 9.1 API tokens

the updates above should help with the deployment for current environments.

Posted in Lab Infrastructure, Scripting, Virtualization

Using PowerCLI with Federated VCF 9.1 Authentication

The VCF PowerCLI 9.1 release notes call out an interesting change to the Connect-VIServer cmdlet (https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-1/release-notes/vmware-cloud-foundation-9-1-0-0-release-notes/what-s-new/whats-new-vcf-cli-api-sdk/vcf-powercli-changelog/vmware-vimautomation-core.html)

Connect-VIServer

  • Added parameter ‘VcfApiToken’
  • Added parameter ‘VcfOAuthSecurityContext’

This change introduces native support for API token authentication in federated VCF environments, making non-interactive automation significantly easier than previous SAML-based approaches.

In a prior post (https://enterpriseadmins.org/blog/scripting/how-to-use-powercli-with-federated-vcenter-logins/), I wrote about using a -SamlSecurityContext parameter to log in to a vCenter that had been configured with federated identity. That approach required additional setup using a non-federated user in PowerCLI and only supported interactive browser-based authentication.

This post will focus on using the latest Connect-VIServer cmdlet to connect to a VCF 9.1 vSphere instance. In this environment, an Identity Broker has already been configured using generic OIDC and the VCF Instance is configured to use the SSO provider. Here is a screenshot of the overview page confirming this configuration:

Creating an API Client and Token

In the screenshot above, we can see an ‘API Access’ tab. From here we can create API Clients and API Tokens. We’ll start by selecting create on the ‘API CLIENTS’ sub tab.

For Client Name, I’ll enter VCF_PowerCLI_Admin and then select ‘CREATE API CLIENT’. In Roles, I’ll set the scope to be Components with vcf479-vidb-01 and for role will select VCF Administrator. I’ll finally select SAVE on this page.

With the API Client created, I’ll select the vertical ellipsis and then ‘Generate API Token’.

For the ‘API Token Name’ I’ll provide Brian-PowerCLI-Admin and click ‘Generate API Token’.

This will provide a summary of the token generated. I will not be able to continue until I’ve copied the token value.

Connecting with PowerCLI

The release notes called out two options for authentication. Here is where I believe each of these options would be appropriate.

Method                     Use Case
-VcfApiToken               Simple direct login to vCenter
-VcfOAuthSecurityContext   Reusing authentication across multiple VMware products

We will demo both of these options below.

VcfApiToken parameter

This is a very straightforward option. When you pass the token, VCF PowerCLI automatically discovers the associated VCF SSO instance in the background and completes the login process. After connecting to vCenter, I’ll retrieve a list of VMs to confirm that the connection is working.

PS C:\> Connect-VIServer vcf479-vc-01.lab.enterpriseadmins.org -VcfApiToken 'vidb_MjkxYzNlZTctOWNhZS00MGZjLWE4ZDg<redacted>'

Name                           Port  User
----                           ----  ----
vcf479-vc-01.lab.enterprise... 443   CUSTOMER\73c160a0-adcc-4259...


PS C:\> Get-VM

Name                 PowerState Num CPUs MemoryGB
----                 ---------- -------- --------
vcf479-license-01    PoweredOn  2        4.000
vcf479-opscol-01     PoweredOn  4        16.000
vcf479-ops-01        PoweredOn  4        16.000
vcf479-nsx-01        PoweredOn  6        24.000
vcf479-sddcm-01      PoweredOn  4        16.000
vcf479-vsp-01-c8bmk  PoweredOn  12       24.000
vcf479-vsp-01-rnn58  PoweredOn  12       24.000
vcf479-vsp-01-7zdvf  PoweredOn  12       24.000
vcf479-vsp-01-2dcws  PoweredOn  4        10.000
vcf479-vc-01         PoweredOn  4        21.000

VcfOAuthSecurityContext parameter

When using the VcfOAuthSecurityContext parameter, the IdentityBrokerHostname is also required.

PS C:\> $vcfOauthSec = New-VcfOAuthSecurityContext -IdentityBrokerHostname 'vcf479-vidb-01.lab.enterpriseadmins.org' -ApiToken 'vidb_MjkxYzNlZTctOWNhZS00MGZjLWE4ZDg<redacted>'
PS C:\>
PS C:\> Connect-VIServer vcf479-vc-01.lab.enterpriseadmins.org -VcfOAuthSecurityContext $vcfOauthSec

Name                           Port  User
----                           ----  ----
vcf479-vc-01.lab.enterprise... 443   CUSTOMER\73c160a0-adcc-4259...


PS C:\> Get-VM

Name                 PowerState Num CPUs MemoryGB
----                 ---------- -------- --------
vcf479-license-01    PoweredOn  2        4.000
vcf479-opscol-01     PoweredOn  4        16.000
vcf479-ops-01        PoweredOn  4        16.000
vcf479-nsx-01        PoweredOn  6        24.000
vcf479-sddcm-01      PoweredOn  4        16.000
vcf479-vsp-01-c8bmk  PoweredOn  12       24.000
vcf479-vsp-01-rnn58  PoweredOn  12       24.000
vcf479-vsp-01-7zdvf  PoweredOn  12       24.000
vcf479-vsp-01-2dcws  PoweredOn  4        10.000
vcf479-vc-01         PoweredOn  4        21.000

We can use this authenticated security context to connect to other products, such as VCF Operations, which do not provide a direct VcfApiToken parameter. For example, using the $vcfOauthSec variable created above, I can also connect to the operations instance:

Connect-VcfOpsServer vcf479-ops-01.lab.enterpriseadmins.org -VcfOAuthSecurityContext $vcfOauthSec

Conclusion

PowerCLI 9.1 significantly simplifies authentication to federated VCF 9.1 environments.

Compared to previous SAML security context workflows, the new API token and OAuth security context capabilities reduce setup complexity while enabling fully non-interactive authentication. This makes PowerCLI automation easier to integrate with scheduled tasks, orchestration platforms, and CI/CD pipelines.

For simple vCenter connections, -VcfApiToken provides the most straightforward experience. For broader multi-product workflows, -VcfOAuthSecurityContext enables authentication reuse across the environment.

Posted in Lab Infrastructure, Scripting, Virtualization

An Unexpected Benefit of Application-Aware Backups: Finding and Fixing Database Bloat

While working on my recent post about why crash-consistent VM backups aren’t always enough, I ran into an unexpected but very useful side effect of adding application-aware database backups.

Once I started creating regular database dumps for my phpIPAM instances, I noticed something that had been completely invisible when relying solely on full VM backups: the database backups themselves were wildly different sizes.

That observation kicked off a short investigation that ultimately led to cleaning up unnecessary data, shrinking backups, and better understanding what was actually stored in the application.

The Initial Observation: Backup Size Discrepancies

I run multiple phpIPAM instances in my lab. Functionally, they’re similar and store roughly comparable types of data. When I began dumping their databases as part of a snapshot freeze workflow, I expected the backups to be in the same general size range. They weren’t.

  • One instance produced a database dump of roughly 489 MB uncompressed (about 23 MB compressed)
  • Another instance produced a dump of only 5 MB uncompressed (under 1 MB compressed)

At the VM level, this difference was completely masked. A full-VM backup doesn’t make it obvious whether one application’s data is growing abnormally or not—it all just looks like blocks on disk.

The database-level backups, however, made the discrepancy impossible to ignore.
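
A quick way to make this kind of comparison routine is to print the raw and compressed size of each dump side by side. The sketch below is one way to do that; the function name dump_sizes is hypothetical, and the demo file is generated repetitive text standing in for a log-heavy dump, not real phpIPAM data.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <uncompressed bytes> <gzip-compressed bytes>".
dump_sizes() {
    raw=$(wc -c < "$1" | tr -d ' ')
    packed=$(gzip -c "$1" | wc -c | tr -d ' ')
    echo "$1 $raw $packed"
}

# Demo with a throwaway file of highly repetitive text, standing in for a log-heavy dump.
demo=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "INSERT INTO logs VALUES ('user login');"
    i=$((i + 1))
done > "$demo"

report=$(dump_sizes "$demo")
echo "$report"
rm -f "$demo"
```

Highly repetitive log rows compress extremely well, which is why a dump can look small once compressed while still being large on disk.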

Why VM-Level Backups Hid the Problem

This is one of those cases where VM backups were doing their job perfectly—and still hiding a problem.

From the perspective of the hypervisor:

  • The VM was healthy
  • Snapshots completed successfully
  • Backups restored without issue

But VM backups don’t provide visibility. They protect everything equally, whether the data is critical, redundant, or no longer useful.

Application-aware backups, by contrast, force you to look directly at what’s being protected. In this case, the size difference alone was enough to raise questions.

Digging into the phpIPAM Database

With the size discrepancy in hand, the next step was to look at the database itself.

By inspecting table sizes and row counts, it quickly became clear that one instance was retaining a significant amount of historical or log-related data that the other was not.

To connect to the database, which was running in a container, I ran:

docker compose exec devipam-mariadb /bin/bash

Once I was inside the container, I connected to the database with

mariadb -u root -p

From here, ChatGPT helped me with some SQL queries. The one to find the largest table was:

SELECT
     table_schema as `Database`, 
     table_name AS `Table`, 
     round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` 
FROM information_schema.TABLES 
ORDER BY (data_length + index_length) DESC
LIMIT 5;

This was pointing me at the phpipam.logs table, and to get a feel for some of the events it contained I ran:

SELECT *
FROM phpipam.logs
LIMIT 5;

A few more investigative queries, grouping by username and command, led me to an existing phpIPAM issue:

phpIPAM GitHub Issue #3545 – Excessive database growth due to retained data

The issue documents how certain tables can grow unbounded over time, particularly with historical scan and discovery data enabled. The issue (https://github.com/phpipam/phpipam/issues/3545) even provides a sample query to aid with cleanup. It shows creating this as a recurring job, but based on my data the growth was no longer occurring on a regular basis; it had happened in the past.

Cleaning Up the Data

Armed with that context, I ran a small number of targeted queries to understand and then remove old, unnecessary entries. The goal wasn’t to blindly delete data, but to:

  • Identify the log events responsible for the majority of the growth
  • Confirm the data was no longer operationally useful
  • Reduce backup size without impacting functionality

The following query tested the logic I was going to use for removals:

SELECT
    COUNT(*) AS rows_to_delete,
    MIN(date) AS oldest,
    MAX(date) AS newest
FROM phpipam.logs
WHERE (command = 'user login' or command like 'users object % edit' or details like '% in ipaddresses edited. hostname: %')
  AND date < NOW() - INTERVAL 60 DAY;

This showed about 2.8m rows, dating back nearly 3 years, that I thought would be safe to delete. Changing the statement (replacing the SELECT with a DELETE) resulted in the final cleanup query:

DELETE FROM phpipam.logs
WHERE (command = 'user login' or command like 'users object % edit' or details like '% in ipaddresses edited. hostname: %')
  AND date < NOW() - INTERVAL 60 DAY;

This query took about 20 seconds to execute and deleted the expected 2.8m rows. The functionality of phpIPAM was unchanged, and the backup-related results were immediate:

  • Database sizes across instances were now much closer
  • Compressed backup sizes dropped significantly
  • Backup and restore operations became faster

The Secondary Win: Smaller, Faster Backups

Reducing database size isn’t just about saving disk space. Smaller application backups mean:

  • Faster freeze-script execution
  • Shorter snapshot windows
  • Less data to validate during restores
  • Lower risk during recovery

In other words, improving the quality of the data improved the reliability of the backup process itself.

Lessons Learned

This entire chain of events started with a simple goal: making sure I had a known good copy of application data. What I didn’t expect was that application-aware backups would act as a diagnostic tool:

  • They exposed abnormal data growth
  • They encouraged closer inspection of the database
  • They led to tangible improvements in backup efficiency

It’s a good reminder that backups aren’t just about recovery… they’re also a feedback mechanism. When you actually look at what you’re backing up, problems that were previously hidden at the VM layer become much easier to spot.

Conclusion

Crash-consistent VM backups remain a solid foundation, especially in lab environments. But once you add application-aware backups, you may gain another layer of visibility.

In this case, that visibility surfaced unnecessary data growth in phpIPAM, reduced backup sizes, and improved overall reliability. That’s a win well beyond the original goal of “just” having a safer backup.

If nothing else, this experience reinforced one idea: when you back up data at the application level, you’re forced to understand the application better.

Posted in Lab Infrastructure, Scripting