MongoDB on Ubuntu: Replica Sets, LDAP, and Aria Operations

Last year I shared a series of posts walking through how to set up and monitor MongoDB with Aria Operations. For reference, here are those articles:

When I originally created those posts, MongoDB had not yet added support for Ubuntu 24.04, so I used Ubuntu 20.04, as I had a template for that distribution. Recently I noticed these older Ubuntu 20.04 VMs were due for a refresh, as Ubuntu 20.04 reached end of standard support earlier this year. This post will review the updated setup steps to deploy a MongoDB replica set on Ubuntu 24.04.

Installing MongoDB 8.0 (latest) on Ubuntu 24.04

The MongoDB documentation is very well written. I followed the steps from https://www.mongodb.com/docs/manual/tutorial/install-mongodb-enterprise-on-ubuntu/#std-label-install-mdb-enterprise-ubuntu. I’ll include a short code block below with the specific steps:

sudo apt-get install gnupg curl
curl -fsSL https://pgp.mongodb.com/server-8.0.asc | \
   sudo gpg -o /usr/share/keyrings/mongodb-server-8.0.gpg \
   --dearmor

echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-8.0.gpg ] https://repo.mongodb.com/apt/ubuntu noble/mongodb-enterprise/8.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-enterprise-8.0.list

sudo apt-get update
sudo apt-get install mongodb-enterprise

sudo systemctl start mongod
sudo systemctl status mongod
sudo systemctl enable mongod

After running the above commands, my systems were all running a default MongoDB service and that service was set to run automatically at boot.
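
One additional check worth doing before moving on (not part of the install steps above): confirm the service is listening on the default port. Keep in mind that a stock install binds only to 127.0.0.1, so net.bindIp in /etc/mongod.conf will need to allow the other replica set members to connect.

# Confirm mongod is listening on the default port (27017).
sudo ss -tlnp | grep 27017
# The packaged default config binds to 127.0.0.1 only; adjust net.bindIp in
# /etc/mongod.conf so the other replica set members can reach this node.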

I confirmed that I could connect to the instance by running mongosh at the console. This allowed me to connect automatically without specifying a password. While in the mongosh console, I created a dbadmin user account with the root role.

var admin = db.getSiblingDB("admin")
admin.createUser(
   {
       user: "dbadmin", 
       pwd: "VMware1!", 
       roles:["root"]
   })

After getting a successful response that my new user account was created, I exited the mongo shell by typing exit.

Configuring MongoDB for Replica Set and LDAP authentication

Back at the command line, I created a directory to store a security.key file to be used for each node in the replica set. I’ve included the details of these commands below:

cd /opt
sudo mkdir mongodb
sudo chown mongodb:mongodb /opt/mongodb

echo '88157a33a9dc499ea6b05c504daa36f8v2' | sudo tee /opt/mongodb/security.key
sudo chmod 400 /opt/mongodb/security.key
sudo chown mongodb:mongodb /opt/mongodb/security.key
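
The same keyfile contents must exist on every member of the replica set, with the same ownership and permissions. A minimal sketch of pushing it to the other two nodes, assuming SSH access and passwordless sudo, could look like this:

for node in svcs-mongo-12.lab.enterpriseadmins.org svcs-mongo-13.lab.enterpriseadmins.org
do
  # create the directory, write the identical key contents, then lock down ownership/permissions
  ssh $node "sudo mkdir -p /opt/mongodb; echo '88157a33a9dc499ea6b05c504daa36f8v2' | sudo tee /opt/mongodb/security.key > /dev/null; sudo chown -R mongodb:mongodb /opt/mongodb; sudo chmod 400 /opt/mongodb/security.key"
done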

With this file created and properly permissioned, we’ll update the mongod configuration file to specify the path to the security.key file. While we are in the file, we’ll add some additional settings for LDAP authentication, as well as define the replica set name. We do this with vi /etc/mongod.conf and then make the following edits:

In the security section, we add the following (note that the setParameter block at the end is its own top-level section, which enables the PLAIN mechanism needed for LDAP):

  authorization: enabled
  keyFile: /opt/mongodb/security.key
  ldap:
    servers: "core-control-21.lab.enterpriseadmins.org:389"
    bind:
      queryUser: "CN=svc-ldapbind,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org"
      queryPassword: "VMware1!"
    transportSecurity: "none"
    authz:
      queryTemplate: "{USER}?memberOf?base"
    validateLDAPServerConfig: true
setParameter:
  authenticationMechanisms: "PLAIN,SCRAM-SHA-1,SCRAM-SHA-256"

In the replication section we add:

  replSetName: svcs-rs-11

After updating the /etc/mongod.conf file on each host in my three-node cluster, I restarted the service with the command sudo systemctl restart mongod.

After the service restarted, I launched mongosh again. Now that authentication has been enabled, I selected the admin database and then logged in using the following commands:

use admin
db.auth({ user: 'dbadmin', pwd: 'VMware1!', mechanism: 'SCRAM-SHA-256' })

Next I initiated the replica set using the following syntax:

rs.initiate( {
   _id : "svcs-rs-11",
   members: [
      { _id: 0, host: "svcs-mongo-11.lab.enterpriseadmins.org:27017" },
      { _id: 1, host: "svcs-mongo-12.lab.enterpriseadmins.org:27017" },
      { _id: 2, host: "svcs-mongo-13.lab.enterpriseadmins.org:27017" }
   ]
})

This took a few seconds, but then returned the message { ok: 1 }. I double checked that everything was running as expected with rs.status(), which returned details of the replica set, showing the member nodes and which were primary vs. secondary.
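
If you just want a quick view of member state without scrolling through the full rs.status() output, a one-liner from the shell works as well; a sketch using the dbadmin account created earlier:

# Print each member hostname followed by its state (PRIMARY/SECONDARY).
mongosh -u dbadmin -p 'VMware1!' --authenticationDatabase admin --quiet --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'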

Creating custom role for monitoring and administration

I then created a custom role to be used by monitoring tools, like Aria Operations.

var admin = db.getSiblingDB('admin')
admin.createRole(
    {
        role: "CN=LAB MongoDB Ent Monitoring,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org",
        roles: [ { role: "clusterMonitor", db: "admin" } ],
        privileges: []
    }
)

I also created a role to use for management. I could have done this with a single command by providing both roles when creating the role, but I wanted to show an example of modifying an existing role as well.

var admin = db.getSiblingDB('admin')
admin.createRole(
    {
        role: "CN=LAB MongoDB Ent Admins,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org",
        roles: [ "dbAdminAnyDatabase", "clusterAdmin"  ],
        privileges: []
    }
)

db.grantRolesToRole("CN=LAB MongoDB Ent Admins,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org", [ { role: "root", db: "admin" } ] )

Loading sample data

Similar to the previous series of posts, I loaded some sample data into this replica set using the following syntax:

curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
mongorestore --archive=sampledata.archive -u dbadmin -p 'VMware1!'
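
To confirm the sample databases were restored, a quick listing of database names does the trick; a minimal sketch using the same dbadmin credentials:

# List database names after the restore completes.
mongosh -u dbadmin -p 'VMware1!' --authenticationDatabase admin --quiet --eval 'db.adminCommand({listDatabases: 1}).databases.forEach(d => print(d.name))'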

Using mongosh as an LDAP user

Since we have LDAP authentication configured, we can also log in to the mongo shell as an LDAP user. The following syntax is an example of how to do so:

mongosh --username "CN=svc-mgdbeadm,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org" --password 'VMware1!' --authenticationDatabase='$external' --authenticationMechanism="PLAIN"

In this case we specify that we want to use an external authentication database (LDAP) and the mechanism as ‘PLAIN’, which we previously enabled as an option when configuring the replica set & LDAP authentication.
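
Once connected as an LDAP user, one way to confirm which roles were granted through group membership is the connectionStatus command; a quick sketch run non-interactively:

# authInfo lists the authenticated user plus the roles resolved from the memberOf LDAP groups.
mongosh --username "CN=svc-mgdbeadm,OU=LAB Service Accounts,DC=lab,DC=enterpriseadmins,DC=org" --password 'VMware1!' --authenticationDatabase='$external' --authenticationMechanism="PLAIN" --quiet --eval 'printjson(db.runCommand({connectionStatus: 1}).authInfo)'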

Managing databases with a graphical user interface

When demoing the Operations management pack, it is often helpful to interact with and show the databases, and workflows such as creating, deleting, or renaming a database are more interesting to demonstrate from a GUI than from the command line. I recently found mongo-express, a web-based graphical interface for managing MongoDB databases. As a test, I ran it as a container using the following syntax:

sudo docker run -p 8081:8081 -e ME_CONFIG_MONGODB_URL='mongodb://dbadmin:VMware1!@svcs-mongo-11.lab.enterpriseadmins.org,svcs-mongo-12.lab.enterpriseadmins.org,svcs-mongo-13.lab.enterpriseadmins.org/admin?replicaSet=svcs-rs-11' mongo-express

This connects to the MongoDB service using our local dbadmin account. The console output shows that we can use http://0.0.0.0:8081 to reach the web interface with the username admin and password pass. From this web interface we can view, edit, and delete our databases during demos. I’ve since wrapped this up in a docker compose file and exposed it with a reverse proxy to apply an SSL certificate.
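
I won’t cover the reverse proxy piece here, but the compose portion is a straightforward translation of the docker run command above; a minimal sketch:

services:
  mongo-express:
    image: mongo-express
    restart: unless-stopped
    ports:
      - "8081:8081"
    environment:
      # same connection string used in the docker run example above
      ME_CONFIG_MONGODB_URL: "mongodb://dbadmin:VMware1!@svcs-mongo-11.lab.enterpriseadmins.org,svcs-mongo-12.lab.enterpriseadmins.org,svcs-mongo-13.lab.enterpriseadmins.org/admin?replicaSet=svcs-rs-11"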

Conclusion

With Ubuntu 20.04 out of standard support, refreshing to 24.04 was a necessary step, even in a lab. Rebuilding this replica set configuration on the current release was rather straightforward. Monitoring continues to work with the same Aria Operations management pack used previously; the only changes needed were creating a new data source and reusing the existing credential object.

Posted in Lab Infrastructure, Virtualization

Simplify Snapshot Management with VCF Operations

Managing snapshots in vSphere environments is a task that folks have dealt with for years. I remember one of my first PowerCLI scripts sent email notifications for snapshots over a week old so they could be reviewed and manually cleaned up. In this post we’ll walk through one way of automating this cleanup using VCF Operations Automation Central.

In VCF Operations, under Infrastructure Operations > Automation Central (or Operations > Automation Central, depending on version), we can create an automated job. There are several tiles available for automated jobs, but for this example we’ll use a ‘reclaim’ job:

For step 1 of our reclaim job, we’ll enter a job name and select ‘Delete old snapshots.’ We also have an opportunity to add a description and specify various snapshot details, such as only deleting snapshots older than 7 days, filtering by size, or matching a specific snapshot name.

For step 2, we’ll define a scope, selecting the specific objects that contain the VMs we want this automation to target. In the screenshot below, we’ve picked all of one vCenter, a datacenter from another, and a specific cluster from a third vCenter. This allows us to create different job scopes for different types of environments.

In step 3, we can define additional filter criteria. This is incredibly flexible. In the example below I’ve specified 3 different criteria combined with ‘and’ logic.

  • Tag ‘SnapshotPolicy’ not exists: there is no tag from the SnapshotPolicy category assigned to this VM. This would allow me to assign this tag category to some VMs with tags like ‘1 month’ or ‘manual’ and have separate jobs for them; this ‘not exists’ job would catch all other VMs.
  • Metric CPU|Usage (%) is less than 50%: allows me to exclude VMs that are busy doing something.
  • Property Configuration|Number of VMDKs is less than 5: excludes VMs that have a lot of VMDKs.

We can add additional criteria on other metrics, properties, tags, object names, etc as needed.

In the final step, step 4, we can schedule how often this task runs. In my example this job only runs on Saturdays for the next year, and it will send email updates as needed.

Conclusion

VCF Operations Automation Central is a very powerful tool and can be used to automate routine tasks such as snapshot removal. If you’re not yet using Automation Central, it’s worth exploring to streamline operations and reduce manual effort.

Posted in Lab Infrastructure, Virtualization

Monitoring a Raspberry Pi with Telegraf and Aria Operations

I recently set out to configure the open-source Telegraf agent on a physical system in my lab, with the goal of sending telemetry data to Aria Operations. The process for setting this up is documented here: https://techdocs.broadcom.com/us/en/vmware-cis/aria/aria-operations/8-18/vmware-aria-operations-configuration-guide-8-18/connect-to-data-sources/monitoring-applications-and-os-using-open-source-telegraf/monitoring-applications-using-open-source-telegraf/monitoring-applications-using-open-source-telegraf-on-a-linux-platform-saas-onprem.html. Since most of my lab systems are virtualized, the only physical candidate available was a Raspberry Pi running Ubuntu 24.04, and with its ARM-based CPU, I wasn’t sure if it would be supported.

Installing Telegraf on ARM (Ubuntu 24.04)

The first step was to install telegraf from the appropriate repository.

sudo curl -fsSL https://repos.influxdata.com/influxdata-archive_compat.key -o /etc/apt/keyrings/influxdata-archive_compat.key
echo "deb [signed-by=/etc/apt/keyrings/influxdata-archive_compat.key] https://repos.influxdata.com/ubuntu stable main" | sudo tee /etc/apt/sources.list.d/influxdata.list
sudo apt update
sudo apt -y install telegraf

I then needed to download the utility script to help configure telegraf to send to Aria Operations.

wget --no-check-certificate https://cm-opscp-01.lab.enterpriseadmins.org/downloads/salt/telegraf-utils.sh
chmod +x telegraf-utils.sh

The telegraf-utils.sh script requires an auth token. I accessed the Swagger UI at https://ops.example.com/suite-api and used the /auth/token/acquire endpoint to generate the token. Here is the body I submitted:

{
  "username" : "svc-physvr",
  "password" : "VMware1!"
}
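
The same token can be acquired from the command line instead of the Swagger UI; a sketch using curl against the suite-api (adjust the hostname for your environment; -k is only used here because my lab certificate isn’t trusted):

# POST the credentials to the token acquire endpoint; the response JSON includes the token value.
curl -ks -X POST https://ops.example.com/suite-api/api/auth/token/acquire \
  -H 'Content-Type: application/json' -H 'Accept: application/json' \
  -d '{ "username": "svc-physvr", "password": "VMware1!" }'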

In this case, svc-physvr is a user account created in the Aria Operations UI with limited access. The response body included the necessary token value, which I used when invoking the helper script:

sudo ./telegraf-utils.sh opensource -c 192.168.45.73 -t 24c884f0-2558-40fa-9626-61f577487ea5::7d209766-11f2-456a-a2d9-2a40b4459920 -v 192.168.45.73 -d /etc/telegraf/telegraf.d -e /usr/bin/telegraf

The parameters used in this script are explained in the product documentation.

Finally, I restarted the telegraf service.

sudo systemctl restart telegraf

Unfortunately that was met with an error.

Job for telegraf.service failed because the control process exited with error code.
See "systemctl status telegraf.service" and "journalctl -xeu telegraf.service" for details.

Looking at the logs, we could see that a certificate could not be read:

sudo journalctl --no-pager -u telegraf

[...]
Jun  12 19:53:09 rpi-extdns-01 telegraf[1292]: 2025-06-12T18:53:09Z E! loading config file /etc/telegraf/telegraf.d/cloudproxy-http.conf failed: error parsing http array, could not load certificate "/etc/telegraf/telegraf.d/cert.pem": open /etc/telegraf/telegraf.d/cert.pem: permission denied
Jun  12 19:53:09 rpi-extdns-01 systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
[...]

I checked permissions on the cert.pem file and confirmed it was owned by root for user and group. The same was true for the key.pem file. I adjusted permissions for both files and tried again:

sudo chown telegraf:telegraf /etc/telegraf/telegraf.d/cert.pem
sudo chown telegraf:telegraf /etc/telegraf/telegraf.d/key.pem
sudo systemctl restart telegraf

This time no errors occurred. In short, the Telegraf service was unable to read its TLS certificate files because they were owned by root, but the service runs as the telegraf user. Fixing ownership resolved the issue.

Validating Success in Aria Operations

After waiting some time, I could see data in Aria Operations for this physical server. I first searched for the system name and found an object called “Linux OS on rpi-extdns-01” (the server’s hostname).

Clicking on that object allowed me to view the metrics and properties that were collected. For example, the screenshot below shows the disk used over time for the root file system.

More details on this system could be found in the dashboard “Linux OS discovered by Telegraf.”

Conclusion

It’s great to have full visibility into this physical server using the same Aria Operations dashboards and alerts I already rely on for virtual systems. The setup was straightforward, and with a few tweaks for file permissions, the integration worked well even on a low-cost Raspberry Pi with an ARM processor.

Posted in Lab Infrastructure, Virtualization

Cleaning Up Orphaned Tag Associations in vCenter

I was recently made aware of a KB article titled “Tag associations are not removed from vCenter Server database when associated objects are removed or deleted” (https://knowledge.broadcom.com/external/article?articleNumber=344960). The article includes a script that removes orphaned tag assignments left behind in the vCenter Server database after object deletion.

Investigating the Issue

After reviewing this article, I checked the vpxd.log file on one of my lab vCenter Server instances and noticed frequent entries like the following:

2025-06-07T17:33:50.765Z error vpxd[06442] [Originator@6876 sub=Authorize opID=4be68587-f898-41b0-bbd4-2764f0941eaa Authz-7c] MoRef: vim.Datastore:datastore-4936 not found. Error: N5Vmomi5Fault21ManagedObjectNotFound9ExceptionE(Fault cause: vmodl.fault.ManagedObjectNotFound

To quantify this, I ran:

cat /var/log/vmware/vpxd/vpxd.log | grep -i vmodl.fault.ManagedObjectNotFound | wc -l
13738

cat /var/log/vmware/vpxd/vpxd.log | wc -l
210258

This showed that roughly 6.5% of the log entries were related to this specific fault, which strongly suggested lingering tag associations.

Reproducing the Issue

To test further, I moved to a clean vCenter environment with no history of tag usage. I created and tagged 10 virtual machines:

$newCat = New-TagCategory -Name 'h378-category' -Cardinality:Multiple -EntityType:VirtualMachine

0..9 |%{ New-Tag -Name "h378-tag$_" -Category $newCat }

new-vm -VMHost test-vesx-71* -Name "h378-vm0" -Datastore vc3-test03-sdrs -Template template-tinycore-160-cli-cc
1..9 | %{ new-vm -VMHost test-vesx-71* -Name "h378-vm$_" -Datastore vc3-test03-sdrs -Template template-tinycore-160-cli-cc }

New-TagAssignment -Tag (Get-Tag "h378*") -Entity (Get-VM "h378*")

Get-VM "h378*" | Remove-VM -DeletePermanently:$true -Confirm:$false

After deletion, there were no log entries related to orphaned tags. I queried the database using a modified version of the cleanup script in read-only mode and confirmed that no orphaned tag rows existed. This led me to revisit the KB and note that:

In vSphere 7 and 8, tag associations are automatically removed for Virtual Machines and Hosts when the associated object is deleted.
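
The read-only check mentioned above boils down to querying the same table the cleanup script uses; a minimal sketch along those lines for VirtualMachine associations (the full orphan check, shown for clusters below, additionally excludes MoRefs that still exist in vpx_entity):

${VMWARE_POSTGRES_BIN}/psql -U postgres VCDB -h /var/run/vpostgres <<EOF
-- Count VirtualMachine tag-association rows without deleting anything.
select count(*) from cis_kv_keyvalue
where kv_provider like 'tagging:%'
and kv_key like 'tag_association urn:vmomi:VirtualMachine:%';
EOF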

Confirming with Cluster Objects

I then repeated the test using cluster objects, which are not automatically cleaned up:

$newCat = New-TagCategory -Name 'h378-category-Cluster' -Cardinality:Multiple -EntityType:ClusterComputeResource
0..9 |%{ New-Tag -Name "h378-cluster-tag$_" -Category $newCat }

0..9 |%{ New-Cluster -Name "h378-cluster-$_" -Location (Get-Datacenter h378-test) }

New-TagAssignment -Tag (Get-Tag "h378-cluster*") -Entity (Get-Cluster "h378*")

get-cluster "h378*" | remove-cluster -Confirm:$false

Shortly after deletion, the vpxd.log showed ManagedObjectNotFound errors. I verified the orphaned rows using the following SQL query:

${VMWARE_POSTGRES_BIN}/psql -U postgres VCDB -h /var/run/vpostgres <<EOF
select * from cis_kv_keyvalue where kv_provider like 'tagging:%'
and
kv_key like 'tag_association urn:vmomi:ClusterComputeResource:%'
and
regexp_replace(kv_key, 'tag_association urn:vmomi:ClusterComputeResource:domain-c([0-9]+).*', '\1')::bigint
not in (select id from vpx_entity where type_id=3);
EOF

This confirmed 100 orphaned tag associations, which I then cleaned up using the provided tags_delete_job_all.sh script.

Cleanup Results

Back on the original vCenter Server where the log showed high volumes of these errors, I took a snapshot and ran the cleanup script. It only removed around 30 entries, but new ManagedObjectNotFound messages have stopped appearing.

This reduction is easy to monitor in Aria Operations for Logs, especially across multiple vCenter environments.

Conclusion

In my environments, VM and Host deletions are the most common, and these objects now clean up their tag associations automatically in recent vSphere versions. However, orphaned associations from cluster or other object types may remain, especially in environments upgraded over time.

By reviewing your vpxd.log and using the methods shown here, you can identify and remediate these issues efficiently.

Posted in Scripting, Virtualization

Centralized Startup Scripting for Automated VM Load Testing

When building or troubleshooting infrastructure, it is often useful to simulate high CPU or memory usage without deploying full production workloads. To assist with this, I previously created a few purpose-built scripts, like cpubusy.sh and memfill.sh.

Historically, I created multiple Tiny Core Linux templates, each designed for a specific purpose: a generic one for troubleshooting, one that would fill memory, and another to load up the CPU. I’d place these scripts in /opt/bootlocal.sh, allowing each VM to run its designated script automatically at startup, and then control load by simply powering VMs on or off.

The Problem

That setup works fine for simple use cases, but it doesn’t scale well. What if I want:

  • One VM to simulate CPU load,
  • Another to test download speeds,
  • A third to run a custom test script—all using the same base image?

The Common Control Script

The idea is simple: deploy a single generic VM template with a control script that runs at boot, checks for instructions, and decides what to do based on one of the following:

  • Metadata (via guestinfo)
  • Network identity (IP, MAC, hostname)
  • Shared config (via GuestStore or a web server)

This common control script can be found here: code-snips/cc.sh.

Where It Looks for Instructions

When a VM boots, the script checks for commands in the following order:

  1. Web server:
    • http://<web>/<macaddress>.txt
    • http://<web>/<ipaddress>.txt
    • http://<web>/<hostname>.txt
    • http://<web>/all.txt
  2. VM Guest Store (using VMware tools):
    • guestinfo.ccScript (specified via advanced VM setting)
    • /custom/cc/cc-all.txt

This layered approach gives flexibility:

  • Set a global script via all.txt
  • Override per host via metadata or identifiers
  • Or push custom scripts directly via GuestInfo
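
The full script is linked above, but the heart of the lookup logic is small enough to sketch here. This is only an approximation (the web server address, interface name, and paths are assumptions), using wget for the web checks and the VMware Tools vmtoolsd / vmware-toolbox-cmd utilities for the guestinfo and GuestStore lookups:

#!/bin/sh
# Try each instruction source in order; run the first script found.
WEB="http://192.168.1.10"
MAC=$(cat /sys/class/net/eth0/address | tr -d ':')
IP=$(ip -4 addr show dev eth0 | awk '/inet /{split($2,a,"/"); print a[1]; exit}')
HOST=$(hostname -s)

for name in "$MAC" "$IP" "$HOST" all; do
  if wget -q -O /tmp/cc-run.sh "$WEB/$name.txt"; then
    sh /tmp/cc-run.sh
    exit 0
  fi
done

# Script name from the guestinfo.ccScript advanced setting, if present...
SCRIPT=$(vmtoolsd --cmd "info-get guestinfo.ccScript" 2>/dev/null)
# ...pulled from the VMware Tools GuestStore, otherwise the shared cc-all.txt default.
if vmware-toolbox-cmd gueststore getcontent "/custom/cc/${SCRIPT:-cc-all.txt}" /tmp/cc-run.sh 2>/dev/null; then
  sh /tmp/cc-run.sh
fi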

Example: Setting the GuestInfo Property

Using PowerCLI, we can set the script filename per VM like this:

Get-VM h045-tc16-02 | New-AdvancedSetting -Name 'guestinfo.ccScript' -Value 'memfill.sh' -confirm:$false

We can also modify this via the vSphere Web Client.

Demonstration

Here’s a test case from my lab:

  • I set guestinfo.ccScript to memfill.sh
  • The all.txt file includes a simple command to print system time

Upon boot, the VM fills 90% of available RAM using a memory-backed filesystem and prints the time, confirming that both script sources are active.

Later, I removed the guestinfo.ccScript setting and added a <hostname>.txt script to download a file repeatedly from a test web server. After a reboot, the VM behaved differently, now acting as a network test client, with no changes to the template required.

Sample Scripts

Here are a few lightweight test scripts used in the demo:

  • cpubusy.sh – uses sha1sum to keep all the configured CPU cores busy
  • download.sh – uses wget to get the same webserver file x times and save it to /dev/null
  • memfill.sh – creates a memory backed filesystem using 90% of RAM, then uses dd to fill it
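
As an example of how small these scripts are, the cpubusy.sh approach boils down to roughly the following sketch (one hashing loop per core):

#!/bin/sh
# Spawn one sha1sum worker per CPU core; hashing /dev/zero never finishes,
# so each worker pins a core at 100% until the VM is powered off.
for core in $(seq "$(nproc)"); do
  sha1sum /dev/zero &
done
wait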

Conclusion

This ‘common config’ approach provides template reuse, easier script management, and dynamic testing control, all without modifying the template.

Whether testing CPU, memory, or network load across dozens of VMs, the common control script simplifies the process and reduces maintenance overhead.

In future iterations, this setup could be extended to include conditional logic (based on boot time, VM tags, or other metadata), or integration with CI pipelines for even more powerful automation.

Posted in Scripting, Virtualization