Finding the real NFS network

I was recently helping a customer who had inherited an existing vSphere deployment that used NFS storage. They were tasked with migrating the old VMs to newer infrastructure, but they first wanted to find the NFS storage array backing the datastores.

Looking at the datastores in vCenter, they couldn’t find the hostname/IP address of the storage target. Instead, they saw seemingly random values in the device backing > server field, similar to this demo red datastore I mocked up, which shows a server of network.nfs.1 where we’d normally expect a hostname or IP. The value observed, network.nfs.1 in this case, wasn’t a name that was resolvable with the customer’s DNS.
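
For reference, the same backing values can be pulled in bulk with PowerCLI. This is a minimal sketch, assuming an existing vCenter connection; it relies on the RemoteHost and RemotePath properties that PowerCLI exposes on NFS datastore objects:

# list NFS datastores along with the server/path values reported by vCenter
Get-Datastore | Where-Object { $_.Type -like 'NFS*' } |
  Select-Object Name, RemoteHost, RemotePath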

Looking at host networking, one VMkernel adapter was clearly the one used for storage access, similar to this mocked up screenshot:
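
The VMkernel adapter details can also be listed without the UI. A quick PowerCLI sketch, assuming the lab host name used later in this post:

# show each VMkernel adapter, its portgroup, and IP configuration
$thisVmHost = 'h197-vesx-04.lab.enterpriseadmins.org'
Get-VMHostNetworkAdapter -VMHost $thisVmHost -VMKernel |
  Select-Object Name, PortGroupName, IP, SubnetMask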

It seemed logical that network.nfs.1 and the other seemingly random names were devices on this 10.3.3.0/25 network. We wanted to issue a ping from this ESXi host, but the root password was unknown and we were not able to log in to the console to do so. However, since we had access to vCenter, I went looking for a way to send a ping from esxcli, hoping we could then use the Get-EsxCli PowerCLI cmdlet to issue our pings. I found esxcli network diag ping, and after it worked as expected in a lab, we tried it in this environment:

$esxcli = Get-EsxCli -VMHost $thisVmHost -v2
$networkDiagPing = $esxcli.network.diag.ping.CreateArgs()
$networkDiagPing.host = 'network.nfs.1'
$networkDiagPing.interface = 'vmk1'
$pingResults = $esxcli.network.diag.ping.Invoke($networkDiagPing)

Unfortunately, this resulted in the error sendto() failed (Network is unreachable). This was surprising, as the NFS datastore was online and we had specified the VMkernel interface on the storage network. The host had four VMkernel interfaces, so we stepped through each one to see if the storage traffic was using a different interface. The last interface we tried, vmk0, received a response.

As best we could tell, the vmk1 interface was unused. The portgroup named ‘storage’ had a VLAN backing that didn’t actually exist in the environment, and the VMkernel IP address wasn’t on a network that existed either. Once we knew which network adapter was actually in use, the ping response returned the IP address of a known NAS. We did a bit more digging and found host file entries that were obfuscating the actual IP addresses of known storage targets. For reference, here is how we found the host file entries, again using esxcli.

$esxcli.network.ip.hosts.list.invoke() | Select-Object HostName, IPaddress

HostName      IPaddress
--------      ---------
network.nfs.2 192.168.10.26
network.nfs.1 192.168.10.26
network.nfs.9 192.168.67.21

After the fact, I put together a quick script to help in the odd event I ever see something like this again. It finds all the unique hostnames/IPs used by NFS datastores, then attempts to ping each NFS host from every VMkernel interface, showing only the successful ping responses.

$thisVmHost = 'h197-vesx-04.lab.enterpriseadmins.org'
$nfsBackings = Get-VMHost $thisVmHost | Get-Datastore |
  Where-Object { $_.ExtensionData.Info.Nas.Type -eq 'NFS' } |
  Select-Object @{N='RemoteHostNames';E={$_.ExtensionData.Info.Nas.RemoteHostNames}} -Unique

foreach ($thisDatastoreBacking in $nfsBackings) {
  foreach ($thisVmk in Get-VMHostNetworkAdapter -VMHost $thisVmHost -VMKernel) {
    $esxcli = Get-EsxCli -VMHost $thisVmHost -V2
    $networkDiagPing = $esxcli.network.diag.ping.CreateArgs()
    $networkDiagPing.host = $thisDatastoreBacking.RemoteHostNames
    $networkDiagPing.interface = $thisVmk.Name
    try {
      $pingResults = $esxcli.network.diag.ping.Invoke($networkDiagPing)
      $uniqueHosts = [string]::Join(', ', ($pingResults.Trace.Host | Select-Object -Unique))
    } catch { $pingResults = $null }

    if ($pingResults) { "Pinging $($thisDatastoreBacking.RemoteHostNames) from $($thisVmk.Name) [IP $($thisVmk.IP)] took path $uniqueHosts" }
  } # end VMkernel loop
} # end datastore backing loop

In a lab with a similar configuration, the script above produces output similar to:

Pinging network.nfs.1 from vmk0 [IP 192.168.10.19] took path 192.168.10.26
Pinging network.nfs.2 from vmk0 [IP 192.168.10.19] took path 192.168.10.26
Pinging network.nfs.9 from vmk0 [IP 192.168.10.19] took path 192.168.57.21, 192.168.10.1, 192.168.127.252

The final row in that output shows an NFS target that was not on the local network and needed a few hops to reach its destination, a detail that can be helpful when mapping out where storage traffic actually flows.


Aria Operations Self Health dashboard visibility

In Aria Operations there are several dashboards that are enabled by default to help ensure the Aria Operations environment is healthy. These dashboards are available under Visualize > Dashboards in a folder named VMware Aria Operations. The specific dashboards have names like:

  • Self Cluster Health
  • Self Health
  • Self Performance Details
  • Self Services Communications
  • Self Services Summary
  • Self Troubleshooting
  • vCenter Adapter Details

Recently I was working with a colleague who couldn’t see these dashboards. According to the documentation (https://docs.vmware.com/en/VMware-Aria-Operations/8.12/Best-Practices-Operations/GUID-8D7D3B14-6A4D-4895-B583-18753F03E48D.html) these dashboards should be activated by default. This led us to check the permissions on these dashboards, where we found that by default they are not shared. Typically, built-in dashboards are shared with everyone, which can be seen by clicking the “Share Dashboard” icon in the top right, selecting the “Groups” tab, and reviewing the “Dashboard shared with” text at the bottom of the popup (shown below):

Example of a built-in dashboard shared with Everyone.

The built-in self health dashboards were not shared with anyone by default; only the built-in admin account has access. This seems reasonable: you likely don’t need or want everyone troubleshooting your Aria Operations cluster. However, for those tasked with maintaining Aria Operations deployments, these dashboards are very useful. If we are logged in as the admin user who owns these dashboards, from the Share Dashboard screen above we can select our custom administrators group and click “Include” to start sharing the relevant dashboards. The next time members of the admins group log in, they should see these self health dashboards in the VMware Aria Operations folder.


MongoDB: Test data for performance monitoring

This post will cover loading some test data into our MongoDB instance and generating some queries for performance monitoring. In previous posts we covered creating a MongoDB replica set (here) and configuring the Aria Operations Management Pack for MongoDB (here).

Reviewing the MongoDB website, there is a good article about some sample datasets: https://www.mongodb.com/developer/products/atlas/atlas-sample-datasets/. The MongoDB post covers importing the data using Atlas, then describes each data set. At the very end of the article, they cover importing this data with the mongorestore command line utility. As we do not have a GUI available with this Mongo instance, this is what we’ll do in this post.

The first step is to SSH into the primary node of our MongoDB replica set. We can find this value on the MongoDB Replica Set Details dashboard in Aria Operations (it’s in the MongoDB Replica Sets widget at the top right, in the column ‘Primary Replication’) or by using the rs.status() command in Mongo Shell, discussed earlier in this series.
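
If you’d rather check from a PowerShell workstation, here is a rough sketch using the Mdbc module that we’ll install later in this post; it runs the replSetGetStatus admin command and prints each member’s state:

$mongoPass = [uri]::EscapeDataString('wj:dFDgb6tom')  # escaped root password, built the same way as later in this post
Connect-Mdbc "mongodb://root:$mongoPass@svcs-mongo-01.lab.enterpriseadmins.org" admin
$rsStatus = Invoke-MdbcCommand -Command @{ replSetGetStatus = 1 }
foreach ($member in $rsStatus['members']) { "$($member['name']) is $($member['stateStr'])" }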

From the /tmp directory, we’ll download the sampledata archive using the curl command line utility:

curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive

The download will be about 372MB. Once we have the file, we’ll use the mongorestore command line utility with the following syntax:

mongorestore --archive=sampledata.archive -u root -p 'password'

We can get the root password from the console of the first VM in our cluster, the one where we ran the rs.initiate earlier. The restore should complete rather quickly. Progress is written to the screen during the restore, but the final line in my output was:

2024-05-11T18:21:40.888+0000    425367 document(s) restored successfully. 0 document(s) failed to restore.

A couple hundred thousand records should be enough for our needs, since we primarily want to make sure our monitoring dashboard is working.

Having data in our database isn’t really enough; we also need some queries running. I’m sure there are more complete/better load generating tools (such as YCSB), but after a quick search I found a couple of PowerShell examples for connecting to MongoDB (https://stackoverflow.com/questions/45010964/how-to-connect-mongodb-with-powershell). One is a module available in the PowerShell Gallery. It was easy to install with Install-Module Mdbc, so I gave it a shot. One of the first issues I encountered was with the default root password I was using: it had a colon in it, which is the character used to separate username:password in the connection string. I found a quick way to escape the special characters, and with a little more trial and error was able to create a connection string. Another thing I ran into was that the default readPreference sends all reads to the primary node, so neither of my secondary nodes was really doing anything. I ended up using ‘secondaryPreferred’ so that I could see load on multiple nodes in the cluster.

$mongoPass = [uri]::EscapeDataString('wj:dFDgb6tom')
$mongoConnectString = "mongodb://root:$mongoPass@svcs-mongo-01.lab.enterpriseadmins.org,svcs-mongo-02.lab.enterpriseadmins.org,svcs-mongo-03.lab.enterpriseadmins.org/?readPreference=secondaryPreferred"

With the password escaped and the connection string built, it is easy to connect to the database. For example, to return a list of databases/collections from the mongo instance, I can run the following command:

Connect-Mdbc $mongoConnectString *

# List returned:
admin
config
local
sample_airbnb
sample_analytics
sample_geospatial
sample_guides
sample_mflix
sample_restaurants
sample_supplies
sample_training
sample_weatherdata

Running Connect-Mdbc $mongoConnectString sample_analytics * (adding a specific database name to the command) will return the three collections in that database. A few quick foreach loops later, we have a query workload that will run for a fairly long time, and we could easily give the loop more iterations. It prints some basic output so you know it is working, and CTRL+C will let you exit the loop at any point.

$randomCounts = 2
1..1000 | %{
  $myResults = @()
  foreach ($thisDB in (Connect-Mdbc $mongoConnectString * |?{$_ -match 'sample'} | Get-Random -Count $randomCounts)) {
    foreach ($thisTable in (Connect-Mdbc $mongoConnectString $thisDB * | Get-Random -Count $randomCounts)) {
      Connect-Mdbc $mongoConnectString $thisDB $thisTable | Get-Random -Count $randomCounts
      $myResults += [pscustomobject][ordered]@{
        "Database" = $thisDB
        "Table"    = $thisTable
        "RowCount" = (Get-MdbcData -as PS | Measure-Object).Count
      } # end outputobject
    } # end table loop
  } # end db loop
  $rowsReturned = ($myResults | Measure-Object -Property rowcount -sum).Sum
  "Completed iteration $_ and returned $rowsReturned rows"
} # end counter loop

While running the above loop, I also went through and messed with the cluster nodes, rebooting them to see what would happen and whether queries failed. The cluster was more resilient than I had expected. This worked well to generate some CPU load on my Mongo VMs and populate an Aria Operations dashboard.
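
If you want to script those node reboots, here is a minimal PowerCLI sketch; it assumes VMware Tools is running in the guest and that the VM name matches the hostnames above (an assumption in my case):

# restart the guest OS of one secondary node while the query loop is running
Get-VM svcs-mongo-02 | Restart-VMGuest -Confirm:$false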


Optimizing Operations: Aria Operations Management Pack for MongoDB

In a previous post (here), I covered how to set up a MongoDB replica set to be monitored by Aria Operations in a vSphere-based lab. This article will cover the installation and configuration of the Aria Operations Management Pack for MongoDB. The article is divided into three sections:

  • Installing the Management Pack – which will cover the Aria Suite Lifecycle Marketplace workflow
  • Configuring a service account – this will be the monitoring user for the MongoDB instances, configured inside Mongo Shell.
  • Adding a MongoDB adapter instance in Aria Operations – configuring Aria Operations to use the new management pack and service account.

Installing the Management Pack

In Aria Suite Lifecycle (formerly known as vRealize Suite Lifecycle Manager), I typically stay in the Lifecycle Operations, Locker, or Content Management tiles. In this post we’ll use the Marketplace tile to add a management pack for MongoDB. For details on creating a MongoDB Replica Set cluster, feel free to check out this previous post.

At the top of the Marketplace screen, there is a search box. When I search for Mongo, I get three results. Hovering over each shows its full name; the one we want is Aria Operations Management Pack for MongoDB Version 9.0:

After selecting DOWNLOAD in the bottom right of the tile, we are presented with the EULA. After reading the document, we check the box and select NEXT. On the last page of this wizard, we enter our name, email address, company name, and country, then click DOWNLOAD.

This will show the message ‘Success. Download for VMware Aria Operations Management Pack for MongoDB 9.0 is initiated. For more details visit request page.’ The request text is a link to the specific request, which should contain just one stage and complete fairly quickly. Here is a screenshot of the expected results of the download task:

Back in the Marketplace, we switch to the ‘Available’ tab, which shows the subset of management packs that we’ve already downloaded. Due to the length of management pack names, it may be helpful to search for mongo in the search bar.

When we select ‘View Details’ a page will appear that provides some details about the selected management pack. If we scroll to the end of the page, there are buttons for INSTALL and DELETE. Since we are installing this for the first time, we’ll select INSTALL.

We must then pick our Datacenter and Environment and finally select the INSTALL button in the lower right.

This will result in a banner stating ‘Installation in progress. Check request status.’ The request status text is again a link that takes us to the ‘Requests’ page. Again, this should be a single-stage task, though it will likely take longer than the download.

Once this task completes, our management pack will be available in the Aria Operations instance. We’ll just need to give it hostnames and credentials to begin collecting data.

Configuring a service account

From an SSH session to our primary MongoDB node, we’ll open a Mongo Shell with the following command:

mongosh --username root

We’ll enter our root password when prompted. One important thing to note is that the root password for MongoDB is the root password of the node where we initiated the replica set.

// specify that we want to use the admin database instead of the default test
use admin

// Create our limited access service account for monitoring.  
db.createUser(
   {
     user: "svc-ariaopsmp",
     pwd: "VMware1!",
     roles: [ { role: "clusterMonitor", db: "admin" } ]
   }
)

// Confirm that it worked
db.getUser('svc-ariaopsmp')

When the above command completes, the final JSON object presented to the screen should look like this:

{
  _id: 'admin.svc-ariaopsmp',
  userId: UUID('2ee886ee-f700-46e7-b022-950fbb36034f'),
  user: 'svc-ariaopsmp',
  db: 'admin',
  roles: [ { role: 'clusterMonitor', db: 'admin' } ],
  mechanisms: [ 'SCRAM-SHA-1', 'SCRAM-SHA-256' ]
}

This shows that the user ‘svc-ariaopsmp’ has the ‘clusterMonitor’ role, a limited permission set intended for monitoring operations, which is exactly what Aria Operations will be doing.
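
To sanity-check the new account before wiring it into Aria Operations, we can run a command that clusterMonitor permits, such as serverStatus. This is a hedged sketch using the Mdbc PowerShell module that we’ll use in the next post of this series, pointed at one of my lab node names:

# authenticate as the monitoring user against the admin database and run serverStatus
Connect-Mdbc 'mongodb://svc-ariaopsmp:VMware1!@svcs-mongo-01.lab.enterpriseadmins.org' admin
(Invoke-MdbcCommand -Command @{ serverStatus = 1 })['uptime']  # uptime in seconds, proving the role works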

Adding a MongoDB adapter instance in Aria Operations

From Aria Operations, we’ll navigate to Data Sources > Integrations > Repository and verify we can see our newly installed integration. If we click the tile, details come up about the management pack, including the metrics collected and the content that is now available. We can select ‘ADD ACCOUNT’ from the top of the screen to create an adapter instance.

When adding an account there are only a few fields required:

The ‘Name’ field is used in some of the dashboards to identify which MongoDB instance is being described. It is important to use a short, descriptive name; in my case I’m going to call this ‘svcs-mongo-replicaset’ to denote that it is deployed in my services network, is running MongoDB, and is a replica set cluster.

The ‘Description’ field is not shown by default on the MongoDB dashboards added by the management pack, but it can contain additional details about the adapter instance. If you have various MongoDB instances managed by different teams, this might be a good place to store contact information, for example in case your credentials expire or stop working.

The ‘Host’ field is where we enter the hostname of our MongoDB instance. Since we’ve configured a replica set, I’m entering a comma-separated list of all the nodes in the MongoDB cluster. This helps ensure we keep receiving monitoring data even if one node fails.

The ‘Credential’ field allows us to specify whether or not authentication is required and to enter username/password details. In my case I do require a username/password, and am only running mongod, not mongos, but I’m going to enter the same account in both fields in case something changes in the future. Here is what my new credential looks like:

Finally, I ‘VALIDATE CONNECTION’ and wait for the ‘Test connection successful’ notice to appear, then click ‘ADD’ at the bottom of the screen.

Additional advanced settings are available, like which service to connect to, specifics around SSL, timeouts, and autodiscovery. Those settings did not need to be tweaked for this simple environment.

After 5-10 minutes, we should have some initial data. Browsing to Visualize > Dashboards, we can filter the list of dashboards for Mongo. There are seven MongoDB-related dashboards available from the management pack we installed. Not all of them will have data; for example, we don’t have mongos deployed. However, if we select MongoDB Replica Set Details we should see some information about our environment.

To make the most of this, we’ll really need some sample data and a way to run queries as needed to have something to monitor. We’ll cover that in the next post (here).


Creating a MongoDB Replica Set

I was recently looking at the Aria Operations Management Pack for MongoDB (https://docs.vmware.com/en/VMware-Aria-Operations-for-Integrations/9.0/Management-Pack-for-MongoDB/GUID-73744E17-88DD-49A1-8B86-5BD896C874D8.html) and wanted to kick the tires in my vSphere-based lab. To be able to test this management pack, I wanted to deploy a three node MongoDB replica set.

To begin, I found a solid starting point, the Bitnami MongoDB appliance: https://bitnami.com/stack/mongodb/virtual-machine. I downloaded this appliance, which includes MongoDB 7.0.9 pre-installed.

I deployed three copies of the appliance, specifying static IPs during the deployment, and pre-created forward and reverse DNS records for these IPs. The VMs come configured with 1 vCPU and 1GB of RAM; when running a replica set I hit a few issues with only 1GB, and 2GB worked better. Once the VMs were powered on, I logged in with the bitnami/bitnami username/password combination and changed the bitnami password, then made a few changes in the OS.
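
As an aside, the memory bump can also be done with PowerCLI rather than editing each VM by hand; a quick sketch, assuming the VMs are powered off (or have memory hot add enabled) and are named mongo-0* as in the snapshot example later in this post:

# give each appliance 2GB of RAM before powering it on
Get-VM mongo-0* | Set-VM -MemoryGB 2 -Confirm:$false

The first change, made from the vSphere web console, was to allow SSH password logins: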

sudo nano /etc/ssh/sshd_config
# find PasswordAuthentication no and change to yes

sudo rm /etc/ssh/sshd_not_to_be_run
sudo systemctl start sshd

This allowed me to log in over SSH to make the remaining changes. Over SSH, I made some additional changes to allow MongoDB to be accessed remotely:

sudo nano /etc/nftables.conf  # find and add "tcp dport { 27017-27019 } accept" in the section below accepting tcp22
sudo nft -f /etc/nftables.conf -e

I also wanted to clean up a couple of general networking issues, namely setting the hostname of the VM and removing an extra secondary IP address that might get pulled via DHCP.

# set the OS hostname
echo mongo-xx.lab.example.com | sudo tee /etc/hostname

# disable the DHCP interface if using static addresses
sudo nano /etc/network/interfaces
# find the line at the end of the file for 'iface ens192 inet dhcp' and remove it or make it a comment.

# confirm DNS resolution is configured
sudo nano /etc/resolv.conf  # review for nameserver entries

sudo systemctl restart networking

Next up, we will make some changes to the MongoDB configuration to support our replica set. We need a security key file that the nodes use to authenticate with each other. Instead of using a short password, from my local machine running PowerShell I created a new GUID to use as the key and removed the hyphens.

[guid]::NewGuid() -Replace '-', ''

For me, this created the string 88157a33a9dc499ea6b05c504daa36f8, which I’ll reference throughout this post. There are other ways to create longer/more complex keys, but this was something quick. We need to put this key in a file that will be referenced in the mongo config file. To create the key file, we’ll use the following commands:

echo '88157a33a9dc499ea6b05c504daa36f8' | sudo tee /opt/bitnami/mongodb/conf/security.key
sudo chmod 400 /opt/bitnami/mongodb/conf/security.key
sudo chown mongo:mongo /opt/bitnami/mongodb/conf/security.key
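
As noted above, there are other ways to build a longer key; MongoDB key files accept up to 1024 base64 characters. Purely as an illustrative sketch, a longer random key could be generated in PowerShell like this:

# 96 random bytes, base64-encoded, yields a 128-character key
[Convert]::ToBase64String((1..96 | ForEach-Object { Get-Random -Maximum 256 }) -as [byte[]])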

Now that we have that security key, we’ll update our mongo config.

sudo nano /opt/bitnami/mongodb/conf/mongodb.conf

# Find and remove the comments for replication and replSetName, and set a replication set name if desired.

# uncomment #keyFile at the end of the file and set value to /opt/bitnami/mongodb/conf/security.key

# save the file, restart services
sudo systemctl restart bitnami.mongodb.service

Now that the cluster nodes are all prepared, we can turn them into a replica set cluster. Since I’m not familiar with this process, I decided to create snapshots of all the nodes (PowerCLI makes this easy: Get-VM mongo-0* | New-Snapshot -Name 'Pre RS config') and reboot them for good measure. I then logged back into the first node of the cluster over SSH and entered the mongo shell interface, using the root password found on the virtual appliance console.

mongosh --username root
# enter the root password

# initiate the cluster
rs.initiate( {
   _id : "replicaset",
   members: [
      { _id: 0, host: "mongo-01.lab.example.com:27017" },
      { _id: 1, host: "mongo-02.lab.example.com:27017" },
      { _id: 2, host: "mongo-03.lab.example.com:27017" }
   ]
})

# check the status to ensure cluster is working
rs.status()

The rs.status() command should return the status of the cluster. We should see one of our nodes is the primary and the other two are secondary (based on the stateStr property).

With our replica set working, we can now start monitoring. The next post (link) will cover installing and configuring the management pack. The final post in this series (link) will show how to import some sample data and run sample queries to put read load on the replica set.
