MongoDB: Test data for performance monitoring

This post will cover loading some test data into our MongoDB instance and generating some queries for performance monitoring. In previous posts we covered creating a MongoDB replica set (here) and configuring the Aria Operations Management Pack for MongoDB (here).

The MongoDB website has a good article about sample datasets: https://www.mongodb.com/developer/products/atlas/atlas-sample-datasets/. It covers importing the data using Atlas, then describes each dataset. At the very end, it covers importing the data with the mongorestore command line utility. As we do not have a GUI available with this Mongo instance, mongorestore is what we’ll use in this post.

The first step is to SSH into the primary node of our MongoDB replica set. We can find the primary on the MongoDB Replica Set Details dashboard in Aria Operations (it’s in the MongoDB Replica Sets widget at the top right, in the column ‘Primary Replication’) or by using the rs.status() command in Mongo Shell discussed earlier in this series.
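As a quick refresher, the rs.status() lookup can be wrapped in a one-liner from any node's shell. This is only a sketch: it assumes the mongosh binary is installed (older installs ship the legacy mongo shell instead) and that root is the admin user. show_primary is a hypothetical helper name, not something from the earlier posts.

```shell
# Sketch: print the member(s) currently reporting the PRIMARY state.
# Assumptions: mongosh is on the PATH and the user authenticates against admin.
# With -u given and no -p, mongosh prompts for the password.
show_primary() {
  mongosh --quiet -u "${1:-root}" --authenticationDatabase admin \
    --eval 'rs.status().members.filter(m => m.stateStr === "PRIMARY").map(m => m.name)'
}

# Usage (prompts for the password):
#   show_primary root
```

The filter keeps only members whose stateStr is PRIMARY, so the result should name the node to SSH into.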

From the /tmp directory, we’ll download the sampledata archive using the command line utility curl like below:

curl https://atlas-education.s3.amazonaws.com/sampledata.archive -o sampledata.archive
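An interrupted transfer of a file this large is easy to miss, so a rough size check before restoring can save some confusion. The helper below is a sketch under stated assumptions: archive_looks_complete is a hypothetical name and the 300 MB default threshold is arbitrary, chosen only because it sits below the size of the download I saw.

```shell
# Sketch: succeed only if the file exists and is at least min_bytes long.
# The default threshold (300000000 bytes) is an assumption, not an official size.
archive_looks_complete() {
  file=$1
  min_bytes=${2:-300000000}
  [ -f "$file" ] || return 1
  # Arithmetic expansion strips any padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  [ "$size" -ge "$min_bytes" ]
}

# Usage:
#   archive_looks_complete sampledata.archive || echo "download looks truncated" >&2
```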

The download will be about 372MB. Once we have the file, we will restore it with the mongorestore command line utility using the following syntax:

mongorestore --archive=sampledata.archive -u root -p 'password'

We can get the root password from the console of the first VM in our cluster, the one where we ran rs.initiate() earlier. The restore should complete rather quickly. Progress is written to the screen during the restore, but the final line in my output was:

2024-05-11T18:21:40.888+0000    425367 document(s) restored successfully. 0 document(s) failed to restore.

A little over 400,000 documents should be enough for our needs, where we primarily want to make sure our monitoring dashboard is working.
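If the restore is ever scripted, the summary line can be checked rather than read by eye. The sketch below pulls the failed-document count out of a line in the format shown above. restore_failures is a hypothetical helper, and it assumes the summary wording stays the same between mongorestore versions.

```shell
# Sketch: extract the number that precedes "document(s) failed to restore".
# Reads log lines on stdin; prints nothing if no summary line is found.
restore_failures() {
  sed -n 's/.* \([0-9][0-9]*\) document(s) failed to restore.*/\1/p'
}

# Sample input copied from the restore output in this post.
summary='2024-05-11T18:21:40.888+0000    425367 document(s) restored successfully. 0 document(s) failed to restore.'
failed=$(printf '%s\n' "$summary" | restore_failures)

if [ "$failed" = "0" ]; then
  echo "restore OK"
else
  echo "restore reported ${failed:-unknown} failed document(s)" >&2
fi
```

To use this against a real run, the mongorestore output would need to be piped in. As far as I know the MongoDB tools write their log lines to stderr, so a 2>&1 redirect is likely needed before the pipe.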

Having data in our database isn’t really enough; we also need some queries running. I’m sure there are more complete load-generating tools (such as YCSB), but after a quick search I found a couple of PowerShell examples for connecting to MongoDB (https://stackoverflow.com/questions/45010964/how-to-connect-mongodb-with-powershell). One is a module available in the PowerShell Gallery. It was easy to install with Install-Module Mdbc, so I gave it a shot.

One of the first issues I encountered was with the default root password I was using. It had a colon in it, which is the character used to separate username:password in the connection string. I found a quick way to escape the special characters, and after a little more trial and error I was able to create a connection string. I also found that the default readPreference sends all reads to the primary node, so neither of my secondary nodes was really doing anything. I ended up using the ‘secondaryPreferred’ read preference so that I could see load on multiple nodes in the cluster.

$mongoPass = [uri]::EscapeDataString('wj:dFDgb6tom')
$mongoConnectString = "mongodb://root:$mongoPass@svcs-mongo-01.lab.enterpriseadmins.org,svcs-mongo-02.lab.enterpriseadmins.org,svcs-mongo-03.lab.enterpriseadmins.org/?readPreference=secondaryPreferred"
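The same escaping can be sketched in plain shell, which is handy when connecting with mongosh from one of the nodes instead of PowerShell. This is a minimal illustration, not a full URI encoder: urlencode and build_uri are hypothetical helper names, the host names are placeholders, and the loop assumes an ASCII password.

```shell
# Sketch: percent-encode every character outside the unreserved set.
# Assumes ASCII input; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Build a connection string with the same readPreference used in this post.
build_uri() {
  printf 'mongodb://%s:%s@%s/?readPreference=secondaryPreferred\n' \
    "$1" "$(urlencode "$2")" "$3"
}

build_uri root 'p@ss:word' 'mongo-01.example.org,mongo-02.example.org'
# prints mongodb://root:p%40ss%3Aword@mongo-01.example.org,mongo-02.example.org/?readPreference=secondaryPreferred
```

The colon becomes %3A and the at sign becomes %40, so neither can be mistaken for the separators in the user:password@host portion of the URI.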

With the password escaped and the connection string built, it is easy to connect to the database. For example, to return a list of databases from the Mongo instance, I can run the following command:

Connect-Mdbc $mongoConnectString *

# List returned:
admin
config
local
sample_airbnb
sample_analytics
sample_geospatial
sample_guides
sample_mflix
sample_restaurants
sample_supplies
sample_training
sample_weatherdata

Running Connect-Mdbc $mongoConnectString sample_analytics * (adding a specific database name to the command) will return the three collections in that database. A few quick foreach loops later, we have a script that’ll run for a fairly long time, and we could easily give the loop more iterations. It gives some basic output to watch so you know it is working, and CTRL+C will exit the loop at any point.

# Number of databases, and collections per database, to sample on each pass
$randomCounts = 2
1..1000 | %{
  $myResults = @()
  # Pick random sample_* databases, then random collections within each one
  foreach ($thisDB in (Connect-Mdbc $mongoConnectString * |?{$_ -match 'sample'} | Get-Random -Count $randomCounts)) {
    foreach ($thisTable in (Connect-Mdbc $mongoConnectString $thisDB * | Get-Random -Count $randomCounts)) {
      # Connect to the chosen collection so Get-MdbcData below reads from it
      Connect-Mdbc $mongoConnectString $thisDB $thisTable | Get-Random -Count $randomCounts
      $myResults += [pscustomobject][ordered]@{
        "Database" = $thisDB
        "Table"    = $thisTable
        # Read every document in the collection and count them
        "RowCount" = (Get-MdbcData -as PS | Measure-Object).Count
      } # end outputobject
    } # end table loop
  } # end db loop
  # Total documents read during this pass
  $rowsReturned = ($myResults | Measure-Object -Property rowcount -sum).Sum
  "Completed iteration $_ and returned $rowsReturned rows"
} # end counter loop

While running the above loop, I also rebooted cluster nodes to see what would happen and whether queries would fail. The cluster was more resilient than I had expected. The loop worked well to generate some CPU load on my Mongo VMs and populate an Aria Operations dashboard.
