I was recently looking at the VMware Event Broker Appliance fling (https://vmweventbroker.io/). This fling enables a custom function to execute when a vSphere event happens, sort of an ‘if this then that’ approach to vSphere events (https://octo.vmware.com/vsphere-power-event-driven-automation/). As I was reading through the documentation, I began to wonder if I could build my own version of this using components already deployed to the lab. I already have Aria Operations for Logs which gets all my vSphere events via syslog. I have Jenkins configured with the generic webhook plugin. Aria Operations for Logs alert definitions can notify a webhook. These building blocks should do the trick… some assembly required.
For demonstration purposes, I wanted to configure an alert that fires when a vSphere virtual machine is deleted. An Aria Operations for Logs alert will call Jenkins, and some custom code will make sure the computer object in Active Directory is removed and the Aria Automation Config minion key is deleted.
Aria Operations for Logs – webhook payload
In Aria Operations for Logs we can define a custom payload for a webhook. This is defined under Configuration > Webhooks. I’ve used the following settings:
- Endpoint Type: Custom
- Log Payload: Individual Logs
- Webhook URL: http://jenkins.example.com/generic-webhook-trigger/invoke?token=vmDeprovisionRequest
- Action: POST
- Content Type: JSON
- Webhook body:
${messages}
This will cause Aria Operations for Logs to call my webhook URL for each individual message observed. It will post the captured event as JSON to Jenkins and pass in my job’s token.
Aria Operations for Logs – alert definition
In the Alerts > Alert Definitions section, I’ve created a new alert. It runs in real time (every minute) and looks for [vim.event.VmRemovedEvent]. When the count is greater than zero, it notifies my custom webhook. After some testing, I also added a filter where the text does not contain an error message I was seeing or the name of my VDI cluster, which keeps those VMs out of scope for this cleanup. I’ve included a screenshot of this alert definition below:
Jenkins – New Freestyle project
In Jenkins I created a new ‘freestyle project’ to receive this webhook. I enabled the ‘Generic Webhook Trigger’ in the job properties. I defined a post content parameter to create a variable named eventmsg with the JSONPath expression $[0].text, which extracts just the text of the first item in the payload. Since we defined our webhook to send individual logs, we wouldn’t expect the payload to contain anything beyond that first item, “[0]”.
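To see what the $[0].text expression selects, the same lookup can be reproduced in PowerShell against a trimmed copy of a payload. This is only a sketch of the selection, assuming a payload shaped like the one Aria Operations for Logs posts; the Generic Webhook Trigger plugin does the real extraction inside Jenkins, and $payload and $eventmsg below are just local stand-in names.

```powershell
# Trimmed stand-in for the JSON array posted by the webhook: one element,
# because the webhook is configured to send individual logs. The "fields"
# array of the real payload is left out for brevity.
$payload = '[{"text":"2024-07-08T01:02:33.381560+00:00 core-vcenter01 vpxd 7289 - - Event [19854143] [1-1] [2024-07-08T01:02:33.380981Z] [vim.event.VmRemovedEvent] [info] [LAB\\svc-vra] [US-East-IN] [19854141] [Removed h199-linux-11 on core-esxi-33.lab.enterpriseadmins.org from US-East-IN]","timestamp":1720400553384}]'

# Equivalent of JSONPath $[0].text: take the first array element,
# then read its "text" property.
$eventmsg = ($payload | ConvertFrom-Json)[0].text

# Print the extracted value, which is what the job would see in $env:eventmsg.
$eventmsg
```

The JSON escaping (the doubled backslash in the user name) is undone during parsing, so the variable holds the plain event text.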
In the webhook trigger ‘token’ property, I used the string vmDeprovisionRequest. We included this value in the webhook URL we defined in Aria Operations for Logs above. I also checked the boxes for ‘print post content’ and ‘print contributed variables’ so that the console logs would have the details of what triggered my request. This is optional, but very helpful for troubleshooting in the future.
The next part of this is where we have a small amount of custom code. Similar to VEBA, what we do when we see this event is really only limited by our imagination. Here I’m extracting some variables, doing a bit of error checking/validation, writing some text, and doing the two tasks I want to execute on VM deletion: cleaning up Active Directory and Aria Automation Config. This code could be better, but I’ll share it for anyone interested.
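# Parse the VM name and the deleting user out of the bracket-delimited event text.
# The index positions assume the vpxd event layout: segment 6 is the user, segment 9 the message.
$vmname = [regex]::Match( $env:eventmsg.Split('[')[9] , "Removed (.*?) on " ).Groups[1].Value
$username = $env:eventmsg.Split('[')[6] -Replace (']','')

# Stop unless this really is a VM removal event.
if ($env:eventmsg -notmatch 'vim\.event\.VmRemovedEvent') { "This event message does not match expected value. Ending."; exit 1 }
"Received request to deprovision VM $vmname after it was deleted by user: $($username)."

# Skip deletes made by the Horizon service account (VDI desktops are out of scope).
if ($username -match 'svc-horizon') { "This request came from user $username which has been excluded from process. Ending."; exit 1 }

# AD Cleanup: only act when exactly one computer object matches the VM name.
$thisComputer = Get-ADComputer -Filter "Name -eq '$vmname'"
if (($thisComputer | Measure-Object).Count -eq 1) {
    "Found in AD: $($thisComputer.DistinguishedName), will proceed to move/disable."
    $thisDate = (Get-Date).ToString('yyyy-MM-dd HH:mm')
    # Disable the object and stamp the description, then park it in a holding OU.
    Set-ADComputer -Identity $thisComputer -Description "Disabled by Jenkins on $thisDate" -Enabled $false
    Move-ADObject -Identity $thisComputer -TargetPath "OU=_ToDelete,OU=LAB Servers,DC=lab,DC=enterpriseadmins,DC=org"
}

# Remove SSC minion
# Lab-only credentials are inline here; outside a lab they belong in a credential store.
$cs = Connect-SscServer 'cm-config-01.lab.enterpriseadmins.org' -user 'root' -password 'VMware1!'
# Match on "<vmname>." so that only this host's FQDN is selected.
$sscMinionKey = Get-SscMinionKey | Where-Object { $_.minion -match "$vmname\." }
if (($sscMinionKey | Measure-Object).Count -eq 1) {
    "Found Salt Minion $($sscMinionKey.minion) in state $($sscMinionKey.key_state), will proceed to move/disable."
    # Despite the wording of the message above, this step deletes the minion key.
    $sscMinionKey | Remove-SscMinionKey -Confirm:$false
}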
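One way the parsing could be made a little less dependent on counting ‘[’ characters is a single regex anchored on the event type. The sketch below is an alternative, not what the job above runs: Get-VmRemovedEventInfo is a hypothetical helper name, and the pattern assumes the message layout seen in this lab ([event type] [severity] [user] … [Removed <vm> on <host> …]).

```powershell
# Hypothetical helper: pull the user and VM name out of a VmRemovedEvent
# syslog line with one anchored regex instead of positional Split('[') indexes.
function Get-VmRemovedEventInfo {
    param([Parameter(Mandatory)][string]$EventText)

    # Assumed layout: [vim.event.VmRemovedEvent] [severity] [user] ... [Removed <vm> on <host> ...]
    $pattern = '\[vim\.event\.VmRemovedEvent\]\s+\[[^\]]*\]\s+\[(?<user>[^\]]*)\].*\[Removed (?<vm>.+?) on '

    if ($EventText -match $pattern) {
        # Named capture groups are exposed through the automatic $Matches table.
        [pscustomobject]@{
            VmName   = $Matches['vm']
            UserName = $Matches['user']
        }
    }
    # Nothing is returned when the text is not a recognizable VmRemovedEvent.
}

# Usage with the event text from this post.
$sample = '2024-07-08T01:02:33.381560+00:00 core-vcenter01 vpxd 7289 - - Event [19854143] [1-1] [2024-07-08T01:02:33.380981Z] [vim.event.VmRemovedEvent] [info] [LAB\svc-vra] [US-East-IN] [19854141] [Removed h199-linux-11 on core-esxi-33.lab.enterpriseadmins.org from US-East-IN]'
$info = Get-VmRemovedEventInfo -EventText $sample
"{0} deleted by {1}" -f $info.VmName, $info.UserName
```

Because the user is captured from inside its brackets, the value comes back without the trailing space that the Split/-Replace approach leaves behind. Like the original, it would still misread a VM name that itself contains " on ".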
Testing
I created a handful of VMs and deleted them… some deletes one at a time, others in batches, just to see what would happen. Aria Operations for Logs would call Jenkins, Jenkins would run the job, and I could check the build history / console output tab to see what happened.
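Deleting real VMs is the honest end-to-end test, but the Jenkins side can also be exercised on its own by posting a hand-built payload to the webhook URL. The following is a rough sketch under a few assumptions: the VM name zz-webhook-test-01 is a made-up placeholder that should not exist in AD or Aria Automation Config, the body carries only the text property (the only one the job reads), and the POST itself is left commented out so nothing is triggered by accident.

```powershell
# Same endpoint and token that the Aria Operations for Logs webhook uses.
$uri = 'http://jenkins.example.com/generic-webhook-trigger/invoke?token=vmDeprovisionRequest'

# Event text in the layout of a real VmRemovedEvent, but with a placeholder
# VM name (hypothetical) so the cleanup steps find nothing to act on.
$text = '2024-07-08T01:02:33.381560+00:00 core-vcenter01 vpxd 7289 - - Event [19854143] [1-1] [2024-07-08T01:02:33.380981Z] [vim.event.VmRemovedEvent] [info] [LAB\svc-vra] [US-East-IN] [19854141] [Removed zz-webhook-test-01 on core-esxi-33.lab.enterpriseadmins.org from US-East-IN]'

# -InputObject keeps the single-element array intact, so the JSON is an
# array of one object, matching what the $[0].text expression expects.
$body = ConvertTo-Json -InputObject @(@{ text = $text }) -Compress

# Inspect the JSON before sending it anywhere.
$body

# Uncomment to trigger the job. It will run the real cleanup logic against
# whatever VM name is in $text, so keep the placeholder name.
# Invoke-RestMethod -Uri $uri -Method Post -ContentType 'application/json' -Body $body
```

With ‘print post content’ enabled on the job, the console output of the resulting build should show this payload and the eventmsg variable extracted from it.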
I’ll include the console output of one specific delete attempt below as reference. We can see the event log text received by the webhook, the variables Jenkins extracted, and the text output of the script execution that moved/disabled AD computer objects & the Job ID of an Aria Automation Config minion key being deleted.
GenericWebhookEnvironmentContributor
Received:
[{"text":"2024-07-08T01:02:33.381560+00:00 core-vcenter01 vpxd 7289 - - Event [19854143] [1-1] [2024-07-08T01:02:33.380981Z] [vim.event.VmRemovedEvent] [info] [LAB\\svc-vra] [US-East-IN] [19854141] [Removed h199-linux-11 on core-esxi-33.lab.enterpriseadmins.org from US-East-IN]","timestamp":1720400553384,"fields":[{"name":"hostname","content":"core-vcenter01"},{"name":"appname","content":"vpxd"},{"name":"procid","content":"7289"},{"name":"msgid","content":"-"},{"name":"client_syslog_version","content":"1"},{"name":"__li_source_path","content":"CORE-VCENTER01.lab.enterpriseadmins.org"},{"name":"priority","content":"info"},{"name":"facility","content":"user"},{"name":"client_syslog_priority","content":"14"}]}]
Contributing variables:
eventmsg = 2024-07-08T01:02:33.381560+00:00 core-vcenter01 vpxd 7289 - - Event [19854143] [1-1] [2024-07-08T01:02:33.380981Z] [vim.event.VmRemovedEvent] [info] [LAB\svc-vra] [US-East-IN] [19854141] [Removed h199-linux-11 on core-esxi-33.lab.enterpriseadmins.org from US-East-IN]
[vmDeprovisionRequest] $ powershell.exe -NonInteractive -ExecutionPolicy Bypass -File C:\Users\SVC-JE~1\AppData\Local\Temp\jenkins14142109662625721398.ps1
Received request to deprovision VM h199-linux-11 after it was deleted by user: LAB\svc-vra .
Found in AD: CN=H199-LINUX-11,OU=Services,OU=LAB Servers,DC=lab,DC=enterpriseadmins,DC=org, will proceed to move/disable.
Found Salt Minion h199-linux-11.lab.enterpriseadmins.org in state pending, will proceed to move/disable.
jid:20240708010334940144
Finished: SUCCESS
Conclusion
Event-driven automation can be very powerful. The above example could have been tackled in a variety of ways. As the console output shows, the VM delete was actually triggered by my Aria Automation service account, because I had deleted the deployment from that tool, so I could have completed this same task using ABX when the VM was deleted. However, I like the idea of doing the cleanup tasks outside of Aria Automation to catch any one-off lab systems that get created manually or using other automation tools. Again, this is just one example of an event-driven “if this” (VM deleted), “then that” (cleanup task) approach to automation. There are likely many better examples; this was just some cleanup I had been wanting to tackle anyway.