vSphere ESXi Host Certificate Status Alarm bulk resolution

In the vSphere UI, some hosts will occasionally trigger an “ESXi Host Certificate Status” alarm. VMware Skyline has a finding for this issue as well: vSphere-HostCertStatusAlarm. The resolution is typically straightforward: right-click the host > Certificates > Renew Certificate. However, if you have hundreds of hosts where this needs to happen, using the UI can be tedious. This post will explore why the certificates expire and how to automate their replacement when needed.

By default, these certificates are issued by the VMware Certificate Authority (VMCA). A certificate is issued to the host when the host is added to vCenter Server. The validity period for the certificate is configured using a vCenter advanced setting. Under Inventory > vCenter > Configure > Settings > Advanced Settings, the value of vpxd.certmgmt.certs.daysValid is the number of days of validity requested for a new or renewed certificate. The default should be 1825, which is 5 years.

In this view you can also see the vpxd.certmgmt.certs.minutesBefore value. This controls how far the certificate’s start date (NotBefore) is backdated from the time of the request. The default value of 1440 minutes (24 hours) ensures that anything validating this certificate doesn’t think it’s not yet valid because it was too recently issued. I mention this as it’ll be relevant in some of the examples below.
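
Both values can also be read from PowerCLI instead of the UI. The following is a minimal sketch, assuming an existing Connect-VIServer session; the function name Get-HostCertSetting is made up for this example, while Get-AdvancedSetting is the standard PowerCLI cmdlet.

```powershell
# Hypothetical helper: read the two certificate-related vCenter advanced settings.
function Get-HostCertSetting {
  param(
    # vCenter connection object, for example $global:DefaultVIServer
    [Parameter(Mandatory)]
    $Server
  )
  # the setting names come from the Advanced Settings view described above
  $names = 'vpxd.certmgmt.certs.daysValid', 'vpxd.certmgmt.certs.minutesBefore'
  Get-AdvancedSetting -Entity $Server -Name $names |
    Sort-Object Name |
    Select-Object Name, Value
}

# Usage (requires PowerCLI and a connected vCenter):
# Get-HostCertSetting -Server $global:DefaultVIServer
```

With default settings I would expect this to return 1825 for daysValid and 1440 for minutesBefore, but check your own environment rather than relying on that.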

From Administration > Certificates > Certificate Management, the ‘VMware Certificate Authority’ tile shows the validity of the VMCA certificate. This is the latest date through which a new certificate can be valid. For example, in my lab this value is Nov 9th, 2028. This is a touch under 5 years away, so even though I would request a 5-year validity period (the vpxd.certmgmt.certs.daysValid setting above), the VMCA certificate will expire before that and can only issue certificates valid through this slightly earlier date. The VMCA certificate is typically valid for 10 years from when vCenter Server is first deployed or the certificate is replaced. It can be regenerated with certificate-manager (https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.authentication.doc/GUID-E1D35792-ED03-468A-966B-362BED18021A.html), but doing so will restart all vCenter services and is more disruptive than the ESXi host certificate renewal process.

To figure out how to automate this functionality, I enabled code capture (Developer Center > Code Capture) and recorded the action of renewing the certificate from the UI. This showed me what the UI was doing to make the request and gave me a lead on “CertificateManager” being something I should search for.

Armed with that info, I was able to put together a short PowerCLI script to query the certificate values for all hosts and store the output in a variable so we can use it later. In the output I included the moref of the host, because that will be needed if we want to call the CertMgrRefreshCertificates_Task method that we identified using Code Capture. CertificateInfo has additional properties such as Issuer (the vCenter) and Subject, but those aren’t required for our exercise so I didn’t include them.

# create a collection to store ESXi Host & certificate details
$myHostCertStatus = @()
foreach ($thisESX in Get-VMHost -State:Connected | Sort-Object Name) {
  $certMgr = Get-View -Id $thisESX.ExtensionData.ConfigManager.CertificateManager
  $myHostCertStatus += $certMgr.CertificateInfo | Select-Object @{N='VMHost';E={$thisESX.name}}, @{N='moref';E={$thisESX.extensiondata.moref}}, NotBefore, NotAfter, Status
}

With our new variable populated, we can look at the specific host from the UI action to cross reference our timestamps. Here is the event from the UI:

Here is the output from our variable, filtered down to only the host we are interested in.

$myHostCertStatus | Where-Object {$_.VMhost -match 'euc-esx-21'}

VMHost    : euc-esx-21.lab.enterpriseadmins.org
moref     : HostSystem-host-25598
NotBefore : 2/12/2024 8:31:07 PM
NotAfter  : 11/9/2028 6:51:22 PM
Status    : good

We can see that the NotBefore value is Monday the 12th at 8:31 PM. EST is -5 hours, so this is 3:31 PM local time, exactly 24 hours prior to the 3:31 PM time of the UI event above. That 24-hour offset is the vpxd.certmgmt.certs.minutesBefore setting at work. The NotAfter is shorter than the 5 years requested, because it follows the VMCA expiry date of Nov 9th, 2028.
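
The arithmetic can be double-checked with a few lines of PowerShell. This is only a sketch of how I understand the behavior: the function name Get-ExpectedCertWindow is made up, the request time is inferred from the NotBefore value above (24 hours later, in the same time zone as the output), and I am assuming the requested end date is counted from the request time and then capped at the VMCA expiry.

```powershell
# Hypothetical helper: estimate the validity window of a renewed host certificate.
function Get-ExpectedCertWindow {
  param(
    [Parameter(Mandatory)][datetime]$RequestTime,   # when the renewal was requested
    [Parameter(Mandatory)][datetime]$VmcaNotAfter,  # expiry of the VMCA certificate
    [int]$DaysValid = 1825,                         # vpxd.certmgmt.certs.daysValid
    [int]$MinutesBefore = 1440                      # vpxd.certmgmt.certs.minutesBefore
  )
  # the start date is backdated by minutesBefore
  $notBefore = $RequestTime.AddMinutes(-$MinutesBefore)
  # assumption: the requested end date is request time + daysValid
  $requestedNotAfter = $RequestTime.AddDays($DaysValid)
  # a certificate cannot outlive the CA that issued it
  $capped = $requestedNotAfter -gt $VmcaNotAfter
  $notAfter = if ($capped) { $VmcaNotAfter } else { $requestedNotAfter }
  [pscustomobject]@{
    NotBefore    = $notBefore
    NotAfter     = $notAfter
    CappedByVmca = $capped
  }
}

# values inferred from the lab example above
Get-ExpectedCertWindow -RequestTime '2024-02-13 20:31:07' -VmcaNotAfter '2028-11-09 18:51:22'
```

For the lab values this reproduces the NotBefore and NotAfter shown in the host output, with the end date capped by the VMCA certificate.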

We now have all the info we need: a way to validate current certificate status, the method we need to call to renew a certificate, and a host we are willing to test with. In the example below I’m filtering down our variable to only one test host and passing that item to the method identified from Code Capture. A task ID is returned by PowerCLI.

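# the vCenter-level CertificateManager only needs to be retrieved once
$thisCertMgr = Get-View -Id 'CertificateManager-certificateManager'
$myHostCertStatus | Where-Object {$_.VMHost -match 'euc-esx-21'} | ForEach-Object {
  # request a renewed certificate for this host; a task reference is returned
  $thisCertMgr.CertMgrRefreshCertificates_Task($_.moref)
}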

Type Value
---- -----
Task task-3269126

If we look in the UI, we can confirm this task was executed and get the timestamp of the request.

Re-running the host query block from above to check this host’s output, we can see that the NotAfter value has not changed (it is still constrained by the VMCA validity) but the NotBefore value has been updated.

VMHost    : euc-esx-21.lab.enterpriseadmins.org
moref     : HostSystem-host-25598
NotBefore : 2/13/2024 1:34:23 PM
NotAfter  : 11/9/2028 6:51:22 PM
Status    : good

Now that we’ve confirmed the certificate has been replaced, and the expiration date aligns with our expectations, we can tweak this to look at the NotAfter or Status properties and run the same code on a larger block of hosts.
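
As a rough sketch of that tweak, the filter below keeps only the rows of $myHostCertStatus whose Status is something other than good, or whose NotAfter falls within a chosen number of days. The function name Select-HostCertNeedingRenewal and the 90-day default are my own placeholders, not anything defined by PowerCLI.

```powershell
# Hypothetical filter: pick the hosts whose certificate should be renewed.
function Select-HostCertNeedingRenewal {
  [CmdletBinding()]
  param(
    # one row of $myHostCertStatus (needs Status and NotAfter properties)
    [Parameter(Mandatory, ValueFromPipeline)]
    [object]$CertInfo,
    # renew anything expiring within this many days (placeholder default)
    [int]$WithinDays = 90,
    # reference time, overridable so the logic can be checked with fixed dates
    [datetime]$Now = (Get-Date)
  )
  process {
    $expiresSoon = $CertInfo.NotAfter -lt $Now.AddDays($WithinDays)
    # pass the row through if vCenter flags it or it expires inside the window
    if ("$($CertInfo.Status)" -ne 'good' -or $expiresSoon) {
      $CertInfo
    }
  }
}

# Usage against a live vCenter (requires PowerCLI and a connected session):
# $targets = $myHostCertStatus | Select-HostCertNeedingRenewal -WithinDays 90
# $certMgr = Get-View -Id 'CertificateManager-certificateManager'
# $targets | ForEach-Object { $certMgr.CertMgrRefreshCertificates_Task($_.moref) }
```

Keep in mind that a renewed certificate is still capped by the VMCA expiry, so if the VMCA certificate itself is inside the window, the same hosts will keep matching this filter after renewal. I would also review $targets before running the last line against a large environment.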
