Linked mode, server replacement and VMwareVCMSDS errors

This is an issue I experienced a couple of months back, and I thought I’d take some time to document it here because I couldn’t find any reference to other people experiencing it. For some background, this environment has two vCenter servers joined in Linked Mode. One of them serves a disaster recovery site, which started out rather small and used the default local SQLEXPRESS installation. As the site grew, we decided to move the database to a remote SQL server. Our migration plan was to stop the services on the old vCenter server, move the database, and then run the vCenter installer on a new VM without SQLEXPRESS installed. Shortly after this migration we noticed one error and two warning events being logged each day. I’ve included the full text of each of these three events below:

Event Log Error:

This is the replication status for the following directory partition on this directory server. Directory partition: DC=virtualcenter,DC=vmware,DC=int This directory server has not recently received replication information from a number of directory servers. The count of directory servers is shown, divided into the following intervals. More than 24 hours: 1 More than a week: 1 More than one month: 0 More than two months: 0 More than a tombstone lifetime: 0 Tombstone lifetime (days): 180 Directory servers that do not replicate in a timely manner may encounter errors. They may miss password changes and be unable to authenticate. A DC that has not replicated in a tombstone lifetime may have missed the deletion of some objects, and may be automatically blocked from future replication until it is reconciled. To identify the directory servers by name, use the dcdiag.exe tool. You can also use the support tool repadmin.exe to display the replication latencies of the directory servers. The command is "repadmin /showvector /latency ".

Event Log Warnings:

The remote server which is the owner of a FSMO role is not responding. This server has not replicated with the FSMO role owner recently. Operations which require contacting a FSMO operation master will fail until this condition is corrected. FSMO Role: CN=Schema,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A} FSMO Server DN: CN=NTDS Settings,CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A} Latency threshold (hours): 24 Elapsed time since last successful replication (hours): 859 User Action: This server has not replicated successfully with the FSMO role holder server. 1. The FSMO role holder server may be down or not responding. Please address the problem with this server. 2. Determine whether the role is set properly on the FSMO role holder server. If the role needs to be adjusted, utilize NTDSUTIL.EXE to transfer or seize the role. This may be done using the steps provided in KB articles 255504 and 324801 on http://support.microsoft.com. 3. If the FSMO role holder server used to be a domain controller, but was not demoted successfully, then the objects representing that server are still in the forest. This can occur if a domain controller has its operating system reinstalled or if a forced removal is performed. These lingering state objects should be removed using the NTDSUTIL.EXE metadata cleanup function. 4. The FSMO role holder may not be a direct replication partner. If it is an indirect or transitive partner, then there are one or more intermediate replication partners through which replication data must flow. The total end to end replication latency should be smaller than the replication latency threshold, or else this warning may be reported prematurely. 5. Replication is blocked somewhere along the path of servers between the FSMO role holder server and this server. 
Consult your forest topology plan to determine the likely route for replication between these servers. Check the status of replication using repadmin /showrepl at each of these servers. The following operations may be impacted: Schema: You will no longer be able to modify the schema for this forest. Domain Naming: You will no longer be able to add or remove domains from this forest. PDC: You will no longer be able to perform primary domain controller operations, such as Group Policy updates and password resets for non-Active Directory Lightweight Directory Services accounts. RID: You will not be able to allocation new security identifiers for new user accounts, computer accounts or security groups. Infrastructure: Cross-domain name references, such as universal group memberships, will not be updated properly if their target object is moved or renamed.

and:

The remote server which is the owner of a FSMO role is not responding. This server has not replicated with the FSMO role owner recently. Operations which require contacting a FSMO operation master will fail until this condition is corrected. FSMO Role: CN=Partitions,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A} FSMO Server DN: CN=NTDS Settings,CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A} Latency threshold (hours): 24 Elapsed time since last successful replication (hours): 859 User Action: This server has not replicated successfully with the FSMO role holder server. 1. The FSMO role holder server may be down or not responding. Please address the problem with this server. 2. Determine whether the role is set properly on the FSMO role holder server. If the role needs to be adjusted, utilize NTDSUTIL.EXE to transfer or seize the role. This may be done using the steps provided in KB articles 255504 and 324801 on http://support.microsoft.com. 3. If the FSMO role holder server used to be a domain controller, but was not demoted successfully, then the objects representing that server are still in the forest. This can occur if a domain controller has its operating system reinstalled or if a forced removal is performed. These lingering state objects should be removed using the NTDSUTIL.EXE metadata cleanup function. 4. The FSMO role holder may not be a direct replication partner. If it is an indirect or transitive partner, then there are one or more intermediate replication partners through which replication data must flow. The total end to end replication latency should be smaller than the replication latency threshold, or else this warning may be reported prematurely. 5. Replication is blocked somewhere along the path of servers between the FSMO role holder server and this server. 
Consult your forest topology plan to determine the likely route for replication between these servers. Check the status of replication using repadmin /showrepl at each of these servers. The following operations may be impacted: Schema: You will no longer be able to modify the schema for this forest. Domain Naming: You will no longer be able to add or remove domains from this forest. PDC: You will no longer be able to perform primary domain controller operations, such as Group Policy updates and password resets for non-Active Directory Lightweight Directory Services accounts. RID: You will not be able to allocation new security identifiers for new user accounts, computer accounts or security groups. Infrastructure: Cross-domain name references, such as universal group memberships, will not be updated properly if their target object is moved or renamed.
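Only three fields in these long warnings really matter: which role is affected, which server still holds it, and how long replication has been failing. As a rough sketch (the function name and the sample string are my own, built from the event text above, and not part of any VMware or Microsoft tool), they could be pulled out like this:

```python
import re

def parse_fsmo_warning(text):
    """Extract the role, the role holder and the elapsed hours from one warning."""
    role = re.search(r"FSMO Role: CN=([^,]+),", text)
    holder = re.search(r"FSMO Server DN: CN=NTDS Settings,CN=([^,]+),", text)
    hours = re.search(
        r"Elapsed time since last successful replication \(hours\): (\d+)", text
    )
    if not (role and holder and hours):
        raise ValueError("text does not look like an FSMO role owner warning")
    return {
        "role": role.group(1),
        "holder": holder.group(1),
        "elapsed_hours": int(hours.group(1)),
    }

# Hypothetical sample: the relevant part of the second warning, copied from above.
sample = (
    "FSMO Role: CN=Partitions,CN=Configuration,"
    "CN={A6FD2645-1111-2222-3333-EE3305E5875A} "
    "FSMO Server DN: CN=NTDS Settings,CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,"
    "CN=Default-First-Site-Name,CN=Sites,CN=Configuration,"
    "CN={A6FD2645-1111-2222-3333-EE3305E5875A} "
    "Latency threshold (hours): 24 "
    "Elapsed time since last successful replication (hours): 859"
)

info = parse_fsmo_warning(sample)
elapsed_days = info["elapsed_hours"] / 24
print(info["role"], info["holder"], round(elapsed_days, 1))
```

Both warnings name OLDVCENTER$VMwareVCMSDS as the role holder, which is the server that no longer exists, and 859 hours works out to roughly 36 days without replication.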

These events relate to VMwareVCMSDS, the Active Directory Application Mode (ADAM) instance, also known as Active Directory Lightweight Directory Services (AD LDS), that replicates role definitions and license information in Linked Mode environments. This component is also used by VMware View Connection Servers, so this issue could potentially exist in View environments. As with domain controllers, several steps are typically required to remove a directory server. If those steps aren’t completed correctly, you could see error messages similar to the ones documented above. After a server failure (like the one I created by replacing the vCenter server), these steps can be completed using the directory services management utilities. I’ve included instructions below that worked in my environment. If my understanding of the response text from dsmgmt is correct, you can probably skip directly to part 3 and it should take care of everything. However, I performed all three tasks separately and documented them as such:

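Part 1: Seize schema master
Open a command prompt and type only the commands from the box below, that is, dsmgmt on the first line and then the text after each prompt. Everything else is output from the utility: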

dsmgmt
dsmgmt: roles
fsmo maintenance: connections
server connections: connect to server newvcenter:389
server connections: quit
fsmo maintenance: seize schema master
Attempting safe transfer of schema FSMO before seizure.
ldap_modify_sW error 0x34(52 (Unavailable).
Ldap extended error message is 000020AF: SvcErr: DSID-03210397, problem 5002 (UNAVAILABLE), data 1772 Win32 error returned is 0x20af(The requested FSMO operation failed. The current FSMO holder could not be contacted.))
Depending on the error code this may indicate a connection, ldap, or role transfer error.
Transfer of schema FSMO failed, proceeding with seizure ...
Server "newvcenter:389" knows about 2 roles
Schema - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
Naming Master - CN=NTDS Settings,CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}

Part 2: Seize naming master
The following steps assume you have just completed part 1 and are still in the dsmgmt utility. Again, you’ll want to type only the command that follows the prompt in the box below; the remaining lines are output:

fsmo maintenance: seize naming master
Attempting safe transfer of domain naming FSMO before seizure.
ldap_modify_sW error 0x34(52 (Unavailable).
Ldap extended error message is 000020AF: SvcErr: DSID-03210397, problem 5002 (UNAVAILABLE), data 1772 Win32 error returned is 0x20af(The requested FSMO operation failed. The current FSMO holder could not be contacted.))
Depending on the error code this may indicate a connection,
ldap, or role transfer error.
Transfer of domain naming FSMO failed, proceeding with seizure ...
Server "newvcenter:389" knows about 2 roles
Schema - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
Naming Master - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}

As you can see, part 1 and part 2 will make ‘NEWVCENTER’ responsible for the FSMO roles that were causing the two warnings in the event log.
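If you’d rather not eyeball those long distinguished names, the "knows about 2 roles" listing is easy to check mechanically. The following is only a sketch under my own assumptions: the helper names are hypothetical, and the input is simply the listing pasted from the dsmgmt output above.

```python
import re

def role_holders(output):
    """Map each role name to the server CN that currently holds it."""
    holders = {}
    for line in output.splitlines():
        match = re.match(r"\s*(.+?) - CN=NTDS Settings,CN=([^,]+),", line)
        if match:
            holders[match.group(1)] = match.group(2)
    return holders

def roles_still_on(holders, hostname):
    """Return the roles still held by an instance on the given host."""
    prefix = hostname.upper() + "$"
    return [role for role, cn in holders.items() if cn.upper().startswith(prefix)]

SUFFIX = (",CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,"
          "CN={A6FD2645-1111-2222-3333-EE3305E5875A}")

# Listing after part 1: only the schema role has moved.
after_part1 = (
    "Schema - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS" + SUFFIX + "\n"
    "Naming Master - CN=NTDS Settings,CN=OLDVCENTER$VMwareVCMSDS" + SUFFIX
)
# Listing after part 2: both roles are on the new server.
after_part2 = (
    "Schema - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS" + SUFFIX + "\n"
    "Naming Master - CN=NTDS Settings,CN=NEWVCENTER$VMwareVCMSDS" + SUFFIX
)

print(roles_still_on(role_holders(after_part1), "OLDVCENTER"))
print(roles_still_on(role_holders(after_part2), "OLDVCENTER"))
```

The trailing "$" in the prefix matters, because a name like NEWVCENTER1 would otherwise also match NEWVCENTER.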

Part 3: Metadata cleanup
The following steps force the removal of residual references to the permanently removed server. Having completed them, I believe this part alone would have resolved all of my issues. However, for completeness I’m including every step I performed, since I’m confident that together they resolved the issue. Again, open a command window and type only the commands from the box below, that is, dsmgmt on the first line and then the text after each prompt:

dsmgmt
dsmgmt: metadata cleanup
metadata cleanup: connections
server connections: connect to server newvcenter:389
server connections: quit
metadata cleanup: select operation target
select operation target: list sites
0 - CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
select operation target: select site 0
select operation target: list servers in site
Found 3 server(s)
0 - CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
1 - CN=NEWVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
2 - CN=NEWVCENTER1$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}
select operation target: select server 0
select operation target: quit
metadata cleanup: remove selected server

Transferring / Seizing FSMO roles off the selected server.
"CN=OLDVCENTER$VMwareVCMSDS,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,CN={A6FD2645-1111-2222-3333-EE3305E5875A}" removed from server "NEWVCENTER:389"
metadata cleanup: quit
dsmgmt: quit
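The one step above where a slip would hurt is "select server 0", since the number depends on the order in which the servers happen to be listed. As a hedged sketch (find_server_index is a hypothetical helper of mine, and it assumes the instance name VMwareVCMSDS seen in the listing above), the index of the stale server could be looked up from the pasted "list servers in site" output like this:

```python
import re

def find_server_index(listing, hostname, instance="VMwareVCMSDS"):
    """Return the list index of HOSTNAME$INSTANCE, or None if it is not listed."""
    target = f"{hostname}${instance}".upper()
    for line in listing.splitlines():
        match = re.match(r"\s*(\d+) - CN=([^,]+),", line)
        # Compare the whole CN so NEWVCENTER does not match NEWVCENTER1.
        if match and match.group(2).upper() == target:
            return int(match.group(1))
    return None

SUFFIX = (",CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,"
          "CN={A6FD2645-1111-2222-3333-EE3305E5875A}")

# The three server lines copied from the output above.
listing = "\n".join([
    "0 - CN=OLDVCENTER$VMwareVCMSDS" + SUFFIX,
    "1 - CN=NEWVCENTER$VMwareVCMSDS" + SUFFIX,
    "2 - CN=NEWVCENTER1$VMwareVCMSDS" + SUFFIX,
])

index = find_server_index(listing, "OLDVCENTER")
print(f"select server {index}")
```

Whatever you use to find the number, confirm that the name echoed by "remove selected server" is the old server before accepting the removal.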

This concludes all of the required steps. The events that were being logged once per day were no longer being logged after completing these steps. I hope documenting them here will help someone else. If so, please leave a comment and let us know. Thanks!
