Moving to XtremIO (or any other array)

I was recently asked to help move VMs from a datastore backed by traditional spinning disk to an EMC XtremIO datastore. I was really excited, as I wanted to see the power of an all-flash array.

As a best practice, most all-flash vendors recommend using either their own multipathing software (PowerPath/VE in EMC's case) or round robin. In this environment, the customer chose to go with round robin. This is pretty easy to do in the GUI, but you need to do it for each device on each host in a cluster. Not to worry, PowerCLI is willing to help. The following code will find all XtremIO LUNs not currently set to RoundRobin and change them.

Get-VMHost "esx-cl3-*" | Get-VMHostHba -Type "FibreChannel" |
Get-ScsiLun -LunType "disk" |
where {$_.MultipathPolicy -ne "RoundRobin" -and $_.Vendor -eq 'XtremIO'} |
Set-ScsiLun -MultipathPolicy RoundRobin

Once we have the new datastores prepared and ready to go, we can start moving virtual machines. In the customer's environment, two different clusters of ESXi hosts shared a common set of datastores. As part of this migration, the customer wanted to dedicate datastores to each cluster (removing their cross-cluster datastores). Again, PowerCLI was willing to help. In the following code, we will move all of the virtual machines on MyOldDataStore that are running on a host whose name matches esx-cl3. We will sort the virtual machines by name, just for tracking purposes, and kick out an email when the process is complete (so we don’t have to babysit it):

Get-Datastore MyOldDataStore | Get-VM | ?{$_.VMHost.Name -match 'esx-cl3'} | Sort Name | %{
   write-host "$($_.Name) will be moved to XtremIO..." -NoNewline
   [void]( $_ | Move-VM -Datastore xtrem-cl3-vol1 -DiskStorageFormat:EagerZeroedThick )
   Write-Host " done!"
}
Send-MailMessage -To 'notify@mydomainname.com' -Subject "Done with CL3 group" -From 'notify@mydomainname.com' -SmtpServer 'smtp.mydomainname.com'

That was pretty easy, and now the VMs will benefit from the consistent low latency of an all-flash array. Now you probably need some help managing and monitoring your array. Fortunately the team over at vNugglets has you covered. Check out their latest post at http://www.vnugglets.com/2014/04/xtremio-powershell-module-report-on.html. It is a fantastic module and really worth checking out.

Lack of posts

I realized today that I hadn’t posted anything new since December. That’s four full months without a post… normally I would have averaged a dozen posts in this period of time. A few things have contributed to this lack of posting.

  • Working on projects that really lack code worth sharing.  There were a few tips/tricks I picked up specifically around making graphical user interfaces with SAPIEN PowerShell Studio (that I plan to share soon) but overall the projects focused on solving very specific problems that don’t share well.
  • Book reviews.  I had the opportunity to review several vSphere related books for Packt Publishing.  This was an interesting project, but consumed some of the free time I used for blogging. I plan on posting reviews of these books soon.
  • A career change.  I accepted a promotion from a technical to a managerial role.  This has been more of a change than I had anticipated, and adjusting to the new demands has consumed some time.

Anyway, enough with excuses.  I have a few post ideas lined up and hope to have them out the door in the next few weeks.  Check back soon!

Migration to Office 365/Exchange Online

I recently had a chance to help a small business move their 70 mailboxes to the Exchange Online service. This company was running an on-premises installation of Exchange 2007 on Windows Server 2008, and we helped them migrate to Office 365/Exchange Online over a couple of weekends. During the migration we ran into several issues. I couldn’t find many online resources where people documented these errors, so I wanted to write them down on this site.

Issue #1
Symptoms:
The Outlook Anywhere (RPC over HTTPS) service was configured, running and available to end users. However, we received various errors when trying to establish a migration endpoint. Even though the service was working, the Remote Connectivity Analyzer (https://testconnectivity.microsoft.com/) was failing on the Outlook Anywhere tests.
Resolution:
Looking around online, we thought the issue was related to the Exchange 2007 service pack level, so we upgraded to SP3. I’m not sure if this actually helped, but I figured it was worth mentioning as it may have fixed other issues we could have seen during the migration. The actual fix for this issue (and unfortunately I can’t find the Office 365 community post where we found the suggestion) was to create a hosts file entry on the Exchange server containing the IP address, server name and FQDN of the internal Exchange server. DNS was working perfectly, so I’m not sure why this was needed. However, after the entry was added, the Remote Connectivity Analyzer tests started working and we were able to move forward with the migration.
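For reference, a hosts file line is simply the IP address followed by the names that should resolve to it. The sketch below builds such a line in PowerShell. The IP address and server names are placeholders (not the values from this environment), and New-HostsEntry is a hypothetical helper name used only for illustration. The commented-out last line shows how the entry could be appended from an elevated prompt on the Exchange server.

```powershell
# Hypothetical helper: formats one hosts file line (IP, then FQDN, then short name).
function New-HostsEntry {
    param(
        [Parameter(Mandatory=$true)][string]$IPAddress,
        [Parameter(Mandatory=$true)][string]$Fqdn,
        [Parameter(Mandatory=$true)][string]$ShortName
    )
    # Fields in a hosts file are separated by whitespace; a tab is used here.
    "{0}`t{1}`t{2}" -f $IPAddress, $Fqdn, $ShortName
}

# Placeholder values (assumptions); substitute the internal Exchange server's real details.
$entry = New-HostsEntry -IPAddress '192.0.2.10' -Fqdn 'exch01.corp.example.com' -ShortName 'exch01'
$entry

# To append it on the Exchange server (run elevated):
# Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value $entry
```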

Issue #2
Symptoms:
A batch was loaded into the Office 365 environment to begin the migration, but after several hours the task failed.
Error log may mention “MigrationPermanentException: Error: MapiExceptionLogonFailed: Unable to make connection to the server”
Resolution:
Verify permissions, specifically Receive-As rights to the database (http://community.office365.com/en-us/forums/158/t/18911.aspx).

Add-ADPermission -Identity "Mailbox Store" -User "Trusted User" -ExtendedRights Receive-As

Issue #3
Symptoms:
When opening Outlook as a user on a domain-joined PC, the on-premises mailbox is opened instead of the Office 365 mailbox. On a non-domain-joined test PC, the Office 365 mailbox is opened.
Resolution:
This was caused by the way Exchange handles autodiscover. You can read more about the process here: http://msdn.microsoft.com/en-us/library/office/jj900169(v=exchg.150).aspx. We found a pair of scripts (ExportO365UserInfo.ps1 and Exchange2007MBtoMEU.ps1) available here: http://community.office365.com/en-us/wikis/exchange/845.aspx which allow you to convert the user's Active Directory account from a mailbox user in the on-premises install into a mail-enabled user that references Office 365.

This was my first experience with Office 365/Exchange Online. I was surprised at how complicated some of the migration steps were. With the whole ‘cloud-based’ self-service model, I assumed that the migration path would be just a few clicks. With the handful of lessons learned from going through this process once, I would feel more comfortable doing another migration (but it’s not something I would volunteer for). I hope someone finds this post helpful.

PowerCLI: Getting LUN paths when using EMC PowerPath/VE

A few weeks back I wanted to verify some path counts per LUN. This is typically pretty easy and can be written as a one-liner using standard PowerCLI cmdlets, like this:

Get-VMHost | Get-ScsiLun | Get-ScsiLunPath

However, the above command wouldn’t return results in the customer environment. After doing some testing, I realized that the issue was likely related to the presence of EMC PowerPath/VE for multipathing on the hosts. When using the GUI to view storage/LUN properties, other details like the Path Selection Plugin (PSP) are also missing… but the path information I wanted was still available. It took a little bit of poking around in the Get-View output, but I was able to come up with something to get me the data I was looking for. It’s not real pretty, but it is fast and helped me answer a couple of questions. I figured I would share the code here in case anyone else runs into this issue. If you have any comments/suggestions on how to make this code better/more complete, please post them in the comments section.

$dsHt = Get-View -ViewType Datastore -Property "Info","Summary.Type" -Filter @{"Summary.Type"="VMFS"} |
Select @{N="DSName";E={$_.Info.Vmfs.Name}}, @{N="Capacity";E={[math]::round( ($_.Info.Vmfs.Capacity / 1024 / 1024), 0)}},
@{N="Extent";E={$_.Info.Vmfs.Extent[0].DiskName}}, @{N="VMFS Version";E={$_.Info.Vmfs.Version}} |
Group-Object "Extent" -AsHashTable -AsString

$results = @()
Get-View -ViewType HostSystem -SearchRoot (Get-Cluster ClusterPod8).id -Property Name, Config.StorageDevice | %{
  $thisHostName = $_.Name
  $_.Config.StorageDevice.PlugStoreTopology.path | ?{$_.Name -match 'naa'} | Group-Object LunNumber | Sort-object Name | %{
    try {
      $thisNaaId = ($_.Group[0].Name -split 'naa\.')[1]  # -split takes a regex, so escape the dot
      $results += New-Object psobject -Property @{
        HostName = $thisHostName
        LunNumber = $_.Name
        PathCount = $_.Count
        DSName = $dsHt["naa.$thisNaaId"][0].DSName
        Capacity = $dsHt["naa.$thisNaaId"][0].Capacity
        "VMFS Version" = $dsHt["naa.$thisNaaId"][0]."VMFS Version"
      }
    } catch {
      Write-Warning "Found something with $thisNaaId"
    }
  } # end this Lun
} # end this host

$results | Group-Object PathCount
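Grouping by path count gives a quick overview. To see exactly which host/LUN combinations are off, the same collection can be filtered against the path count you expect. This is only a minimal sketch: four paths per LUN is an assumption (adjust it for your fabric), Get-PathCountMismatch is a hypothetical helper name, and the sample objects below are made up to mimic the shape of $results so the snippet runs on its own.

```powershell
# Hypothetical helper: returns the host/LUN rows whose path count differs from the expected value.
function Get-PathCountMismatch {
    param(
        [Parameter(Mandatory=$true)][object[]]$Results,
        [int]$ExpectedPaths = 4   # assumption: four paths per LUN
    )
    $Results |
        Where-Object { $_.PathCount -ne $ExpectedPaths } |
        # LunNumber is a string in $results, so cast it for a numeric sort
        Sort-Object HostName, { [int]$_.LunNumber } |
        Select-Object HostName, LunNumber, PathCount, DSName
}

# Made-up sample data with the same properties as $results
$sample = @(
    New-Object psobject -Property @{ HostName = 'esx01'; LunNumber = '1';  PathCount = 4; DSName = 'ds-a' }
    New-Object psobject -Property @{ HostName = 'esx01'; LunNumber = '2';  PathCount = 2; DSName = 'ds-b' }
    New-Object psobject -Property @{ HostName = 'esx02'; LunNumber = '10'; PathCount = 3; DSName = 'ds-c' }
    New-Object psobject -Property @{ HostName = 'esx02'; LunNumber = '2';  PathCount = 2; DSName = 'ds-b' }
    New-Object psobject -Property @{ HostName = 'esx02'; LunNumber = '1';  PathCount = 4; DSName = 'ds-a' }
)

$mismatches = @(Get-PathCountMismatch -Results $sample -ExpectedPaths 4)
$mismatches | Format-Table -AutoSize
```

Against real data, pass in $results from the script above instead of $sample.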

Getting data out of vCOps

I’ve been troubleshooting a specific problem where storage latency jumps very high for very short periods of time, usually in the late evening/very early morning hours. The latency is very bad, sometimes in the 2,000ms+ neighborhood. My storage guys see an extreme increase in IOPS coming from my ESXi hosts just before the latency spikes. The working theory was that several VMs were kicking off some type of disk-intensive batch job around the same time. This would be a perfect use of the vCOps troubleshooting Top N charts, but the issue doesn’t appear every day and is typically resolved before anyone notices. Since the Top N charts are real time, they are not super useful in this situation.

What I needed was a way to export which VMs were contributing high IO around the time of the poor latency. Clicking around in vCOps, I couldn’t find a way to get this data. (Side note: if anyone knows a good way to do this, please leave a comment.) However, a co-worker pointed me at an unofficial vCOps PowerShell module available here: http://velemental.com/2012/09/04/unofficial-vmware-vcenter-operations-powershell-module/. Using this module, I was able to get all the data points for disk commands by virtual machine during the time period in question. With a little Where-Object goodness we can find only those VMs with over 300 IOPS. Looking at the data before applying this filter, I noticed this value would be around 3x the average IO normally seen during this period of time. The output isn’t really a good visualization for this amount of data, but it gives me what I need to continue troubleshooting:

$startDate = Get-Date "9/27/2013 12:01 AM"
$endDate = Get-Date "9/27/2013 5:00 AM"
Get-Datacenter NestedLab | Get-VM |
Get-vCOpsResourceMetric -metricKey "virtualDisk:Aggregate of all instances|commandsAveraged_average" -startDate $startDate -endDate $endDate |
Select-Object Name, @{N="Value";E={[math]::round($_.value,0)}}, Date |
Where-Object {$_.Value -gt 300}

In my case, this method didn’t give me an obvious answer to my problems. However, it did give me a smaller list of virtual machines to focus on.
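One way to shrink that list further is to count how many samples each VM had above the threshold and what its peak was. The following is a rough sketch, assuming the filtered output has been captured in a variable with the same Name/Value/Date properties produced by the Select-Object above. Get-HighIoSummary is a hypothetical helper name and the sample data is made up so the snippet runs on its own.

```powershell
# Hypothetical helper: per-VM count of over-threshold samples and the peak value seen.
function Get-HighIoSummary {
    param([Parameter(Mandatory=$true)][object[]]$Samples)
    $Samples | Group-Object Name | ForEach-Object {
        $stats = $_.Group | Measure-Object -Property Value -Maximum
        New-Object psobject -Property @{
            Name                 = $_.Name
            SamplesOverThreshold = $_.Count
            PeakIops             = $stats.Maximum
        }
    } | Sort-Object SamplesOverThreshold -Descending
}

# Made-up data points shaped like the output of the pipeline above
$highIo = @(
    New-Object psobject -Property @{ Name = 'vm-a'; Value = 450; Date = [datetime]'2013-09-27T01:00:00' }
    New-Object psobject -Property @{ Name = 'vm-b'; Value = 320; Date = [datetime]'2013-09-27T01:00:00' }
    New-Object psobject -Property @{ Name = 'vm-a'; Value = 910; Date = [datetime]'2013-09-27T01:05:00' }
    New-Object psobject -Property @{ Name = 'vm-a'; Value = 620; Date = [datetime]'2013-09-27T01:10:00' }
)

$summary = @(Get-HighIoSummary -Samples $highIo)
$summary | Format-Table Name, SamplesOverThreshold, PeakIops -AutoSize
```

VMs that show up repeatedly across several nights would be the first ones I would look at.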