Improving performance with Get-MailboxFolderStatistics

In a previous post I mentioned some performance improvements I made in an Exchange script that used Get-MailboxFolderStatistics. A portion of this script selected the size (in megabytes) of the top 5 folders in each users mailbox. I made a subtle change to the code that saved about 35 minutes on the execution time of the script.

For purposes of the following examples, I measured the execution time of this block of code using my Exchange mailbox. My mailbox has about 250 folders and is about 1.4GB in size.

Previous version of code takes 0.2329 seconds to complete.
[cc lang=”powershell”]
measure-command {
# Previous
$stats = Get-MailboxFolderStatistics $mbx.identity
$folders = $stats | Select-Object folderpath,@{name=”Size(MB)”;Expression={$_.FolderAndSubfolderSize.toMB()}}
$folders = $folders | Where-Object {$_.”Size(MB)” -ne 0} | Sort-Object -Property “Size(MB)” -Descending | Select-Object -First 5
$folders
}
[/cc]

Updated version of code takes 0.1418 seconds to complete.
[cc lang=”powershell”]
measure-command {
# Updated
$stats = Get-MailboxFolderStatistics $mbx.identity
$folders = $stats | Where-Object {$_.FolderAndSubfolderSize -gt 1048576} | Sort-Object -Property “FolderAndSubfolderSize” -Descending | Select-Object -First 5
$folders = $folders | Select-Object folderpath,@{name=”Size(MB)”;Expression={$_.FolderAndSubfolderSize.toMB()}}
$folders
}
[/cc]

On my account, this is a savings of only 0.0911 seconds. That doesn’t sound like enough savings to implement, does it? Actually, if you take my mailbox example and expand it by 40,000 users it would be a savings of about 1 hour. My mailbox is likely larger than the average user, which explains why the savings is closer to 35 minutes instead of 60.

Can you notice the change in the code? It is subtle…nearly the same code/commands are being used — only the order was changed slightly.

In the previous version of the script, all of the folders were selected into an object and the ToMB method was called on the FolderAndSubfolderSize property. The next line of code selects only the first five non-zero entries.

In the updated version of the code, only the first five non-zero entries were added to the object. The next line of code then called the ToMB method on only the first five folders. In my example, this is a savings of about 245 calls on the ToMB method.

Posted in Messaging, Scripting | Leave a comment

Improving SQL Update/Insert performance

I was recently reviewing a Powershell script that had slow performance. The script used Get-MailboxFolderStatistics to pull information from an Exchange Server environment and insert (or update) a record in a SQL database based off the email address. The code was pretty simple…it ran a loop like the following for each user:

$count = "SELECT count(*) from MailboxFolderStats where email=$_"

IF ($count -eq 0) { 
#SQL INSERT CODE
} else { 
#SQL UPDATE CODE
}

Note: This isn’t actual powershell, it is just an outline of the logic used by the script.

Each iteration of the above block would not take long to run, but the loop was executing that block about 40,000 times. After a bit of searching, I came across this Stackoverflow.com article on using a insert update stored procedure. The article describes this technique as an optimistic insert or upsert/merge. Here is how it works…starting with the simplified powershell code:

#SQL EXEC CODE

Note: This isn’t actual powershell, it is just an outline of the logic used by the script.

In SQL, you create a stored procedure that handles the logic (which was previously handled by the client powershell script)

update MailboxFolderStats set Col1=@col1, Col2=@col2 where ID=@ID
if @@rowcount = 0
insert into MailboxFolderStats (Col1, Col2) values (@col1, @col2)

The interesting part of this change is that we no longer select the count of rows from SQL. Instead, SQL attempts to update a record without checking to see if it exists first. Silly, no? Actually, this is what helps with performance most. The majority of the time when this stored procedure executes, a record will exist that needs updated. After the update, SQL checks to make sure something updated — and if not will finish up by adding the entry.

The previous code required (approximately)
– 40,000 select statements
– 39,500 update statements
– 500 insert statements

The modified code will perform (approximately)
– 40,000 update statements
– 500 insert statements

That’s 39,500 fewer connections to the remote SQL server! This reduces network I/O from the client and load on the SQL server at the same time. This single change saved about 8 minutes on the execution time of the script.

My favorite part of this updated code? It shows that performance can be gained in a variety of ways. My script was written in powershell, but the performance gain was provided by more efficient use of SQL server. When troubleshooting performance problems in code there are usually a variety of solutions — some of them require more code and some of them require less. In this example, I have less code (even when you combine the powershell code with the stored procedure code) and better performance. I have another example from this same code review (it is the exact lines of code, just in a different order) but I’ll save that for a future post.

Posted in Scripting | Leave a comment

Get Regular Expression matches

Recently I was given a task at work…someone pointed at a haystack and asked me to find a needle. Actually, the task involved reading the body of hundreds of email messages looking for telephone numbers. Regular expressions and powershell make this pretty easy:

([regex]"[2-9]\d{2}-\d{3}-\d{4}").Match("Body with phone number like 345-555-6789 in it.") | %{ $_.Value }

After completing this task, I figured it would be fairly easy to turn this into a function with predefined regular expressions that might be common. Here is that function.

[cc lang=”powershell” width=”550″]
Function Get-RegexMatches{
param(
[parameter(Mandatory=$true, ValueFromPipeline=$true)][alias(“input”)][String]$inputText,
[parameter(Mandatory=$false)][alias(“email”)][switch]$mail,
[parameter(Mandatory=$false)][alias(“telephoneNumber”)][switch]$phone,
[parameter(Mandatory=$false)][alias(“zipCode”)][switch]$zip,
[parameter(Mandatory=$false)][alias(“SocialSecurityNumber”)][switch]$ssn,
[parameter(Mandatory=$false)][alias(“ip”)][switch]$ipv4,
[parameter(Mandatory=$false)][switch]$ipv6,
[parameter(Mandatory=$false)][alias(“HostName”)][alias(“dnsHostName”)][switch]$dns,
[parameter(Mandatory=$false)][alias(“regex”)][string]$pattern
)

process {
if ($mail) { $regexTag = [regex]”[a-z0-9!#$%&’*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&’*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?” }
if ($phone) { $regexTag = [regex]”[2-9]\d{2}-\d{3}-\d{4}([\ ][x]\d{1,5})?” }
if ($zip) { $regexTag = [regex]”\d{5}([\-]\d{4})?”}
if ($ssn) { $regexTag = [regex]”((?!000)(?!666)(?:[0-6]\d{2}|7[0-2][0-9]|73[0-3]|7[5-6][0-9]|77[0-2]))-((?!00)\d{2})-((?!0000)\d{4})” }
if ($ipv4) { $regexTag = [regex]”0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])” }
if ($ipv6) { $regexTag = [regex]”\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*” }
if ($dns) { $regexTag = [regex]”([\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)(\b|\W(?“my text with emailuser@somedomain.com” | Get-RegexMatches -mail “my text with emailuser@somedomain.com” | Get-RegexMatches -host

Please let me know if you have any suggestions or other use cases for such a function!

Posted in Scripting | 1 Comment

Find upper range of AD attributes

The other day I was working on a powershell script to copy specific Active Directory attributes to a SQL database. I wanted to see if the destination SQL fields (which were varchar 100) would be able to contain the attributes, but I didn’t know the maximum lengths of the text in Active Directory. I hacked together a quick LDAP query to get all of the ‘rangeUpper’ values for each attribute. Here is that code:

[cc lang=”powershell”]
$myReport=@()

$Base = “
$Filter = “(&(objectCategory=attributeSchema)(rangeUpper=*))”
$Attributes = “distinguishedName”
$Scope = “subtree”
$Query = “$Base;$Filter;$Attributes;$Scope”
$Connection = New-Object -ComObject “ADODB.Connection”
$Command = New-Object -ComObject “ADODB.Command”
$Connection.Open(“Provider=ADsDSOObject;”)
$Command.ActiveConnection = $Connection
$Command.Properties.item(“Size Limit”).value = 90000
$Command.Properties.item(“Page Size”).value = 90000
$Command.CommandText = $Query
$Recordset = $Command.Execute()
Do {
$dn = $Recordset.Fields.item(“distinguishedName”)
$dn = $dn.value

$objSchema = Get-ADObject $dn

$myReport+= New-Object psobject -Property @{
LDAPAttribute = [string]$objSchema.lDAPDisplayName
MaxValue = [string]$objSchema.rangeUpper
}

if ($Recordset.eof -ne $true) {$Recordset.MoveNext()}
}
Until ($Recordset.eof)
$Connection.Close > $null
$myReport | Sort LDAPAttribute
[/cc]

Hope this can help you too!

Posted in Scripting | Leave a comment

Cisco UCS and Default Authentication Domains

I’ve recently been working with Cisco UCS — and more specifically the UCS Powertool (the powershell management component). Initially, my scripts were using a local account in UCSM for authentication. However, after changing the “Default Authentication Realm” from Local to LDAP (see screenshot below), I could no longer authenticate using a local account in my script

Using an account from the configured default realm worked — which makes plenty of sense as this is the provider UCS uses for authentication by default.  However, I wanted to be able to specify the authentication realm in my scripts and override the default if necessary.

I had an idea on how to attempt this configuration while looking at the remotely authenticated user section.  You’ll notice in the screenshot below that I have a remotely connected LDAP user with a prefix of ucs-ldap22\ (its at the bottom of the image).

Yes, that’s right.  UCS hyphen NameOfAuthenticationDomain backslash UserName.  I created an authentication domain named local, which maps to the local authentication provider, and then attempted to Connect-UCS specifying the username in the format ucs-local\myUserID — SUCCESS!    Just thought I’d share this information in case anyone else has a similar need.

Posted in Virtualization | 5 Comments