Improving SQL Update/Insert performance

I was recently reviewing a Powershell script that had slow performance. The script used Get-MailboxFolderStatistics to pull information from an Exchange Server environment and insert (or update) a record in a SQL database based off the email address. The code was pretty simple…it ran a loop like the following for each user:

$count = "SELECT count(*) from MailboxFolderStats where email=$_"

IF ($count -eq 0) { 
#SQL INSERT CODE
} else { 
#SQL UPDATE CODE
}

Note: This isn’t actual powershell, it is just an outline of the logic used by the script.

Each iteration of the above block would not take long to run, but the loop was executing that block about 40,000 times. After a bit of searching, I came across this Stackoverflow.com article on using a insert update stored procedure. The article describes this technique as an optimistic insert or upsert/merge. Here is how it works…starting with the simplified powershell code:

#SQL EXEC CODE

Note: This isn’t actual powershell, it is just an outline of the logic used by the script.

In SQL, you create a stored procedure that handles the logic (which was previously handled by the client powershell script)

update MailboxFolderStats set Col1=@col1, Col2=@col2 where ID=@ID
if @@rowcount = 0
insert into MailboxFolderStats (Col1, Col2) values (@col1, @col2)

The interesting part of this change is that we no longer select the count of rows from SQL. Instead, SQL attempts to update a record without checking to see if it exists first. Silly, no? Actually, this is what helps with performance most. The majority of the time when this stored procedure executes, a record will exist that needs updated. After the update, SQL checks to make sure something updated — and if not will finish up by adding the entry.

The previous code required (approximately)
– 40,000 select statements
– 39,500 update statements
– 500 insert statements

The modified code will perform (approximately)
– 40,000 update statements
– 500 insert statements

That’s 39,500 fewer connections to the remote SQL server! This reduces network I/O from the client and load on the SQL server at the same time. This single change saved about 8 minutes on the execution time of the script.

My favorite part of this updated code? It shows that performance can be gained in a variety of ways. My script was written in powershell, but the performance gain was provided by more efficient use of SQL server. When troubleshooting performance problems in code there are usually a variety of solutions — some of them require more code and some of them require less. In this example, I have less code (even when you combine the powershell code with the stored procedure code) and better performance. I have another example from this same code review (it is the exact lines of code, just in a different order) but I’ll save that for a future post.

This entry was posted in Scripting. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

Notify me of followup comments via e-mail. You can also subscribe without commenting.