Recently I was given a task at work: someone pointed at a haystack and asked me to find a needle. Specifically, the task involved reading the bodies of hundreds of email messages looking for telephone numbers. Regular expressions and PowerShell make this pretty easy:
([regex]"[2-9]\d{2}-\d{3}-\d{4}").Match("Body with phone number like 345-555-6789 in it.") | %{ $_.Value }
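One caveat: `Match` returns only the first hit. When a body may contain several numbers, the static `[regex]::Matches` method returns all of them. Here is a minimal sketch of that variation; the sample text and variable names are made up for illustration.

```powershell
# Sample text is made up for illustration.
$body = "Call 345-555-6789 or, after hours, 212-555-0123."
$phonePattern = '[2-9]\d{2}-\d{3}-\d{4}'

# Matches() returns a collection with one Match object per hit;
# ForEach-Object pulls out just the matched text.
$numbers = [regex]::Matches($body, $phonePattern) | ForEach-Object { $_.Value }
$numbers
```

The single-quoted pattern keeps PowerShell from trying to expand anything inside the regular expression.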
After completing this task, I figured it would be fairly easy to turn this into a function with predefined regular expressions for commonly needed patterns. Here is that function.
Function Get-RegexMatches{
param(
[parameter(Mandatory=$true, ValueFromPipeline=$true)][alias("input")][String]$inputText,
[parameter(Mandatory=$false)][alias("email")][switch]$mail,
[parameter(Mandatory=$false)][alias("telephoneNumber")][switch]$phone,
[parameter(Mandatory=$false)][alias("zipCode")][switch]$zip,
[parameter(Mandatory=$false)][alias("SocialSecurityNumber")][switch]$ssn,
[parameter(Mandatory=$false)][alias("ip")][switch]$ipv4,
[parameter(Mandatory=$false)][switch]$ipv6,
[parameter(Mandatory=$false)][alias("HostName","dnsHostName")][switch]$dns,
[parameter(Mandatory=$false)][alias("regex")][string]$pattern
)
process {
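# Inside a double-quoted string the backtick is PowerShell's escape character, so it is doubled (``) to keep a literal backtick in the pattern.
if ($mail) { $regexTag = [regex]"[a-z0-9!#$%&'*+/=?^_``{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_``{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?" }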
if ($phone) { $regexTag = [regex]"[2-9]\d{2}-\d{3}-\d{4}([\ ][x]\d{1,5})?" }
if ($zip) { $regexTag = [regex]"\d{5}([\-]\d{4})?"}
if ($ssn) { $regexTag = [regex]"((?!000)(?!666)(?:[0-6]\d{2}|7[0-2][0-9]|73[0-3]|7[5-6][0-9]|77[0-2]))-((?!00)\d{2})-((?!0000)\d{4})" }
if ($ipv4) { $regexTag = [regex]"0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.0*([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])" }
if ($ipv6) { $regexTag = [regex]"\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*" }
if ($dns) { $regexTag = [regex]"([\d\w-.]+?\.(a[cdefgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmnoz]|e[ceghrst]|f[ijkmnor]|g[abdefghilmnpqrstuwy]|h[kmnrtu]|i[delmnoqrst]|j[emop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrstwy]|qa|r[eouw]|s[abcdeghijklmnortuvyz]|t[cdfghjkmnoprtvwz]|u[augkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|aero|arpa|biz|com|coop|edu|info|int|gov|mil|museum|name|net|org|pro)(\b|\W(?
The function currently supports email addresses, phone numbers, zip codes, Social Security numbers, IPv4 and IPv6 addresses, and DNS host names. I dressed up the function with switch parameters that have aliases where appropriate. In addition, the function accepts the input string from the pipeline. You can use the function in several ways.
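To show how switch parameters, aliases, and pipeline input fit together, here is a trimmed-down sketch with only two patterns. The name `Find-PatternMatch` and the sample text are hypothetical, and letting several switches combine in one call is a choice made for this sketch, not necessarily how the full function behaves.

```powershell
# Hypothetical, trimmed-down variant for illustration; not the full function.
function Find-PatternMatch {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$InputText,

        [Alias('telephoneNumber')]
        [switch]$Phone,

        [Alias('zipCode')]
        [switch]$Zip
    )
    process {
        # Collect one pattern for every switch that was set.
        $patterns = @()
        if ($Phone) { $patterns += '[2-9]\d{2}-\d{3}-\d{4}' }
        if ($Zip)   { $patterns += '\d{5}(-\d{4})?' }

        # Emit the text of every match, one string per match.
        foreach ($p in $patterns) {
            [regex]::Matches($InputText, $p) | ForEach-Object { $_.Value }
        }
    }
}

# Sample text is made up for illustration.
$found = "Call 345-555-6789 or write to Springfield, 12345-6789." |
    Find-PatternMatch -Phone -Zip
$found
```

Because the work happens in the `process` block, the function runs once for each string that arrives on the pipeline, so a whole collection of message bodies can be piped in at once.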
From the pipeline:
"my text with emailuser@somedomain.com" | Get-RegexMatches -mail
"my text with emailuser@somedomain.com" | Get-RegexMatches -host
Please let me know if you have any suggestions or other use cases for such a function!