2013/10/19

PowerShell - Get a SubString out of a String using RegEx

Last week one of my colleagues asked me if I could help him with a Regular Expression (Regex) to select some text inside a String.

I don't work a lot with RegEx, but when I do, I use tools like PowerRegex from Sapien, RegExr, the TechNet help for about_Regular_Expressions, or RegExlib.com. And to be honest, most of the time I try to avoid it, looking for a solution the "PowerShell Way" before reaching for Regex...


Problem

So here is what he asked me:
Out of the following string "OU=MTL1,OU=CORP,DC=FX,DC=LAB" (Which is a Distinguished Name), he wanted to get the name "MTL1", (SiteCode for Montreal).


Solutions

I came up with two solutions:

The PowerShell way
("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ",")[0].substring(3)

Using RegEx
("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',*..=')[1]

Note: Please leave a comment if you know a better way, I would be curious to learn more.


Solutions proposed by readers


Jay
'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -match '(?<=(^OU=))\w*(?=(,))'
$matches[0]

Robert Westerlund
"OU=MTL1,OU=CORP,DC=FX,DC=LAB" -match "^OU=(?<MTL1>[^,]*)"
$matches["MTL1"]
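
Both reader patterns anchor on ^OU=, so they only look at the first component of the string. Jay's version uses a lookbehind and a lookahead so that the whole match ($matches[0]) is the value itself, while Robert's version uses a named capture group. Here is a small sketch of the named-group form; the group name SiteCode is purely illustrative and not from either reader:

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# SiteCode is an illustrative group name (an assumption, not from the readers' code)
if ($dn -match '^OU=(?<SiteCode>[^,]*)') {
    $matches[0]            # OU=MTL1  (the whole match, including "OU=")
    $matches['SiteCode']   # MTL1     (only the named group)
}
```

One difference worth keeping in mind: \w* only matches letters, digits and underscores, so a value containing a hyphen or a space would likely need the [^,]* form instead.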


Steps to solution

First let's check the methods and properties available using Get-Member

PS C:\> "OU=MTL1,OU=CORP,DC=FX,DC=LAB" | get-member
   TypeName: System.String

Name             MemberType            Definition
----             ----------            ----------
Clone            Method                System.Object Clone(), System.Object ICloneable.Clone()
CompareTo        Method                int CompareTo(System.Object value), int CompareTo(string ...
Contains         Method                bool Contains(string value)
CopyTo           Method                void CopyTo(int sourceIndex, char[] destination, int dest...
EndsWith         Method                bool EndsWith(string value), bool EndsWith(string value, ...
Equals           Method                bool Equals(System.Object obj), bool Equals(string value)...
GetEnumerator    Method                System.CharEnumerator GetEnumerator(), System.Collections...
GetHashCode      Method                int GetHashCode()
GetType          Method                type GetType()
GetTypeCode      Method                System.TypeCode GetTypeCode(), System.TypeCode IConvertib...
IndexOf          Method                int IndexOf(char value), int IndexOf(char value, int star...
IndexOfAny       Method                int IndexOfAny(char[] anyOf), int IndexOfAny(char[] anyOf...
Insert           Method                string Insert(int startIndex, string value)
IsNormalized     Method                bool IsNormalized(), bool IsNormalized(System.Text.Normal...
LastIndexOf      Method                int LastIndexOf(char value), int LastIndexOf(char value, ...
LastIndexOfAny   Method                int LastIndexOfAny(char[] anyOf), int LastIndexOfAny(char...
Normalize        Method                string Normalize(), string Normalize(System.Text.Normaliz...
PadLeft          Method                string PadLeft(int totalWidth), string PadLeft(int totalW...
PadRight         Method                string PadRight(int totalWidth), string PadRight(int tota...
Remove           Method                string Remove(int startIndex, int count), string Remove(i...
Replace          Method                string Replace(char oldChar, char newChar), string Replac...
Split            Method                string[] Split(Params char[] separator), string[] Split(c...
StartsWith       Method                bool StartsWith(string value), bool StartsWith(string val...
Substring        Method                string Substring(int startIndex), string Substring(int st...
ToBoolean        Method                bool IConvertible.ToBoolean(System.IFormatProvider provider)
ToByte           Method                byte IConvertible.ToByte(System.IFormatProvider provider)
ToChar           Method                char IConvertible.ToChar(System.IFormatProvider provider)
ToCharArray      Method                char[] ToCharArray(), char[] ToCharArray(int startIndex, ...
ToDateTime       Method                datetime IConvertible.ToDateTime(System.IFormatProvider p...
ToDecimal        Method                decimal IConvertible.ToDecimal(System.IFormatProvider pro...
ToDouble         Method                double IConvertible.ToDouble(System.IFormatProvider provi...
ToInt16          Method                int16 IConvertible.ToInt16(System.IFormatProvider provider)
ToInt32          Method                int IConvertible.ToInt32(System.IFormatProvider provider)
ToInt64          Method                long IConvertible.ToInt64(System.IFormatProvider provider)
ToLower          Method                string ToLower(), string ToLower(cultureinfo culture)
ToLowerInvariant Method                string ToLowerInvariant()
ToSByte          Method                sbyte IConvertible.ToSByte(System.IFormatProvider provider)
ToSingle         Method                float IConvertible.ToSingle(System.IFormatProvider provider)
ToString         Method                string ToString(), string ToString(System.IFormatProvider...
ToType           Method                System.Object IConvertible.ToType(type conversionType, Sy...
ToUInt16         Method                uint16 IConvertible.ToUInt16(System.IFormatProvider provi...
ToUInt32         Method                uint32 IConvertible.ToUInt32(System.IFormatProvider provi...
ToUInt64         Method                uint64 IConvertible.ToUInt64(System.IFormatProvider provi...
ToUpper          Method                string ToUpper(), string ToUpper(cultureinfo culture)
ToUpperInvariant Method                string ToUpperInvariant()
Trim             Method                string Trim(Params char[] trimChars), string Trim()
TrimEnd          Method                string TrimEnd(Params char[] trimChars)
TrimStart        Method                string TrimStart(Params char[] trimChars)
Chars            ParameterizedProperty char Chars(int index) {get;}
Length           Property              int Length {get;}


So how can I get the "MTL1"? Notice how each element is separated by a comma ','
Let's try to split them; there is a Split() method!

("OU=MTL1,OU=CORP,DC=FX,DC=LAB").split(',')
OU=MTL1
OU=CORP
DC=FX
DC=LAB

Awesome... but instead of the Split() method, let's use the PowerShell -Split Operator.

PS C:\> "OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ','
OU=MTL1
OU=CORP
DC=FX
DC=LAB

Now, out of the 4 items, we want to select the first one. [0] will do it.

PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',')[0]
OU=MTL1

Finally we can use the SubString() method to select the piece of text we want.
The site code starts right after the = sign, so it begins at index 3 (string indexes are zero-based: O is 0, U is 1, = is 2).

PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',')[0].substring(3)
MTL1

Voila!
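
The hard-coded 3 only works because the attribute type (OU) is two characters long. As a minimal sketch of a more tolerant variant, the function below looks up the position of the = sign instead. The name Get-FirstRdnValue is hypothetical and not part of the original solution:

```powershell
# Hypothetical helper (the name is an assumption, not from the original post).
# Returns the value of the first component of a Distinguished Name.
function Get-FirstRdnValue {
    param([string]$DistinguishedName)

    # Take the first comma-separated component, e.g. "OU=MTL1"
    $first = ($DistinguishedName -split ',')[0]

    # Find the '=' sign instead of assuming it sits at index 2
    $eq = $first.IndexOf('=')
    if ($eq -lt 0) { return $null }   # no '=' found, nothing to extract

    # Everything after the '=' is the value
    $first.Substring($eq + 1)
}

Get-FirstRdnValue 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'       # MTL1
Get-FirstRdnValue 'CN=Server01,OU=MTL1,DC=FX,DC=LAB'   # Server01
```

This sketch still splits on every comma, so it does not handle a Distinguished Name whose first value contains an escaped comma.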



Using Regex

A regular expression (RegEx) is a sequence of characters that forms a search pattern, mainly used for pattern matching against strings (for example, validating an email format). RegEx lets you work with positioning, character matching, number of matches, grouping, either/or matching, and backreferences. Note that you can also use RegEx to replace substrings or to split strings.
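
To make the three uses (match, replace, split) concrete, here is a short sketch with the PowerShell operators that accept a RegEx. The $dn variable name and the replacement value TOR1 are made up for illustration:

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# Match: -match returns $true/$false and fills the automatic $matches variable
$dn -match '^OU=([^,]+)'      # True
$matches[1]                   # MTL1 (first capture group)

# Replace: swap the first component for another value (TOR1 is a made-up example)
$dn -replace '^OU=[^,]+', 'OU=TOR1'   # OU=TOR1,OU=CORP,DC=FX,DC=LAB

# Split: break the string on every comma (returns 4 items)
$dn -split ','
```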

In my solution I used the pattern ,*..= which has two parts:
The first part ,* matches zero or more commas (the * quantifier applies to the preceding element, here the comma).
The second part ..= matches any two characters followed by '='.
A period matches one instance of any character.


PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',*..=')[1]
MTL1
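
It may not be obvious why the index is [1] and not [0]. Because the pattern also matches the very first OU= at the start of the string, -split returns an empty string as the first element. A quick way to see this (the $parts variable name is just for illustration):

```powershell
$parts = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -split ',*..='

$parts.Count   # 5
$parts[0]      # (empty string: nothing precedes the first "OU=")
$parts[1]      # MTL1
$parts[2]      # CORP
$parts[3]      # FX
$parts[4]      # LAB
```

Also note that ..= assumes a two-character attribute type such as OU, DC or CN. With a longer type, only its last two characters would be consumed by the split and the rest would end up in the results.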



More information

Thanks for reading! If you have any questions, leave a comment or send me an email. I invite you to follow me on Twitter @lazywinadm / Google+ / LinkedIn.

You can also follow the LazyWinAdmin's Blog on Facebook Page and Google+ Page

12 comments:

  1. Good stuff! Here are some functions for processing ldap paths you may like. I hid them in a recent script release (but never actually used them, like a hidden little gift).

    Function Get-TreeFromLDAPPath
    {
        # $Output = [System.Web.HttpUtility]::HtmlDecode(($a | ConvertTo-Html))
        [CmdletBinding()]
        Param
        (
            [Parameter(HelpMessage="LDAP path.")]
            [string]
            $LDAPPath,

            [Parameter(HelpMessage="Determines the depth a tree node is indented")]
            [int]
            $IndentDepth=1,

            [Parameter(HelpMessage="Optional character to use for each newly indented node.")]
            [char]
            $IndentChar = 3,

            [Parameter(HelpMessage="Don't remove the ldap node type (ie. DC=)")]
            [Switch]
            $KeepNodeType
        )
        # Named groups: LDAPType is the part before '=', LDAPName the part after it
        $regex = [regex]'(?<LDAPType>^.+)\=(?<LDAPName>.+$)'
        $ldaparr = $LDAPPath -split ','
        $ADPartCount = $ldaparr.count
        $spacer = ''
        $output = ''
        for ($index = ($ADPartCount); $index -gt 0; $index--)
        {
            $node = $ldaparr[($index-1)]
            if (-not $KeepNodeType)
            {
                if ($node -match $regex)
                {
                    $node = $matches['LDAPName']
                }
            }
            if ($index -eq ($ADPartCount))
            {
                $line = ''
            }
            else
            {
                $line = $IndentChar
                $spacer = $spacer + (' ' * $IndentDepth)
                # This fixes an offset issue
                if ($index -lt ($ADPartCount - 1))
                {
                    $spacer = $spacer + ' '
                }
            }
            $line = $spacer + $line + $node + "`n"
            $output = $Output+$line
        }
        [string]$output
    }

    Function Get-ObjectFromLDAPPath
    {
        [CmdletBinding()]
        Param
        (
            [Parameter(HelpMessage="LDAP path.")]
            [string]
            $LDAPPath,

            [Parameter(HelpMessage="Translate the naming attribute (CN, OU, DC) into its full name")]
            [switch]
            $TranslateNamingAttribute
        )
        $output = @()
        $ldaparr = $LDAPPath -split ','
        # Named groups: LDAPType is the part before '=', LDAPName the part after it
        $regex = [regex]'(?<LDAPType>^.+)\=(?<LDAPName>.+$)'
        $position = 0
        $ldaparr | %{
            if ($_ -match $regex)
            {
                if ($TranslateNamingAttribute)
                {
                    switch ($matches['LDAPType'])
                    {
                        'CN' {$_ldaptype = "Common Name"}
                        'OU' {$_ldaptype = "Organizational Unit"}
                        'DC' {$_ldaptype = "Domain Component"}
                        default {$_ldaptype = $matches['LDAPType']}
                    }
                }
                else
                {
                    $_ldaptype = $matches['LDAPType']
                }
                $objprop = @{
                    LDAPType = $_ldaptype
                    LDAPName = $matches['LDAPName']
                    Position = $position
                }
                $output += New-Object psobject -Property $objprop
                $position++
            }
        }
        Write-Output -InputObject $output
    }

  2. Another way of using Regex here is to use these two lines:
    'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -match '(?<=(^OU=))(?=(,))'
    $matches[0]

    This uses the regex concepts of lookahead and lookbehind which are covered fairly well in this article:
    http://blogs.technet.com/b/heyscriptingguy/archive/2011/03/03/use-powershell-regular-expressions-to-format-numbers.aspx

    This matches the string that comes after the pattern '^OU='(the caret '^' is the beginning of line metacharacter) and before the pattern ','. The -match operator returns a boolean and stores the actual matches in the $matches variable.

    Splitting on ',*..=' requires you to know where substring is in the string in order to choose the correct index from the array. If that's the case the substring method is going to be more straightforward than regex. If you're not sure where in the string the pattern is you're better off using regex.

    1. Thanks Jay, weird i'm getting an error with
      'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -match '(?<=(^OU=))(?=(,))'
      $matches[0]

      The first line returns $false.

      Thanks again for the information, really useful! very appreciated

    2. My apologies the first line should be:
      'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -match '(?<=(^OU=))\w*(?=(,))'

  3. Another useful link on Regular Expressions in .net:
    Regular Expression Language - Quick Reference
    http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

  4. Here is another way, except that with this method you can select column 1 or column 2 at the same time side by side, take a look:

    $var1 = "OU=MTL1,OU=CORP,DC=FX,DC=LAB"

    This gives column 1
    $var1 | % {"{1}" -f($_ -split ',*..=')}
    MTL1

    Now you want column 1 and 3

    $var1 | % {"{1} {3}" -f($_ -split ',*..=')}
    MTL1 FX

    if you want a separator of some kind, the dash in this case, "-":

    $var1 | % {"{4} - {1}" -f($_ -split ',*..=')}
    LAB - MTL1

    Enjoy!

    1. Awesome! Thanks Luis, Great info!
      It's actually faster with your line

      PS C:\Users\Francois-Xavier> Measure-Command { ($var1 -split ',*..=')[1]}


      Days : 0
      Hours : 0
      Minutes : 0
      Seconds : 0
      Milliseconds : 1
      Ticks : 16121
      TotalDays : 1.86585648148148E-08
      TotalHours : 4.47805555555556E-07
      TotalMinutes : 2.68683333333333E-05
      TotalSeconds : 0.0016121
      TotalMilliseconds : 1.6121


      PS C:\Users\Francois-Xavier> Measure-Command { $var1 | % {"{1}" -f($_ -split ',*..=')}}


      Days : 0
      Hours : 0
      Minutes : 0
      Seconds : 0
      Milliseconds : 0
      Ticks : 5494
      TotalDays : 6.3587962962963E-09
      TotalHours : 1.52611111111111E-07
      TotalMinutes : 9.15666666666667E-06
      TotalSeconds : 0.0005494
      TotalMilliseconds : 0.5494

  5. I added a couple of links at the bottom of the post. The Scripting Guy Ed Wilson and Lee Holmes wrote some nice articles about the subject, hope this helps!
