PowerShell Problem Solver: PowerShell String Parsing with Named Captures and REGEX
Over the course of the last few PowerShell Problem Solver articles, I’ve been instructing you on the fine art of string surgery: extracting bits of information from text using both substrings and regular expressions.
I have one more advanced technique I want to demonstrate. As before, I am working with this string sample:
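A stand-in string with the same general shape might look like the one below. The GUID, the hyphenated last name, and the trailing text are all hypothetical values, chosen only to fit the structure described in this article.

```powershell
# Hypothetical sample data: "Mailbox:" + GUID, then last name, first name,
# and an initial in parentheses. None of these values are real.
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'

# Display the string
$s
```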
My intention is to extract the parts of my name from the string. As I mentioned in a previous article, I've modified the last name to make it more challenging. I want to extract my first name, last name, and initial and have them easily accessible. I could go through the effort of breaking the string apart using techniques from my previous articles. Instead, I want to use a regular expression feature called named captures.
With a named capture, you assign a name to part of the pattern, and whatever text that part matches is stored under that name. In PowerShell, you can easily reference this named capture and use it like a variable. As with all regular expressions, you have to know the structure of your data, and it must be consistent. To use named captures, you might want to use a REGEX object. Here's what my pattern looks like:
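As a rough sketch, a pattern along these lines could be cast to a REGEX object. The capture names GUID, lastname, firstname, and initial follow the description in this article, but the sample string is hypothetical and the exact pattern is an assumption built to fit it.

```powershell
# Hypothetical sample data shaped like the string described in the article
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'

# Assumed pattern: literal "Mailbox:", then four named captures.
# Casting the string to [regex] creates a System.Text.RegularExpressions.Regex object.
[regex]$rx = 'Mailbox\:(?<GUID>\S+)\s(?<lastname>\S+)\s(?<firstname>\S+)\s\((?<initial>\w)'

# Quick sanity check that the pattern fits the sample
$rx.IsMatch($s)   # True
```

Single quotes matter here: they keep PowerShell from trying to expand anything inside the pattern before the regular expression engine sees it.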
Although it isn't required, I have described the entire string in the pattern. Let me break it down for you. First, I'm matching on the literal string "Mailbox" followed by a colon. I am escaping the ':' character so that the regular expression engine treats it as a literal character. Strictly speaking, a colon on its own is not a special character in a .NET regular expression, so the escape is optional, but it does no harm.
Now I come to the first named capture. The entire capture is enclosed in parentheses. The name for a capture is defined as ?<capturename>, so the first capture is called GUID. Immediately after the name is a regular expression pattern, and everything that matches that pattern is assigned to the named capture. To keep this simple, the pattern \S+ says to take one or more characters that aren't white space. This is followed by a white space (\s). Next are named captures for the last name and first name, separated by another space. The last part begins by escaping the opening parenthesis so that it is matched literally. The initial named capture is a single word character. The pattern ignores anything after that.
Here's what happens when you match. You can see the captures, although it isn't apparent what is named and what isn't. If you forget the names you used, you can always check the REGEX object. The captures themselves are part of the Groups property of the match result. To make this easier to follow, let's save the match result to a variable.
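The steps just described might look something like this. It is a sketch under assumptions: the sample string and the pattern are hypothetical stand-ins, while Match() and GetGroupNames() are standard methods of the .NET Regex class.

```powershell
# Hypothetical sample data and an assumed pattern that fits it
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'
[regex]$rx = 'Mailbox\:(?<GUID>\S+)\s(?<lastname>\S+)\s(?<firstname>\S+)\s\((?<initial>\w)'

# Run the match; the result lists the captures but not their names
$rx.Match($s)

# Forgot the names? Ask the regex object. Group 0 is the entire match.
$rx.GetGroupNames()

# Save the match result so the captures are easy to work with
$m = $rx.Match($s)
$m.Success   # True
$m.Groups
```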
There are a few ways to access the named captures. You could check the REGEX object for the group number that corresponds to a named capture. Personally, I find it easier to access a specific item in the collection by name, like this:
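Both approaches can be sketched as follows, again using a hypothetical sample string and an assumed pattern. GroupNumberFromName() is a method of the Regex class, and the Groups collection accepts either a number or a name as its index.

```powershell
# Hypothetical sample data and an assumed pattern that fits it
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'
[regex]$rx = 'Mailbox\:(?<GUID>\S+)\s(?<lastname>\S+)\s(?<firstname>\S+)\s\((?<initial>\w)'
$m = $rx.Match($s)

# Option 1: look up the group number for a name, then index by number
$n = $rx.GroupNumberFromName('lastname')
$m.Groups[$n].Value            # Hicks-Smith

# Option 2: index the collection directly by capture name
$m.Groups['lastname'].Value    # Hicks-Smith
$m.Groups['firstname'].Value   # Jeffery
$m.Groups['initial'].Value     # J
```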
I could create an additional variable for each named capture, or I might use them directly. Suppose I wanted to build a message for a log or a report. It is pretty straightforward to assemble using the -f operator.
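For instance, a message could be assembled along these lines. The wording of the message, the sample string, and the pattern are all illustrative assumptions.

```powershell
# Hypothetical sample data and an assumed pattern that fits it
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'
[regex]$rx = 'Mailbox\:(?<GUID>\S+)\s(?<lastname>\S+)\s(?<firstname>\S+)\s\((?<initial>\w)'
$g = $rx.Match($s).Groups

# Each {n} placeholder is replaced by the matching item on the right of -f
$msg = '{0} {1}. {2} owns mailbox {3}' -f $g['firstname'].Value,
    $g['initial'].Value, $g['lastname'].Value, $g['GUID'].Value

$msg   # Jeffery J. Hicks-Smith owns mailbox 9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59
```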
The REGEX object is a bit more complicated to use, but it offers additional features you might want to take advantage of. That said, you can accomplish similar results using the -match operator.
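A minimal sketch of the operator-based approach, with the same hypothetical sample string and assumed pattern, might look like this. Note that -match is case-insensitive by default, whereas a REGEX object created with a plain cast is case-sensitive.

```powershell
# Hypothetical sample data and an assumed pattern that fits it
$s = 'Mailbox:9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59 Hicks-Smith Jeffery (J) archived'
$pattern = 'Mailbox\:(?<GUID>\S+)\s(?<lastname>\S+)\s(?<firstname>\S+)\s\((?<initial>\w)'

# -match returns True or False; on success it also populates $Matches
$found = $s -match $pattern
$found               # True

# $Matches is a hashtable keyed by capture name (key 0 is the whole match)
$Matches.lastname    # Hicks-Smith
$Matches['GUID']     # 9f2c4e1a-7b3d-4c8e-a5f6-0d1e2b3c4a59
```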
Remember, if the operator returns True, the built-in $Matches variable holds the result. See the named captures? If you are going to use them, don't use -match in another operation until you are done, because a later successful match will overwrite this result. But the captures are easy to use: each name is available as a key of $Matches, and you should even get IntelliSense for $Matches to quickly expand the capture names.
Using named captures in PowerShell may look daunting with all that punctuation. But the more you come to learn and understand regular expression patterns, the easier it becomes. And once you have named captures figured out, I think you'll find plenty of opportunities to put them to work.