PowerShell Problem Solver: Parsing Text to Objects

Posted on August 5, 2015 by Jeff Hicks in PowerShell

In a previous PowerShell Problem Solver article, I demonstrated how to use regular expressions to take text output and turn it into objects. Once you have objects, you can then take full advantage of PowerShell. I’ll admit that using regular expressions can be a bit vertigo-inducing, so let’s look at another approach.

Again, the assumption is that the text output is in a predictable and known format, where you don’t have any null or empty values. I’m going to use the same text file from my previous article that has RAID and disk information. See the previous article to see what it looks like.

I only want the last part of the file turned into objects.

Figure 1: Text output to convert. (Image Credit: Jeff Hicks)

The column headings will be the property names. Your column heading could be anywhere in the file, so the technique you use to get the line might vary. I’ll be using a simple pattern with Select-String.
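As a rough sketch of that step, here is one way to grab the heading line. The sample lines and the '^RAID ID' pattern are hypothetical stand-ins for the real file, which isn't reproduced here. With a file on disk you would point Select-String at it with -Path instead of piping strings.

```powershell
# Hypothetical sample lines standing in for the real report file.
$text = @(
    'Controller summary'
    'RAID ID   Chassis   Slot   Size(GB)   State^'
    '0         1         2      500        Online'
)

# Select-String emits MatchInfo objects; the Line property holds the matched text.
# The pattern is an assumption: it simply anchors on how the heading line starts.
$h = ($text | Select-String -Pattern '^RAID ID' | Select-Object -First 1).Line
$h
```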

Figure 2: Captured headings. (Image Credit: Jeff Hicks)

Personally, I don’t like spaces or other characters in the property names, so I need to do a little cleaning with regular expressions. First, I’m going to replace any parentheses or carets with nothing.

Note that because the parentheses characters are special regular expression characters, I need to escape them with a backslash.
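A minimal sketch of that replacement might look like this. The heading line is a hypothetical example, and the character class is one reasonable way to write the pattern rather than the only one.

```powershell
# Hypothetical heading line; the real one comes from the text file.
$h = 'RAID ID   Chassis   Slot   Size(GB)   State^'

# Match an opening parenthesis, a closing parenthesis, or a caret
# (each escaped with a backslash) and replace it with an empty string.
$h = $h -replace '[\(\)\^]', ''
$h
```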

If I wanted to cover more possibilities, I could use a broader pattern such as [^\s\w].

This pattern says to find anything that isn’t whitespace and isn’t a word character, and replace it with nothing. In either event, $h is now one step cleaner.
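Here is a sketch of that broader approach, assuming a negated character class of the form [^\s\w] and the same hypothetical heading line. Keep in mind that \w also covers digits and the underscore, so those survive.

```powershell
# Hypothetical heading line; the real one comes from the text file.
$h = 'RAID ID   Chassis   Slot   Size(GB)   State^'

# [^\s\w] matches any single character that is neither whitespace
# nor a word character (letters, digits, underscore).
$h = $h -replace '[^\s\w]', ''
$h
```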

Figure 3: Removing extra characters. (Image Credit: Jeff Hicks)

To get rid of the spaces, I need to turn to one more slightly advanced regex pattern.

This is tricky, because I want to leave the spaces between ID and Chassis alone, but replace the single space between RAID and ID. So I have to use constructs called lookbehind and lookahead. The regular expression pattern says: if the current character is a single space, look behind for a non-space character and look ahead for another non-space character. If both are found, replace the space with an underscore.
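A pattern along these lines should do the job. It is a sketch against a hypothetical heading line, and it assumes the columns are separated by at least two spaces. If any columns were separated by a single space, they would be joined as well.

```powershell
# Hypothetical heading line after the earlier cleanup.
$h = 'RAID ID   Chassis   Slot   SizeGB   State'

# (?<=\S) lookbehind: the previous character must be non-whitespace
# \s      the single whitespace character that actually gets replaced
# (?=\S)  lookahead: the next character must be non-whitespace
$h = $h -replace '(?<=\S)\s(?=\S)', '_'
$h
```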

Figure 4: Removing spaces within property names. (Image Credit: Jeff Hicks)

All that remains to build the list of property names is to split this on spaces.
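The split can be sketched like this, again with a hypothetical cleaned heading line. Trimming first avoids an empty first element if the line starts with whitespace.

```powershell
# Hypothetical heading line after cleanup.
$h = 'RAID_ID   Chassis   Slot   SizeGB   State'

# Split on runs of one or more whitespace characters.
$names = $h.Trim() -split '\s+'
$names
```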

Figure 5: The array of refined property names. (Image Credit: Jeff Hicks)

You can certainly create the array of property names manually, especially if you want to use something other than the originals. Next, I need to use Get-Content to read the lines of data to parse.

In much the same way as before, I’m going to turn each line into a separate object. To do that, I need to split each line into an array using the spaces as a delimiter. Next, I can loop through the list of property names and create a hashtable using the corresponding value from the split line. The hashtable is then easily turned into a custom object.
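Here is a minimal sketch of that loop. The property names and data lines are hypothetical and held in memory so the example stands alone. In practice the lines would come from Get-Content, filtered down to the rows you want. The [ordered] and [pscustomobject] accelerators assume PowerShell 3.0 or later.

```powershell
# Hypothetical property names and data lines standing in for the real file.
$names = 'RAID_ID', 'Chassis', 'Slot', 'SizeGB', 'State'
$lines = @(
    '0         1         2      500        Online'
    '1         1         3      1000       Degraded'
)

$data = foreach ($line in $lines) {
    # Split the line into values, using runs of whitespace as the delimiter.
    $values = $line.Trim() -split '\s+'

    # Pair each property name with the value at the same position.
    $hash = [ordered]@{}
    for ($i = 0; $i -lt $names.Count; $i++) {
        $hash.Add($names[$i], $values[$i])
    }

    # Turn the hashtable into a custom object.
    [pscustomobject]$hash
}

$data | Format-Table -AutoSize
```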

The end result is a collection of objects.

Figure 6: New objects displayed in a table. (Image Credit: Jeff Hicks)

There is one potential drawback: every property is a string.

Figure 7: Converted object properties are strings. (Image Credit: Jeff Hicks)

One solution is to build a mapping hashtable.
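Such a map might look like the sketch below. Only the Chassis entry with a value of 'int' is taken from this article. The other property names and type labels are hypothetical.

```powershell
# Map each property name to a simple type label.
# Only Chassis = 'int' comes from the article; the rest are assumptions.
$typehash = @{
    RAID_ID = 'int'
    Chassis = 'int'
    Slot    = 'int'
    SizeGB  = 'int'
    State   = 'string'
}

# Look up the label for a given property name.
$typehash['Chassis']
```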

Figure 8: Creating a Typename map. (Image Credit: Jeff Hicks)

The process to convert each line of text into an object is very similar to what I just showed you, with the addition of converting each value into the necessary type.

I’ve inserted a switch construct based on the corresponding typehash entry. So if $i is 1, then $names[$i] is ‘Chassis’, which has a corresponding value of ‘int’.
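A sketch of the typed version follows, using hypothetical names, type labels, and data lines. The -as operator returns $null instead of throwing when a value can't be converted, which may or may not be what you want. The 'int' branch casts to Int16 here, so it would need a wider type for values above 32767.

```powershell
# Hypothetical names, type map, and data lines standing in for the real file.
$names = 'RAID_ID', 'Chassis', 'Slot', 'SizeGB', 'State'
$typehash = @{ RAID_ID = 'int'; Chassis = 'int'; Slot = 'int'; SizeGB = 'int'; State = 'string' }
$lines = @(
    '0         1         2      500        Online'
    '1         1         3      1000       Degraded'
)

$data = foreach ($line in $lines) {
    $values = $line.Trim() -split '\s+'
    $hash = [ordered]@{}
    for ($i = 0; $i -lt $names.Count; $i++) {
        # Convert the raw string according to the type label for this property.
        $value = switch ($typehash[$names[$i]]) {
            'int'   { $values[$i] -as [int16] }
            default { $values[$i] }   # 'string' and unmapped names stay as text
        }
        $hash.Add($names[$i], $value)
    }
    [pscustomobject]$hash
}

# Check the resulting type of one property.
$data[0].Chassis.GetType().Name
```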

Figure 9: Testing the type hashtable. (Image Credit: Jeff Hicks)

In this case, the value will be converted to Int16. You can add other types or conversion commands as necessary. But once I run the text through, I now have properly typed objects in $data.

Figure 10: Verifying property types. (Image Credit: Jeff Hicks)

Now commands like this will work properly.
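To illustrate why the types matter, here is a small sketch with hypothetical objects and property names. Numeric properties sort by value, whereas the same values stored as strings would sort as text.

```powershell
# Hypothetical typed objects similar to what ends up in $data.
$data = @(
    [pscustomobject]@{ Slot = [int16]2; SizeGB = [int16]500;  State = 'Online' }
    [pscustomobject]@{ Slot = [int16]3; SizeGB = [int16]1000; State = 'Degraded' }
    [pscustomobject]@{ Slot = [int16]4; SizeGB = [int16]90;   State = 'Online' }
)

# Numeric sort: the largest disk comes first.
$sorted = $data | Sort-Object -Property SizeGB -Descending
$sorted | Format-Table -AutoSize

# For comparison, the same kind of values as strings sort as text.
$asText = '9', '10', '100' | Sort-Object
$asText -join ','
```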

Figure 11: Sorted and formatted results. (Image Credit: Jeff Hicks)

Although I used more regular expressions in this article than I thought I would, most of these uses were to select the text I wanted out of a larger file and to clean up names. If your text is simple and clean, all you need to do is get a list of names, split each line into an array, and create a custom object that pairs names with values. In fact, the process can be very simple. Here’s command output that’s pretty close to complete.

Figure 12: A clean text file. (Image Credit: Jeff Hicks)

I’m using a text file, but this could just as easily be the output of a command-line tool. PowerShell has a cmdlet, ConvertFrom-CSV, which would easily turn this into a set of objects. The tricky part with the current output is that each entry is separated by a number of spaces, and ConvertFrom-CSV is looking for a single-character delimiter. Not a problem. I’ll replace the multiple spaces with a comma and then convert the result.
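A sketch of that idea, using hypothetical in-memory lines in place of the text file. It assumes columns are separated by two or more spaces and that no value contains a comma. The first line supplies the property names, and every property comes back as a string.

```powershell
# Hypothetical clean output; in practice this could come from Get-Content
# or directly from a command-line tool.
$text = @(
    'Name    Slot  SizeGB  State'
    'disk0   0     500     Online'
    'disk1   1     1000    Degraded'
)

# Collapse each run of two or more whitespace characters into a single comma,
# then let ConvertFrom-Csv treat the first line as the header row.
$objects = $text |
    ForEach-Object { $_.Trim() -replace '\s{2,}', ',' } |
    ConvertFrom-Csv

$objects | Format-Table -AutoSize
```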

Figure 13: Using ConvertFrom-CSV. (Image Credit: Jeff Hicks)

It can really be that simple. If you have any problems getting these techniques to work or have questions, please leave a comment.
