PowerShell Problem Solver: Importing CSV Files into PowerShell

I see a lot of posts, questions and scripts that involve importing a CSV file into PowerShell. On one hand, working with a CSV file couldn’t be easier. On the other hand, PowerShell will happily turn any imported CSV file into a collection of objects, where working with those objects can be a little confusing if you don’t completely understand the process. I think part of the problem PowerShell beginners face is that they are still thinking about text, especially when importing a CSV file. Let me show you what I’m talking about.

Here’s a sample CSV file that includes test data I want to use in PowerShell.

"Date","Name","Service","Key","Size","Test"
"11/18/2014 6:45:30 AM","Data_1","ALG","","25","false"
"11/18/2014 2:17:30 AM","Data_2","AppIDSvc","2","-30","false"
"11/17/2014 11:57:30 PM","Data_3","Appinfo","3","42","true"
"11/17/2014 11:21:30 AM","Data_4","AppMgmt","4","-7","true"
"11/17/2014 11:33:30 AM","Data_5","aspnet_state","","28","true"
"11/16/2014 8:51:30 AM","Data_6","AudioEndpointBuilder","6","35","false"
"11/16/2014 6:00:30 PM","Data_7","AudioSrv","7","12","false"
"11/16/2014 3:41:30 AM","Data_8","AxInstSV","8","","true"
"11/16/2014 11:39:30 AM","Data_9","BDESVC","9","32","true"
"11/16/2014 11:03:30 PM","Data_10","BFE","10","15","false"
"11/18/2014 4:03:30 AM","Data_11","BITS","11","28","true"
"11/15/2014 7:57:30 AM","Data_12","Browser","12","31","true"
"11/14/2014 12:23:30 PM","Data_13","bthserv","13","16","true"
"11/17/2014 10:49:30 PM","Data_14","CertPropSvc","14","-39","false"
"11/14/2014 5:33:30 AM","Data_15","cfWiMAXService","15","20","true"
"11/13/2014 12:13:30 AM","Data_16","clr_optimization_v2.0.50727_32","16","42","true"
"11/15/2014 10:54:30 PM","Data_17","clr_optimization_v2.0.50727_64","17","28","true"
"11/16/2014 11:57:30 PM","Data_18","clr_optimization_v4.0.30319_32","18","-45","true"
"11/14/2014 2:38:30 AM","Data_19","clr_optimization_v4.0.30319_64","19","24","true"
"11/11/2014 11:13:30 AM","Data_20","cmdAgent","20","44","true"
"11/16/2014 1:33:30 AM","Data_21","COMSysApp","21","24","true"
"11/17/2014 8:37:30 AM","Data_22","ConfigFree Gadget Service","22","41","true"
"11/11/2014 6:28:30 PM","Data_23","ConfigFree Service","23","33","true"
"11/13/2014 4:45:30 PM","Data_24","CryptSvc","24","47","true"
"11/9/2014 6:28:30 PM","Data_25","CscService","25","26","true"

I’ll import once and save the results to a variable so I don’t have to keep re-importing the data.

$data = import-csv c:\scripts\testdata.csv

As you can see, each value in the CSV header became a property name.

Each value in the CSV header becomes a property name. (Image Credit: Jeff Hicks)
Each value in the CSV header becomes a property name. (Image Credit: Jeff Hicks)

I recommend you make sure the column entries in the header line do not have any spaces. It isn’t illegal, but it makes working with the resulting properties a bit more difficult as you have to quote the property name.

Next, let’s say I want to get results where the size is less than 0. Looking at the output, you might think this will work:

$data | where {$_.size -lt 0}

Given the first screenshot you would expect to see at least Data_2 with a size value of -30. But this isn’t the case.

Grabbing the size of $data in PowerShell. (Image Credit: Jeff Hicks)
Grabbing the size of $data in PowerShell. (Image Credit: Jeff Hicks)

The only result is an object with nothing defined for the size property. When you face this situation, this is where Get-Member is useful. Exactly what type of objects are in $data?
Using Get-Member in Windows PowerShell. (Image Credit: Jeff Hicks)
Using Get-Member in Windows PowerShell. (Image Credit: Jeff Hicks)

Notice that all of the properties are treated as strings. This means that any operation that relies on something other than a string will fail to run properly.
The date sort has failed. (Image Credit: Jeff Hicks)
The date sort has failed. (Image Credit: Jeff Hicks)

The date sort is wrong, and I shouldn’t be seeing anything with a test value of false. One way around this issue is to tell PowerShell to treat properties as a certain type. Let’s go back to my first filtering expression.

$data | where {$_.size -as [int] -lt 0} | format-table

I’m using the –as operator and telling PowerShell to treat the size property as an integer. Now I get the result I expect.

Using the -as property in Windows PowerShell. (Image Credit: Jeff Hicks)
Using the -as property in Windows PowerShell. (Image Credit: Jeff Hicks)

You have to do this sort of type conversion whenever you are doing something with the property. However, using –as doesn’t always work. If you try this with my test data, then you’ll see that it won’t work.

$data | where {$_.Test -as [boolean]} | format-table

An alternative is to use one of the built-in methods to convert the value.

Using the System.IFormatProvider method in Windows PowerShell. (Image Credit: Jeff Hicks)
Using the System.IFormatProvider method in Windows PowerShell. (Image Credit: Jeff Hicks)

You’ll see that all of these methods require a parameter value, which helps PowerShell determine how to format the value. For our purpose, this System.IFormatProvider method comes down to two choices: a number or data.

[System.Globalization.NumberFormatInfo]::CurrentInfo
[System.Globalization.DateTimeFormatInfo]::CurrentInfo

The easiest way is to reference them directly with the .NET Framework. We already know that the size property will be treated as a string. But with a little help, we can tell PowerShell to treat it as an integer.

Telling Windows PowerShell to treat size value as a integer. (Image Credit: Jeff Hicks)
Telling Windows PowerShell to treat size value as a integer. (Image Credit: Jeff Hicks)

Now I can run a command like this:

$data | where {$_.Test.Toboolean([System.Globalization.NumberFormatInfo]::CurrentInfo)} |
sort {$_.Date.todateTime([System.Globalization.DateTimeFormatInfo]::CurrentInfo)} -descending |
format-table

I will admit this is a bit advanced, but it works.
120814 2032 PowerShellP8
It is important for you to remember that you haven’t changed the underlying objects in $data, only how they are being processed in your PowerShell expression.

As you can see, I still have string properties.

String properties in Windows PowerShell. (Image Credit: Jeff Hicks)
String properties in Windows PowerShell. (Image Credit: Jeff Hicks)

One way to avoid all of this is to give your newly imported objects the proper type from the very beginning. Allow me to share an advanced function of mine called Import-TypedCSV.

#requires -version 3.0
Function Import-TypedCSV {
<#
.Synopsis
Import a CSV file with typed property values.
.Description
PowerShell will happily import any CSV file and turn it into an object. Unfortunately, PowerShell will treat everything as a string. This command will attempt to dynamically assign a type to each imported property.
If a property is missing a value you will be prompted to provide one.
.Example
PS C:\> Import-TypedCSV c:\work\data.csv | sort starttime -Descending |  Select Name,StartTime,VM | format-table
Name            StartTime                     VM
----            ---------                     --
thunderbird     12/4/2014 4:22:50 PM   640290816
powershell_ise  12/4/2014 1:42:36 PM  1101860864
SnagitEditor    12/4/2014 8:05:33 AM   522137600
waterfox        12/4/2014 8:05:32 AM -1445900288
Dropbox         12/4/2014 8:05:26 AM   356761600
Snagit32        12/4/2014 8:05:23 AM   466067456
SugarSync       12/4/2014 8:05:18 AM   321622016
googledrivesync 12/4/2014 8:05:17 AM   213491712
BoxSync         12/4/2014 8:05:11 AM   880230400
explorer        12/4/2014 8:04:53 AM  1332060160
Import-CSV would have treated StartTime as a string which would have resulted in an inaccurate sort.
.Example
PS C:\> Import-TypedCSV c:\work\data.csv | where {$_.vm -ge 800mb}
Name      : powershell_ise
Path      : C:\WINDOWS\System32\WindowsPowerShell\v1.0\powershell_ise.exe
StartTime : 12/4/2014 1:42:36 PM
VM        : 1101860864
WS        : 212324352
Name      : BoxSync
Path      : C:\Program Files\Box\Box Sync\BoxSync.exe
StartTime : 12/4/2014 8:05:11 AM
VM        : 880230400
WS        : 134922240
Name      : explorer
Path      : C:\windows\Explorer.EXE
StartTime : 12/4/2014 8:04:53 AM
VM        : 1332060160
WS        : 118648832
.Example
PS C:\> dir c:\work\report5.csv | import-typedcsv | sort lastmodified -Descending | format-table
Modified              Name                          Path                                          Size Type
--------              ----                          ----                                          ---- ----
10/31/2014 8:43:03 AM databits.txt                  C:\work\databits.txt                       2282783 .txt
8/26/2014 8:55:56 AM  nwork.xml                     C:\work\nwork.xml                          3053890 .xml
8/8/2014 4:30:36 PM   modules.xml                   C:\work\modules.xml                        9161022 .xml
6/5/2014 5:37:14 PM   LWAPlugin64BitInstaller32.msi C:\work\foo\LWAPlugin64BitInstaller32.msi  7622656 .msi
.Notes
Last Updated: December 4, 2014
Version     : 2.0
Learn more about PowerShell:
Essential PowerShell Learning Resources
.Inputs [string] .Outputs [PSCustomobject] .Link Import-CSV #> Param ( [Parameter(Position=0,Mandatory=$True,HelpMessage="Enter the CSV filename and path", ValueFromPipeline,ValueFromPipelineByPropertyName)] [ValidateScript({Test-Path $_})] [Alias("PSPath")] [string]$Path, [ValidateNotNullorEmpty()] [string]$Delimiter="," ) Begin { Write-Verbose -Message "Starting $($MyInvocation.Mycommand)" Write-Verbose "Using delimiter: $delimiter" } #begin Process { Write-Verbose "Importing $path" #import the data $data=Import-Csv -Path $path -delimiter $delimiter #get the first element Write-Verbose "Returning the first item out of $($data.count)" $item=$data[0] #get the property names $properties=$item | get-member -MemberType NoteProperty | Select -ExpandProperty Name Write-Verbose "Found $($properties.count) properties" #Define some regular expression patterns to identify types based on values $datetime="^\d{1,2}/\d{1,2}/\d{2,4}|^\d{1,2}:\d{1,2}:\d{1,2}" $number="^\d+$|^-\d+$" $double="^\d+\.\d+$|^-\d+\.\d+$" $boolean="^True$|^False$" #define a variable for the custom properties $custom=@() foreach ($property in $properties) { $value=$item.$property #cast each $property specifically as a string and pull out any spaces [string]$property=$property.Replace(" ","") Write-Verbose ("Testing property {0}. Sample value is {1}" -f $property,$value) if ($value) { Switch -regex ($value) { $datetime { Write-Verbose "Treating as Datetime" $type="datetime" $exp=$ExecutionContext.InvokeCommand.NewScriptBlock("[$type]`$_.$property") } $number { #further test value to determine an appropriate class Write-Verbose "Testing $value" Switch ($value) { {$_ -as [int32]} { Write-Verbose "Treating as Int32" $type="Int32" Break } {$_ -as [uint32]} { Write-Verbose "Treating as Uint32" $type = "uint32" Break } {$_ -as [int64]} { Write-Verbose "Treating as int64" $type = "int64" Break } {$_ -as [uint64]} { Write-Verbose "Treating as Uint64" $type = "uint64" Break } Default { #treat as a string if nothing matches Write-Verbose "Can't determine numeric value." $type = "string" } } #switch <# old code if ($value -as [int32]) { Write-Verbose "Treating as Int32" $type="Int32" } elseif ($value -as [uint32]) { Write-Verbose "Treating as Uint32" $type = "uint32" } elseif ($value -as [int64]) { Write-Verbose "Treating as int64" $type = "int64" } elseif ($value -as [uint64]) { Write-Verbose "Treating as Uint64" $type = "uint64" } else { #treat as a double Write-Verbose "Treating as a Double" $type = "double" } #> $exp = $ExecutionContext.InvokeCommand.NewScriptBlock("[$type]`$_.$property") } $double { Write-Verbose "Treating as Double" $type = "Double" $exp = $ExecutionContext.InvokeCommand.NewScriptBlock("[$type]`$_.$property") } $boolean { Write-Verbose "Treating as a Boolean" $type = "boolean" #convert True and False strings into valid Boolean values $exp = $ExecutionContext.InvokeCommand.NewScriptBlock("if (`$_.$property -match 'True') {`$v=1} else {`$v=0} ; `$v -as [$type]") } Default { Write-Verbose "Treating as String" $type = "String" } } #switch } #if item.property else { #there must be an empty value in the first object so prompt for the type $msg = ("There was no value found for {0}. What type do you want to use? For example Int32, Datetime or String" -f $property) $type = Read-Host $msg $exp = $ExecutionContext.InvokeCommand.NewScriptBlock("if (`$_.$property) {[$type]`$_.$property} else {[null]`$_.$property}") Write-Verbose "Treating $property as $type. User-provided value." } #add the property to the custom property array if ($type -eq "String") { $custom+=$Property } else { #create a scriptblock from the property name $custom+=@{Name="$Property";Expression=$exp} } } #foreach #Now take data and 're-type' the objects $data | Select-Object -Property $custom } #process End { Write-Verbose "Ending $($MyInvocation.Mycommand)" } #end } #end function

Without going into extreme detail, I’ve tried to thoroughly document the function, where my command looks at the first line of data in the CSV file and attempts to determine the appropriate object type using regular expressions.

If any of the properties for the first object have no value, you will be prompted to specify the object type.

Prompting you to provide the object type in PowerShell. (Image Credit: Jeff Hicks)
Prompting you to provide the object type in PowerShell. (Image Credit: Jeff Hicks)

But now that everything is properly typed, using the data is much easier.
Properly typed objects in Windows PowerShell. (Image Credit: Jeff Hicks)
Properly typed objects in Windows PowerShell. (Image Credit: Jeff Hicks)

I do have yet one more alternative you might want to consider, especially if you often import the same type of CSV file repeatedly. It is definitely an advanced technique, which I’ll guide you through in a future article.