How to Query an Excel Workbook as a Database with PowerShell

I suppose I will simply have to accept the fact that IT pros want to store data in Excel workbooks and then access that data from PowerShell. Personally, I think this results in making their lives very difficult, as data can easily be stored in a CSV file. Importing a CSV file into PowerShell is trivial task, where you have the entire arsenal of PowerShell commands to use on that data. The typical alternative is to use the Excel COM object and programmatically walk through the spreadsheet. This is very tedious and isn’t I would wish on a PowerShell beginner. So, along the lines of “if you can’t beat ’em, join ’em” I have another alternative. If the data you need from an Excel workbook is laid out in a worksheet, then you might be able to query the Excel file as if it were a database. This article will show you how.

First, I have a sample spreadsheet.

$path = "c:\scripts\mydata.xlsx"

A sample Microsoft Excel spreadsheet. (Image Credit: Jeff Hicks)
A sample Microsoft Excel spreadsheet. (Image Credit: Jeff Hicks)

Although you don’t need to name the worksheet, mine is labeled “mydata”. My data also has column headings, which isn’t a requirement, but I expect this is the norm, so that’s what I’ll demonstrate. There are no built-in cmdlets for accessing this as a database, but you should be able to take the commands I’m going to show you and create your own tools and functions. By the way, my file was created in Excel 2013, and I expect my code will work for most recent versions. But you’re testing everything in a non-production environment anyway, right?
First, we need a connection object. Because we aren’t querying a SQL database, we’ll use and OLEDB connection.

$connection = New-Object System.Data.OleDb.OleDbConnection

To open the Excel file, we need a connection string. This is essentially a description of how to connect to a database. The following is the string that I created, which I modified from information I found on the excellent ConnectionStrings.com website.

$connectstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=$path;Extended Properties='Excel 12.0 Xml;HDR=YES;'"$connection.ConnectionString = $connectstring

The last part, HDR=Yes, indicates that my Excel file uses headers. With the property set, I can open the connection.

$connection.open()

If you’re not sure what the layout of your file looks like, then you can use PowerShell to discover the layout of your data with the connection object’s GetSchema() method.

$connection.GetSchema()

Using PowerShell's GetSchema() method. (Image Credit: Jeff Hicks)
Using PowerShell’s GetSchema() method. (Image Credit: Jeff Hicks)

You can get more detail by specifying a collection name.

$connection.GetSchema("tables")

Specifying a connection name by using PowerShell's GetSchema() method. (Image Credit: Jeff Hicks)
Specifying a connection name by using PowerShell’s GetSchema() method. (Image Credit: Jeff Hicks)

The table name is the worksheet label. We will need to know this when it comes time to querying the file. You can also discover the column names.

$connection.GetSchema("columns")

You will see an object for each column, most of which won’t have any values. PowerShell and the .NET Framework will do a pretty decent job of interpreting the data as the correct data type.

For our purposes, the only thing we likely need to know are the name and ordinal position, that is to say the column number. Let’s create a hash table of this information, which will come in handy later.

$connection.GetSchema("columns") |
foreach -Begin { $hash = @{} } -process {
  $hash.add($_.Column_Name,$_.Ordinal_Position)
}

This is what the hash table looks like:

Our hash table that we created in PowerShell. (Image Credit: Jeff Hicks)
Our hash table that we created in PowerShell. (Image Credit: Jeff Hicks) 

The value is the column number.
Next, we need an object to do something with the connection.

$cmdObject = New-Object System.Data.OleDb.OleDbCommand

There are a few properties we need to set, including connecting it to the file. But the most important is the query.

$query = "Select * from [mydata$]"
$cmdObject.CommandText = $query
$cmdObject.CommandType = "Text"
$cmdObject.Connection = $connection

The query is the same type you would use for SQL Server. In my example, I am selecting all properties from a specific table. The table in this case is my worksheet. Notice the name, mydata$, is the same as we saw in the schema query.
It’s finally the time for the moment of truth, and frankly the most complicated part of the entire process, where we’ll need to invoke the query. There are different methods depending on the type of query. Since we are simply reading, we will invoke the ExecuteReader() method. This will return another type of object called a DataReader.

$dataReader = $cmdObject.ExecuteReader()

The DataReader object has a few properties you can check to see if your query returned anything. The HasRows property should indicate the presence of records. FieldCount indicates the number of columns. Since we asked for everything we get all 18 columns.

Using the dataReader object in PowerShell. (Image Credit: Jeff Hicks)
Using the dataReader object in PowerShell. (Image Credit: Jeff Hicks) 

The tricky part is that you need to “read” each record in the DataReader object. Within each record, you can then do something with the data. The typical approach is to loop through the collection. The Read() method will return true if there is data to be processed.

While ($dataReader.Read()) {
  $datareader.item("Name")
  $datareader.Item("Company")
  $datareader.Item("Path")
}

Assuming you know the column name, you can easily retrieve the corresponding value.

There are other methods you can use if you need them. You can learn more about this class from the MSDN documentation.
The most important thing to realize about the data reader is that once you have read through everything, you can’t go back. Reading is a one-way street. If you need to process the data again, you’ll need to re-execute the query. I simply close the reader and re-read the data.

$datareader.Close()

I’m doing this because I don’t want you to think querying the file is simply returning a bunch of strings. Now, that may be all you need, but you can do so much more.

How about reconstituting the data in the spreadsheet as a collection of objects? My data, if you haven’t figured it out by now, is a collection of process information. I want to query the “database” for all processes with a working set value greater than 100MB.

$query = "Select * from [mydata$] where WS > $(100MB)"
$cmdObject.CommandText = $query

I’m using a substring for the numeric value so that PowerShell will expand 100MB into 104857600. Now to get the data.

$dataReader = $cmdObject.ExecuteReader()

Here’s the fun part. I want to get all of the columns, but I don’t want to have to remember them all, let alone type them. This is where that hash table of column information we created earlier helps. I can take this hash table, and use the key to retrieve the value for each column name. Because I want the property names to be in the same order as the spreadsheet, I will want to sort them.

Sorting on the hash table in Windows PowerShell. (Image Credit: Jeff Hicks)
Sorting on the hash table in Windows PowerShell. (Image Credit: Jeff Hicks) 

I’m going to read the data, and for each entry create a custom object, using an ordered hash table.

$data = While ($dataReader.Read()) {
#sort by column order
$hash.GetEnumerator()| sort Value |
foreach -begin {$drHash=[ordered]@{} } -process {
$drhash.Add($_.key,$datareader.item($_.key))
} -end { New-Object PSObject -property $drhash }
}

The ordered hash table will use the key from the column hash table as its key, and the corresponding data value for its value. I’m saving all of the output to $data.

I now have a collection of objects I can use. And these objects have properties that aren’t simply strings. Piping $data to Get-Member shows what I’m talking about.

The end result of all of this is I that I can store data in Microsoft Excel files and query that data, treating the workbook as a database. I’ve walked through the process, but you will most likely want to create your own scripts or tools. Don’t forget to close and clean up after yourself.

$datareader.Close()
$cmdObject.Dispose()
$connection.close()
$connection.Dispose()

Is this something you think you’d use? Is there something else you need to know about using Excel as a database? If so, please leave a comment.