How to Query the Windows Search Index using SQL and PowerShell

Posted on April 20, 2015 by Russell Smith in Windows Server with 0 Comments

In this article we’ll discuss how to query the Windows Search index using SQL, and I’ll also provide steps on how to generate reports using Windows PowerShell.

In a previous article, Manage Documents with Windows Explorer using File Properties and Tags on the Petri IT Knowledgebase, I walked readers through a simple system I’d devised for tracking the status of Word documents in File Explorer by tagging files, adding information to file properties, and searching using Windows Search Advanced Query Syntax (AQS). The aim was to track documents created within a specific calendar period, based on simple complete or not complete criteria, to make planning my schedule easier. I also wanted to see which documents were left unfinished.

The goals were achieved using technologies built-in to Windows 8, namely File Explorer and Windows Search Advanced Query Syntax, and the extended file properties that Microsoft Word brings to .docx files. In this article, I want to look at how to achieve those same goals programmatically with PowerShell. Don’t forget that if you want to use Windows Search on Windows Server, it is not installed by default.

Using PowerShell to Query the File System

In the original article, I queried Windows Search using its Advanced Query Syntax to find files by date created, a file system property inherent to Windows, and secondly by the value of content status, a property for Microsoft Word documents.

PowerShell’s Get-ChildItem Cmdlet

PowerShell supports querying the file system directly, so you can search file system properties built-in to Windows, such as date created, using the get-childitem cmdlet. Just provide the cmdlet with a valid file path, which can be either a directory or filename. Open a PowerShell command prompt and type the following two lines of code, replacing “c:\users\russell\skydrive\documents\petri\” with the file path you want to query.

The cmdlet will traverse through each directory in $filepath and report information about what it finds, including file attributes, such as whether the archive bit is set, file length, and date modified (LastWriteTime).

The output of the above command doesn’t show all the available properties. To see the full list for each file, add the select-object cmdlet as shown here:

Use PowerShell's get-childitem cmdlet to search a directory (Image Credit: Russell Smith)

Use PowerShell’s get-childitem cmdlet to search a directory (Image Credit: Russell Smith)

Now the command output is likely to contain too much information, but you can look through the properties shown for each file, decide what you want to include in the output and add the properties to the select-object cmdlet. In the example below, I only want the file path, date modified and extended attributes.

Format the output of PowerShell's get-childitem cmdlet (Image Credit: Russell Smith)

Format the output of PowerShell’s get-childitem cmdlet (Image Credit: Russell Smith)

Once you’ve written your query, you can send the result to PowerShell’s grid view to make it easier to view and sort the results. Note that you can’t use the format-table cmdlet with grid view.

It’s also possible to use the where-object cmdlet to add criteria to the search. In the example below, I use it to display only folders, by returning results where psiscontainer is set to true, effectively filtering out all files from the search.

Show only file in the results of PowerShell's get-childitem cmdlet (Image Credit: Russell Smith)

Show only file in the results of PowerShell’s get-childitem cmdlet (Image Credit: Russell Smith)

Finally, the following command uses the –filter parameter to show just Word documents, and I’ve added a date range to display files with lastwritetime dates between September 1 and October 31, 2014 using the greater (-ge) and less (-le) than or equal to operators.

Add a date range to PowerShell's get-childitem cmdlet and output the results to grid view (Image Credit: Russell Smith)

Add a date range to PowerShell’s get-childitem cmdlet and output the results to grid view (Image Credit: Russell Smith)

Sponsored

Using PowerShell to Run SQL Queries Against the Windows Search Index

If you’ve been looking through the file properties displayed by the get-childitem cmdlet carefully, you will have noticed that file properties unique to Microsoft Word documents, such as content status and comment, are not included in the results. These properties are part of the document and not the file system, therefore can’t be retrieved using get-childitem or get-itemproperty.

As we saw in the original article, these extended file properties are indexed by Windows Search, so we can search, sort and group files by content status and comment in File Explorer. PowerShell doesn’t currently include a cmdlet for directly querying the Windows Search index, but nevertheless it’s quite easy to query Windows Search using PowerShell. It’s also possible to use the Word COM object and access .docx file properties directly, but for our purposes it’s faster to run queries using Windows Search.

SQL Queries

As the Windows Search index is an SQL database, we need to construct an SQL query. Don’t worry if you don’t have any SQL experience. The queries I’m going to show you are very simple and don’t differ much from the syntax used with PowerShell’s where-object and select-object cmdlets.

In the SQL query below, I’m selecting two columns to display in the results, System.ItemName and System.ItemUrl, from a table called SYSTEMINDEX, which is the Windows Search index. WHERE adds criteria so that only files with the content status property set to Complete are shown in the results, and SCOPE recursively searches the petri folder and all subfolders. If I wanted to search only the petri directory and ignore any subdirectories, I’d use DIRECTORY instead of SCOPE.

SELECT System.ItemName, System.ItemUrl FROM SYSTEMINDEX WHERE System.ContentStatus = ‘Complete’ AND SCOPE = ‘c:\users\russell\skydrive\documents\petri\’

Now I need to build some PowerShell code to connect to the Windows Search database and run the above query.

Run an SQL query in PowerShell against the Windows Search index database (Image Credit: Russell Smith)

Run an SQL query in PowerShell against the Windows Search index database (Image Credit: Russell Smith)

The Object Linking and Embedding Database (OLEDB) COM object in Windows allows us to connect to compatible data sources. If you’ve ever set up a connection to a Microsoft Access database, you might be familiar with OLEDB connections. The $connector variable creates a new OLEDB object and passes on the $provider, i.e. which database the information will come from, and SQL query ($sql) information. Secondly, I create a new $dataset object that gets populated with the results of our SQL database query.

I need some additional information though, so I’m going to add the author, and comment properties to the SQL query:

I also want to add a date range to the query. To see all files that were created on October 2, I need to add a range that displays all files created from midnight on October 2 until midnight on October 3.

In the code below, I’ve added the select-object and format-table cmdlets to tidy up the output, and updated the SQL query to expand the date range to include all documents created in September and October.

Add a date range to the SQL query and format the output (Image Credit: Russell Smith)

Add a date range to the SQL query and format the output (Image Credit: Russell Smith)

Sponsored

Finally, here’s a SQL query that will show me any articles created in September and October that have a comment. All I did was remove WHERE System.ContentStatus = ‘Complete’ and replace it with WHERE System.Comment IS NOT NULL.

Using IS NOT NULL to return all documents that have a comment (Image Credit: Russell Smith)

Using IS NOT NULL to return all documents that have a comment (Image Credit: Russell Smith)

Now that you are able to build simple SQL queries to search the Windows Search index and populate a dataset, you can manipulate and format the results using PowerShell’s text handling capabilities.

Sponsored

Tagged with , , , ,