Dealing with Protected Documents Found by Office 365 Content Searches


DSRs, Content Searches, and Office 365

In May, I wrote about the new feature in the Office 365 Security and Compliance Center designed to handle GDPR article 15 Data Subject Requests (DSRs). A DSR is a special form of Office 365 eDiscovery case that depends on content searches to find content relating to a named individual (the data subject).

Microsoft estimates that 90% of an organization’s data stored in Office 365 is in Word, Excel, PowerPoint, OneNote, or email, all of which is indexed and searchable. It’s therefore reasonable to assume that a DSR will find anything held in Office 365 relating to a named individual. Well, that’s certainly the hope, but as pointed out in the article, some repositories that content searches might not reach, like Yammer, Sway, and Planner, could also hold information needed to satisfy a DSR.

Scanning an Office 365 tenant with a content search is a good first step to responding to a DSR. Interpreting the results and figuring out what relates to the data subject create some added challenges. Some challenges take time to resolve, such as examining found items to ensure that personal data belonging to the data subject exists. Others need a more technical solution.

Rights Management More Popular in Office 365

Dealing with protected documents falls into the latter category. Because it’s much easier to manage online, Azure Information Protection (aka rights management) is used more heavily inside Office 365 than it is on-premises. Microsoft has also made it easier for users to apply protection to email, including a new mail encryption feature shared with premium versions of Outlook.com. The upshot of this activity is that an increasing volume of Office 365 data is encrypted.

Finding Protected SharePoint Content

To satisfy a DSR, we must find all content relating to a data subject. Exchange indexes the content of protected email, so a content search finds those messages without difficulty. The problem lies in documents stored in SharePoint Online and OneDrive for Business sites because Office 365 does not index the content of these files. The metadata is indexed and can help to find documents, but the content is hidden through encryption.

Searches against SharePoint and OneDrive sites will find protected files if the search keywords match metadata like a document subject. But there’s no point in giving an eDiscovery investigator a set of protected documents that they cannot read, which means that we need some way to decrypt protected documents found by content searches.

Dealing with Protected Documents

Let’s assume that the results of a DSR search include some protected documents. When you export the results, you nominate a target folder to receive copies of the found files. Under the target folder, a folder (named after the date and time of the search) holds the search results, including the export summary, manifest, and a folder called SharePoint. Inside the SharePoint folder are folders for each site where the search found something, and folders navigating to the point in the site where the search found the content. The path to such a folder might be something like this:

C:\Temp\Search for documents_Export\06.05.2018-1106AM\SharePoint\GDPR Planning Mark II\gdprplanningmarkii\Shared Documents\General

Figure 1 shows a typical example. In this case, the export for a content search includes a protected Word document (which is why preview doesn’t work). If someone without permission tries to open the document, they’ll be blocked.

Figure 1: A protected Word document found by a content search (image credit: Tony Redmond)

When preparing files for processing, it’s probably a good idea to copy all the protected documents into a single folder to make it easier to remove protection.
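
The gathering step could be sketched along the following lines. This is only an illustration, not a tested production script. It assumes that the PowerShell module installed with the Azure Information Protection client (the source of the Get-AipFileStatus cmdlet discussed later) is available on the workstation. The function name Copy-ProtectedDocument and its two parameters are hypothetical names chosen for this example, and the staging folder should sit outside the export folder so that copies are not scanned again.

```powershell
# Sketch: copy every protected file found under an export into one staging folder.
# Copy-ProtectedDocument, $ExportRoot, and $StagingFolder are hypothetical names.
function Copy-ProtectedDocument {
    param(
        [Parameter(Mandatory)] [string] $ExportRoot,     # for example, the export's SharePoint folder
        [Parameter(Mandatory)] [string] $StagingFolder   # working folder outside the export
    )
    # Create the staging folder if it does not exist yet
    New-Item -ItemType Directory -Path $StagingFolder -Force | Out-Null
    $Count = 0
    ForEach ($F in Get-ChildItem -LiteralPath $ExportRoot -File -Recurse) {
        $Status = Get-AipFileStatus -Path $F.FullName
        # Same test as the removal script: a template identifier means the file is protected
        If ($null -ne $Status.RMSTemplateId) {
            # Prefix a counter so that same-named files from different sites do not collide
            $Count++
            $Target = Join-Path $StagingFolder ("{0:D4}_{1}" -f $Count, $F.Name)
            Copy-Item -LiteralPath $F.FullName -Destination $Target
            # Emit a mapping so that each copy can be traced back to its original location
            [PSCustomObject]@{ Source = $F.FullName; Copy = $Target }
        }
    }
}
```

Piping the output to Export-Csv keeps a record of where each copy came from, which is useful when the investigator needs to cite the original site and folder.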

Superusers and PowerShell

After assembling the files, we can go ahead and remove protection using a combination of a rights management superuser account and the Azure Information Protection PowerShell module. A superuser can remove protection from any file, while the PowerShell module enables us to automate bulk removal of protection from files found in a search.

Because superusers can read any protected file in a tenant, this permission should only be assigned on an as-needed, time-limited basis.
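
One way to keep the assignment time-limited is to grant the permission just before the work starts and take it away as soon as the work ends, even if the work fails. The sketch below uses the Enable-AadrmSuperUserFeature, Add-AadrmSuperUser, and Remove-AadrmSuperUser cmdlets from the Aadrm module and assumes that the session is already connected with Connect-AadrmService. The wrapper name Invoke-WithSuperUser and its parameters are hypothetical. It also assumes that the address passed in belongs to the account that will open the files, and it does not allow for any delay before the change takes effect in the service.

```powershell
# Sketch: grant superuser rights only for the duration of a block of work.
# Invoke-WithSuperUser, $EmailAddress, and $Work are hypothetical names.
function Invoke-WithSuperUser {
    param(
        [Parameter(Mandatory)] [string] $EmailAddress,   # account that will remove protection
        [Parameter(Mandatory)] [scriptblock] $Work       # the processing to run while elevated
    )
    # Make sure the superuser feature is switched on for the tenant
    Enable-AadrmSuperUserFeature
    Add-AadrmSuperUser -EmailAddress $EmailAddress
    Try {
        & $Work
    }
    Finally {
        # Always revoke the permission, whether or not the work succeeded
        Remove-AadrmSuperUser -EmailAddress $EmailAddress
    }
}
```

Running Get-AadrmSuperUser afterwards is a simple way to confirm that no account has been left with the permission.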

Removing Protection

The code we need is straightforward. In this example, we define a target folder, make a collection of the files in the target folder, and then loop through the collection to look for protected documents. If we find any, we remove the protection. The first two commands import the rights management administration module (Aadrm) into the session and then connect to the online service. The Get-AipFileStatus and Set-AipFileLabel cmdlets used in the loop come from the PowerShell module installed with the Azure Information Protection client. You can only connect to Azure Information Protection if you are a tenant administrator or have been assigned the Azure Information Protection administrator role.

# Load the rights management module and connect to the online service
Import-Module Aadrm
Connect-AadrmService

# Folder holding the files exported by the content search
$TargetFolder = "C:\Temp\Search for documents_Export\06.05.2018-1106AM\SharePoint\GDPR Planning Mark II\gdprplanningmarkii\Shared Documents\General"
$Documents = Get-ChildItem -Path $TargetFolder -File
ForEach ($D in $Documents) {
   # A template identifier means that the file is protected
   $ProtectStatus = Get-AipFileStatus -Path $D.FullName
   If ($null -ne $ProtectStatus.RMSTemplateId) {
      Write-Host $D.Name "is protected with" $ProtectStatus.RMSTemplateName
      $Message = $ProtectStatus.RMSTemplateName + " removed for GDPR DSR"
      # Remove the label (and the protection it applies), noting the reason
      Set-AipFileLabel -Path $D.FullName -RemoveLabel -JustificationMessage $Message
   }
}

APC123.docx is protected with Intellectual Property
SPO Protected Content Test.docx is protected with Patent Submission
FileName
--------
C:\Temp\Search for documents_Export\06.05.2018-1106AM\SharePoint\GDPR Planning Mark II\gdprplanningmarkii\Shared Documents\General...
C:\Temp\Search for documents_Export\06.05.2018-1106AM\SharePoint\GDPR Planning Mark II\gdprplanningmarkii\Shared Documents\General...

After protection is removed from the documents, they can be dealt with like any other file. In most cases, an investigator will open and review the content to see whether it is relevant, for example because it helps to satisfy a DSR.

Solution, But No Logging

This is very rough-and-ready code, and a production-quality script would include error checking, logging, and so on. One surprising fact that I encountered while researching this article is that the Azure Information Protection usage or admin logs do not capture any record for a superuser removing protection from a file. Instead, the Azure Information Protection client captures a Windows event (104) in the local workstation’s event log. That’s simply not good enough in an era when audit logging of administrator actions is becoming increasingly important.
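
As for the error checking and logging that a production script needs, the sketch below shows one possible shape. It wraps the same Get-AipFileStatus and Set-AipFileLabel calls in a function that records every attempt in a CSV file. The function name Remove-ProtectionWithLog, the log path parameter, and the log columns are hypothetical, and Try/Catch only traps errors that the cmdlet raises as terminating errors. A local log like this is no substitute for a service-side audit record, but it does leave a trace of who removed protection from what.

```powershell
# Sketch: remove protection from files in a folder and log each attempt to a CSV file.
# Remove-ProtectionWithLog, $LogFile, and the log columns are hypothetical names.
function Remove-ProtectionWithLog {
    param(
        [Parameter(Mandatory)] [string] $TargetFolder,
        [Parameter(Mandatory)] [string] $LogFile
    )
    ForEach ($D in Get-ChildItem -LiteralPath $TargetFolder -File) {
        $Status = Get-AipFileStatus -Path $D.FullName
        # Skip files that are not protected
        If ($null -eq $Status.RMSTemplateId) { Continue }
        $Result = "Removed"
        Try {
            $Message = $Status.RMSTemplateName + " removed for GDPR DSR"
            Set-AipFileLabel -Path $D.FullName -RemoveLabel -JustificationMessage $Message -ErrorAction Stop | Out-Null
        }
        Catch {
            # Record the failure and carry on with the next file
            $Result = "Failed: " + $_.Exception.Message
        }
        # Append one log row per protected file, whatever the outcome
        [PSCustomObject]@{
            Timestamp = (Get-Date).ToString("s")
            User      = [Environment]::UserName
            File      = $D.FullName
            Template  = $Status.RMSTemplateName
            Result    = $Result
        } | Export-Csv -Path $LogFile -Append -NoTypeInformation
    }
}
```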

Follow Tony on Twitter @12Knocksinna.
