DIY PST Imports Using Azure Blob Storage
The Office 365 Import Service was introduced in preview in late 2015 and became generally available in April 2016. It is used to import information into Office 365 in a variety of formats. The reason why the Import Service exists is simple. It is designed to ingest as much data as possible into Office 365 so that the information is indexed, discoverable, and available for compliance purposes.
A side-effect and benefit for Microsoft is that the data resides in its datacenters. Once inside Office 365, the cost and complexity of extracting the information again to bring it somewhere else means that the data is unlikely to leave. For example, if you move all of your email archives from Veritas Enterprise Vault into Exchange Online archive mailboxes, will you be in a hurry to move them again?
Microsoft provides tools to handle ingestion of PST files and SharePoint data. Plans originally existed to allow data from other sources such as Facebook, LinkedIn, and Bloomberg to be packaged and ingested using the Import Service. The current strategy to ingest non-Microsoft data focuses on the use of third-party connectors based on Exchange Web Services that establish links from other data sources to Azure, from where the data can be moved into archive mailboxes.
PST files are a popular source for the Import Service. I strongly support any effort to eradicate PSTs as I regard the file format to be obsolete, insecure, and prone to corruption and data loss when problems are encountered with PC hard disks. Although they are intended for “personal storage,” there’s no doubt that PSTs are used to hold all manner of important corporate information, all of which should be properly stored in a manner that is safe, secure, available to all devices, and compliant. (A free eBook about PST eradication that I edited is available.)
The Office 365 Import Service can import PST files into Exchange Online primary or archive mailboxes, and can import files and documents into SharePoint Online sites. If you have a small amount of data to process, a network upload can transmit the data to Azure; larger amounts can be handled by copying the data to SATA drives and shipping the drives to a Microsoft datacenter for processing there.
Handling SATA drives and making sure that the chain of custody and customer privacy are preserved takes quite a lot of effort, which is why Microsoft charges $2/GB to import data sent in via drive shipping. At that rate, a 4 TB drive packed with PSTs incurs an $8,000 processing charge, so it's wise to avoid drive shipping if possible. Microsoft returns the drives after the data is uploaded to Azure.
Recently, an interesting blog by Joe Palarchio explained how to use the New-MailboxImportRequest cmdlet, running in an Exchange Online PowerShell session, to import PST files that have been uploaded to Azure Blob Storage without going through the Office 365 Import Service. The technique relies on the ability of the Exchange Online version of New-MailboxImportRequest to accept the URI of a storage location in the AzureBlobStorageAccountUri parameter and the shared access signature (SAS) token needed to access that location in the AzureSharedAccessSignatureToken parameter. The cmdlet, originally introduced in Exchange 2010 SP1 to process PST imports for on-premises mailboxes, is available for both on-premises Exchange 2016 and Exchange Online.
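As a sketch of the approach, a single PST already uploaded to a blob container can be imported with a command along these lines. The storage account, container, PST file name, mailbox, and SAS token shown are placeholder values, not taken from the blog; substitute your own.

```powershell
# Import a PST from Azure Blob Storage directly into a mailbox.
# All names and the (truncated) SAS token below are placeholder values.
New-MailboxImportRequest -Mailbox "Kim.Akers" `
   -Name "PST-Import-Kim" `
   -AzureBlobStorageAccountUri "https://mystorageaccount.blob.core.windows.net/pstfiles/Kim.Akers.pst" `
   -AzureSharedAccessSignatureToken "?sv=2015-04-05&sr=c&sig=..." `
   -TargetRootFolder "Imported PST"
```

Adding the IsArchive switch directs the import into the user's archive mailbox instead of their primary mailbox, and TargetRootFolder keeps the imported content in its own folder tree rather than merging it with existing folders.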
The technique outlined in the blog is interesting, but you might reasonably ask why a DIY approach to PST imports is needed at all when the Office 365 Import Service does a perfectly capable job of ingesting PST content into mailboxes. Surely nothing more is required?
It’s certainly true that the Import Service does a fine job of ingesting all manner of data into Office 365. However, the Import Service suffers in two respects. First, it doesn’t provide as much control over PST imports as you can exert through the parameters of the New-MailboxImportRequest cmdlet, which direct the Mailbox Replication Service (MRS) in how to ingest data. The example given in the blog is excluding calendar folders from processing so that old and outdated entries don’t suddenly appear in user calendars.
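That kind of folder filtering can be expressed with the ExcludeFolders parameter. Again, the mailbox, storage URI, and SAS token are placeholders; the #Calendar# syntax refers to the well-known calendar folder regardless of the mailbox language.

```powershell
# Import the PST but tell MRS to skip the calendar folder, so stale
# appointments and meeting requests don't reappear in the user's calendar.
# Mailbox, URI, and SAS token are placeholder values.
New-MailboxImportRequest -Mailbox "Kim.Akers" `
   -AzureBlobStorageAccountUri "https://mystorageaccount.blob.core.windows.net/pstfiles/Kim.Akers.pst" `
   -AzureSharedAccessSignatureToken "?sv=2015-04-05&sr=c&sig=..." `
   -ExcludeFolders "#Calendar#"
```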
Second, the reporting capabilities of the Import Service are weak and its error handling isn’t great either. By their very nature, PSTs tend to be cluttered with old, decaying, and barely readable items, some of which were created in the dim and distant past when clients weren’t too particular about how they used MAPI to create and update items. When PSTs that contain these kinds of items are imported, the items are quite likely to be regarded as “bad” and are therefore dropped (not imported). The Import Service discards all bad items that it finds on the perfectly sensible basis that it does not make sense to import rubbish into Office 365.
Using the DIY approach enables you to set a threshold for the tolerance of bad items and also allows you to get a report about what bad items were dropped. Knowing what type of items were problematic might allow you to better prepare for future imports or help users understand the kind of problems that lurk in their PSTs, like old meeting requests that have been mangled by multiple clients (BlackBerry devices were particularly good at mangling items in their early days).
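A hedged sketch of how that control looks in practice: the BadItemLimit parameter sets the tolerance for corrupt items, and Get-MailboxImportRequestStatistics retrieves a report of what was dropped once the request completes. The mailbox, request name, URI, and SAS token are again placeholders.

```powershell
# Allow up to 50 bad items before the import request fails
# (limits above 50 also require the AcceptLargeDataLoss switch).
New-MailboxImportRequest -Mailbox "Kim.Akers" `
   -Name "PST-Import-Kim" `
   -AzureBlobStorageAccountUri "https://mystorageaccount.blob.core.windows.net/pstfiles/Kim.Akers.pst" `
   -AzureSharedAccessSignatureToken "?sv=2015-04-05&sr=c&sig=..." `
   -BadItemLimit 50

# Afterwards, pull the full report to see which items were dropped and why.
Get-MailboxImportRequestStatistics -Identity "Kim.Akers\PST-Import-Kim" -IncludeReport |
   Format-List Status, ItemsTransferred, BadItemsEncountered, Report
```

The Report property is verbose, but it is exactly the kind of detail about skipped items that the Import Service does not surface.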
DIY imports are not for everyone, but they are a good example of the kind of operational flexibility that exists in Office 365 through PowerShell and a little bit of lateral thinking.
Follow Tony on Twitter @12Knocksinna.
Want to know more about how to manage Office 365? Find what you need to know in “Office 365 for IT Pros”, the most comprehensive eBook covering all aspects of Office 365. Available in PDF and EPUB formats (suitable for iBooks) or for Amazon Kindle.