Azure Archive Storage and Blob-Level Tiering
Azure Blob Storage
The first thing you should know is what a blob is. I first heard the term “blob” in database theory back in college, where we were taught that a blob (binary large object) was a file stored in a database. You can think of an Azure storage stamp as a massive, resilient database cluster. When we store data in a storage account, Azure figures out how we are going to use that data and stores it (and charges for it) accordingly. One of those kinds of storage is a blob, which is a file.
Azure has two kinds of storage account:
- General Storage Account: This is the storage account that IaaS users know best, capable of storing disks (Page Blob and Disk), blobs (Standard IO – Block Blob), queues, file shares (for simple legacy apps), and tables.
- Blob Storage Accounts: Two tiers, hot and cool, are available for storing blobs only. The cool tier is economical if you access blobs less than twice per month, and the hot tier is cost-effective if you access blobs more frequently.
When the hot and cool tiers were announced, the public responded with, “That is great but have you got something like Amazon Glacier?” Cool storage costs $0.01 per GB per month in the East US 2 Azure region. If you have petabytes of archive data that you need to keep but rarely access, even that tiny per-GB cost can add up to a significant bill.
Archive Blob Storage
A new form of blob storage was announced, offering a third tier below hot and cool blob storage. The idea behind archive storage is that it is ultra-cheap for huge amounts of data that you very rarely need. Microsoft made this possible by using some form of offline storage. When you access blobs in cool, hot, or general storage, the latency is imperceptible: you request the blob and the file is immediately available to you. In the case of archive storage, there will be a latency that is “on the order of hours”.
Note: I have no idea what storage system is being used for Archive Storage. That latency makes me think that it is some kind of tape storage, probably kept in triplicate across tapes or libraries.
That sort of latency is okay. Realistically, any requirement for this old data is not immediate. It might be something such as a court-issued subpoena to retrieve data from several years ago, and such requests can be satisfied in days or weeks.
Archive storage has some interesting traits:
- Seamless: Archive storage is another form of blob storage, so it will be familiar to developers. The obvious exception is that code must allow for several hours of latency when accessing files.
- SLA: When the archive tier reaches general availability, it will have the same 99 percent availability service-level agreement that the cool storage tier has now.
- Replication: Archive data will be stored with the same resilience options as other blob storage options.
- Encryption: This is an expanding trend in Azure. All data will be encrypted automatically at rest.
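The latency point above means code should not assume an archived blob can be read right away. Here is a minimal sketch of one way to handle that: poll until the blob is readable, or give up after a timeout. The `is_available` callable is a hypothetical stand-in for whatever check your storage client offers. It is not an Azure API, and the poll interval and timeout are assumed values.

```python
import time
from typing import Callable


def wait_for_blob(
    is_available: Callable[[], bool],
    poll_seconds: float = 900.0,             # assumed: check every 15 minutes
    timeout_seconds: float = 24 * 3600.0,    # assumed: give up after a day
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_available() returns True or the timeout passes.

    is_available is a hypothetical callable supplied by the caller; it
    should return True once the archived blob can be read.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_available():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so that the loop can be tested without actually waiting for hours.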
The most important trait of archive storage is the price. The preview price of archive blob storage in East US 2 is $0.0018 per GB per month, versus $0.01 per GB per month for cool blob storage in the same region. 1 terabyte (TiB) will cost just $1.84 per month! 1 petabyte (PiB) will cost $1,887.44 per month!
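The arithmetic behind those figures can be checked with a few lines of Python. This sketch assumes the prices are per GB per month and that 1 TiB is counted as 1,024 GB, which is what the quoted totals imply. The constant and function names are my own.

```python
# Preview prices quoted above for East US 2 (assumed to be per GB per month).
ARCHIVE_PRICE_PER_GB = 0.0018
COOL_PRICE_PER_GB = 0.01

GB_PER_TIB = 1024
GB_PER_PIB = 1024 * 1024


def monthly_cost(size_gb: int, price_per_gb: float) -> float:
    """Return the monthly storage cost in dollars, rounded to cents."""
    return round(size_gb * price_per_gb, 2)


if __name__ == "__main__":
    for label, size in (("1 TiB", GB_PER_TIB), ("1 PiB", GB_PER_PIB)):
        archive = monthly_cost(size, ARCHIVE_PRICE_PER_GB)
        cool = monthly_cost(size, COOL_PRICE_PER_GB)
        print(f"{label}: archive ${archive:,.2f} vs cool ${cool:,.2f} per month")
```

At these prices, the same petabyte in the cool tier would come to $10,485.76 per month, so the archive tier costs less than a fifth as much.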
Those of you working with on-premises tiered storage might wonder if Microsoft is going to work on tiering blobs. In other words, can a blob be moved to an appropriate tier? The answer is yes, but what that “yes” means will change over time.
Today in the limited preview, you can move a blob from one tier to another. You can open a blob and select a tier for that blob: hot, cool, or archive.
For now, that means that either you or some software that you use or write must track the usage and age of each blob and move it to the appropriate tier.
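As a rough illustration of what that tracking software might decide, here is a minimal sketch that picks a tier from how long a blob has sat idle. The 30-day and 180-day thresholds are assumptions chosen for the example, not Microsoft guidance, and `choose_tier` is a hypothetical helper. The actual tier change would be made through the storage API or SDK, which is not shown here.

```python
from datetime import datetime, timedelta

# Assumed thresholds for illustration only; tune them to your access patterns.
COOL_AFTER = timedelta(days=30)
ARCHIVE_AFTER = timedelta(days=180)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return "hot", "cool", or "archive" based on how long a blob has been idle."""
    idle = now - last_accessed
    if idle >= ARCHIVE_AFTER:
        return "archive"
    if idle >= COOL_AFTER:
        return "cool"
    return "hot"
```

A scheduled job could run this over an inventory of blobs and their last-access times, then request a tier change only for blobs whose chosen tier differs from their current one.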
What about auto-tiering? That would definitely be popular and Microsoft knows that. Back at Build, the Azure storage team announced that auto-tiering would come after general availability of Archive Storage and Blob-Level Tiering.
At the moment, availability of the program is limited to approved applicants. Regional and storage availability is also limited:
- Any new LRS blob storage account in East US 2 will have the archive tier option.
- All new accounts in all public regions have blob-level tiering.
- In the preview phase, only LRS will be supported. GRS and RA-GRS will be available at GA.