Digging Into Azure VM Disk Performance Features

Disk Tier

The first, and most important, factor in disk performance is the tier of disk that you choose to create. Remember that managed disks are easier to convert than unmanaged disks.
The first tier is the Standard HDD. Microsoft says that this entry-level tier is best suited to:

… dev/test and other infrequent access workloads that are less sensitive to performance variability.

A certain amount of that quote is sales-driven and based on “ivory tower computing”. In reality, most of the workloads I see are running on Standard HDD because, just like in on-premises deployments, HDD is good enough and it’s also the cheapest form of machine storage. You should expect up to 60 MB/second throughput and 500 IOPS per disk, but that performance can fluctuate as your virtual hard disks (VHDs) compete with other VHDs for seeks, reads, and writes on the physical disk spindles.

Standard SSD is the next tier up from Standard HDD; the maximum throughput and IOPS remain unchanged, but latency will be lower and the performance should be smoother, which is a benefit of moving to flash storage.
Premium SSD offers lower latency than either of the Standard tiers. The IOPS and throughput vary with the size of the disk: bigger VHDs offer more performance. The largest supported Premium SSD disk, the 4 TiB P50, offers up to 7,500 IOPS with 250 MB/second transfers.
Two things to note about Premium SSD:

  • Make sure that the virtual machine you deploy can support the IOPS and throughput of the Premium disks that you add to it. Larger machines offer more performance.
  • If your data workload does not generate a large queue for Standard HDD/SSD reads and writes then Premium SSD might not offer you any improvements beyond SLA and latency.
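
The first point above amounts to taking the lower of two ceilings. Here is a minimal sketch of that arithmetic, assuming a simple "whichever is lower" model. The P50 figures come from this article; the two VM-size caps (`small_vm`, `large_vm`) are hypothetical numbers invented for illustration, so check the published limits for the machine size you actually deploy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """A performance ceiling: IOPS and throughput in MB/second."""
    iops: int
    mbps: int


def effective_limits(disk: Limits, vm: Limits) -> Limits:
    """Return the lower of the disk and VM ceilings on each axis."""
    return Limits(iops=min(disk.iops, vm.iops), mbps=min(disk.mbps, vm.mbps))


# P50 figures quoted in this article.
p50 = Limits(iops=7500, mbps=250)

# Hypothetical VM-size caps, invented for illustration only.
small_vm = Limits(iops=3200, mbps=48)
large_vm = Limits(iops=12800, mbps=192)

for name, vm in (("small_vm", small_vm), ("large_vm", large_vm)):
    print(name, effective_limits(p50, vm))
```

In this sketch the smaller machine throttles the P50 on both axes, and even the larger one still caps its throughput, which is why the machine size deserves as much attention as the disk tier.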

A fourth tier, Ultra SSD, is in private preview in limited regions today. Unlike the other tiers, Ultra SSD offers provisioned performance – you select from a range of possible performance levels when you create and size the VHD. Ultra SSD will offer up to 160,000 IOPS and 2,000 MB/second with sub-millisecond latency.

Caching

Host-based caching or “Blobcache” can improve the performance of reads or writes (IOPS, MB/second, and latency) – not just with Premium SSD as many Microsoft blog posts and documents suggest. Using host-based resources – flash storage and RAM – the host can greatly improve the performance of VHDs that are placed across the network on a storage cluster.
You have three options:

  • No caching: Some virtual machines, such as the Bs-Series, are limited to this option. Disk performance will be whatever the machine size can offer or whatever each disk can offer, whichever is lower.
  • Read caching: Read performance is improved by placing data into a host-based cache. Common reads will no longer require data transfers across the network from the storage cluster.
  • Read and write caching: This is the most dangerous form of cache. Writes go to the cache and are only sent to the disk when the host or a guest service requests a flush or write-through. If a virtual machine moves to another host, the contents of the write cache are lost before they are committed to disk. Only enable this option when it is recommended by a guest service vendor.
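
To make the risk of read and write caching concrete, here is a toy model of a write-back cache. It is not how Azure implements Blobcache; the class `ToyDisk` and its methods are hypothetical names used only to illustrate why an unflushed write disappears when the machine moves to another host.

```python
class ToyDisk:
    """Toy model: a remote disk fronted by a host-local write-back cache."""

    def __init__(self):
        self.committed = {}  # blocks safely stored on the storage cluster
        self.dirty = {}      # blocks held only in the host cache

    def write(self, block, data):
        # Write-back: acknowledge immediately, commit later.
        self.dirty[block] = data

    def flush(self):
        # The host or a guest service requests a flush.
        self.committed.update(self.dirty)
        self.dirty.clear()

    def read(self, block):
        # Serve from the cache when possible, otherwise from the disk.
        if block in self.dirty:
            return self.dirty[block]
        return self.committed.get(block)

    def move_to_new_host(self):
        # The cache lives on the old host, so unflushed writes vanish.
        self.dirty.clear()


disk = ToyDisk()
disk.write(0, "order #1")
disk.flush()
disk.write(1, "order #2")  # never flushed
disk.move_to_new_host()
print(disk.read(0), disk.read(1))
```

The flushed write survives the move and the unflushed one does not, which is the failure mode that makes this option a poor fit for databases unless the vendor says otherwise.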

An illustration of host-based caching with Premium SSD storage [Image Credit: Microsoft]
Note that the OS disk has read/write caching enabled. This setting should not be modified. And this is why all data, including domain controller SYSVOL, database, and log files, should be stored on data disks.

Disk Aggregation

It makes me giggle to see that people think that a data volume must reside on just a single disk. That never limited volume sizes on physical on-premises storage, so why should things be any different in a virtual machine? We can add multiple data disks to a single machine and pool those disks to create a single volume; the result is that we pool the capacity, but we also pool the performance of those disks, enabling near-linear growth if you do it correctly. In the case of Windows Server, you should use Storage Spaces in the guest OS, configured with aligned interleaves and NTFS disk allocation unit sizes (normally 64 KB).
Note that the SQL Server documentation advises against enabling any caching on disks that are dedicated to log files because it could result in a minor decrease in disk performance.
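
The pooling described above can be sketched with some simple arithmetic. This is a simplified striping model, not Storage Spaces itself: `pooled_limits` assumes perfectly linear scaling capped by the machine size, `stripe_location` assumes a plain round-robin layout, and the VM caps in the example are hypothetical. The per-disk Standard HDD figures are the ones quoted earlier in this article.

```python
def pooled_limits(disk_iops, disk_mbps, disk_count, vm_iops, vm_mbps):
    """Estimate the ceiling of a striped pool, capped by the VM size."""
    return (min(disk_iops * disk_count, vm_iops),
            min(disk_mbps * disk_count, vm_mbps))


def stripe_location(offset, interleave, columns):
    """Map a byte offset in the volume to (disk index, offset on that disk)."""
    stripe = offset // interleave
    disk = stripe % columns
    disk_offset = (stripe // columns) * interleave + offset % interleave
    return disk, disk_offset


INTERLEAVE = 64 * 1024  # 64 KB, matching the allocation unit size above

# Four Standard HDD disks at 500 IOPS and 60 MB/second each, on a
# hypothetical VM size capped at 6,400 IOPS and 96 MB/second.
print(pooled_limits(500, 60, 4, vm_iops=6400, vm_mbps=96))

# Consecutive 64 KB writes land on consecutive disks.
for i in range(5):
    print(i, stripe_location(i * INTERLEAVE, INTERLEAVE, columns=4))
```

When the interleave and the allocation unit size line up, each 64 KB cluster maps to exactly one disk in this model, so a single I/O is not split across two spindles. That alignment is what "doing it correctly" buys you.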

Dedicated Data Volumes/Disks

Some data roles have very different read and write patterns. For example, in a SQL Server the log file is very write-intensive and serialized, but the database file is mostly read intensive and random.
For intensive workloads where you need the very best in performance, it is recommended that you place database logs onto their own disks (or Storage Spaces pool) and database files onto their own disks (or Storage Spaces pool).

The Temp Drive

Every Azure virtual machine has a temp or ephemeral drive. This is where you can find the paging or swap file. You will also find a text file that tells you that this drive is temporary – if the virtual machine reboots it might get a whole new drive and any data on this disk will be lost.
Most virtual machines use host-based flash storage for the temp drive; this offers high IOPS and, by being in the host, the lowest possible latency. You might see some benefit from moving the SQL Server TempDB to the temp drive.

Ls-Series

“L is for local storage”. The Ls-Series is an unusual virtual machine because all of the disks (OS, temp, and data) are on host-local flash storage, resulting in high IOPS and throughput as well as extremely low latency. If storage cluster VHDs are not good enough for you, then Ls-Series machines might be better.
Note that the disks are constrained to the machine so you will require service & data replication to other virtual machines to achieve high availability.

M-Series Write Accelerator

Write Accelerator is a feature that is available to managed disks that are deployed to M-Series virtual machines. By offering sub-millisecond writes, it can improve the performance of the transactional updates that you would find in SQL Server, Oracle, and SAP HANA.