The Architecture of Azure File Sync
In this post, I will explain how the components of Azure File Sync fit together and how they work in an example multi-site scenario.
Azure File Sync
Microsoft’s Azure File Sync (in preview at the time of writing) is an Azure subscription service that enables you to:
- Sync folders/files from file servers to Azure storage accounts using the Azure Files service.
- Optionally tier storage, keeping hot files on-premises and cold files in the cloud without changing the folder structure that users see. This effectively turns the file server into a proxy for the master copy in Azure.
- Synchronize folders/files between file servers in different locations (without distributed file locking at this time).
- Move the backup process to Azure. Azure Backup support is in private preview at the time of writing.
- Simplify disaster recovery of file servers. You connect a new file server (proxy) to the master copy of the data in Azure, and users can start accessing folders/files within minutes.
There are a number of components that make all this possible. To explain them, I will use a hypothetical scenario with three file servers in Dublin, Stockholm, and Frankfurt. The requirements are:
- Synchronize all shares to Azure.
- The Accounting share will be on all file servers.
- The Sales share will be in Dublin and Frankfurt.
- The Marketing share will be in Stockholm and Frankfurt.
- Tiering will be enabled for each server endpoint to keep only hot data on each file server.
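To make the scenario concrete, here is a minimal Python sketch that captures the share requirements as data and inverts them to show which shares each file server will host. The `SHARE_SITES` mapping and the `shares_per_server` helper are hypothetical names used only for illustration; they are not part of Azure File Sync.

```python
from collections import defaultdict

# Hypothetical model of the scenario requirements: share name -> sites that keep a copy.
SHARE_SITES: dict[str, set[str]] = {
    "Accounting": {"Dublin", "Stockholm", "Frankfurt"},
    "Sales": {"Dublin", "Frankfurt"},
    "Marketing": {"Stockholm", "Frankfurt"},
}


def shares_per_server(share_sites: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert share -> sites into site -> sorted list of shares hosted at that site."""
    hosted: defaultdict[str, list[str]] = defaultdict(list)
    for share, sites in share_sites.items():
        for site in sites:
            hosted[site].append(share)
    # Sort sites and shares so the output is stable regardless of set ordering.
    return {site: sorted(shares) for site, shares in sorted(hosted.items())}


if __name__ == "__main__":
    for site, shares in shares_per_server(SHARE_SITES).items():
        print(f"{site}: {', '.join(shares)}")
```

Frankfurt ends up hosting all three shares, while Dublin and Stockholm each host two, which is why the per-server layout (and the tiering policy on each server) can differ.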
The architecture of Azure File Sync is designed to be flexible. File shares are long-established resources in most companies, and their content often drives the business. I once worked in a bank where auditors found that over 55 percent of critical business data was stored in spreadsheets! How this data is used and disseminated rarely fits rigid rules, so Azure File Sync has to bend with the requirements of the business. Its componentized architecture suits this requirement.
General Purpose Storage Account
The key component of Azure File Sync is the storage account. This general purpose storage account hosts the Azure Files shares that store the master copy of the file servers' data. Note that Blob storage accounts cannot be used because they do not offer the Azure Files service, which is where the data is stored.
In my scenario, a General Purpose v1 storage account is deployed to West Europe because:
- It is close to all the file servers.
- General Purpose v1 has lower transaction costs than General Purpose v2. We cannot take advantage of Blob Tiering (the key feature of GPv2) because we are using Azure Files instead of Blob storage.
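The account decision above can be sketched as two small rules. This is a hypothetical helper, not an Azure API: the strings `Storage`, `StorageV2`, and `BlobStorage` are the Azure Resource Manager `kind` values for GPv1, GPv2, and Blob storage accounts, and the cost preference simply encodes the reasoning in this post.

```python
# Sketch of the storage account decision described above.
# Kind strings match the Azure Resource Manager "kind" values; helper names are hypothetical.
GENERAL_PURPOSE_V1 = "Storage"
GENERAL_PURPOSE_V2 = "StorageV2"
BLOB_STORAGE = "BlobStorage"

# Of the account kinds discussed in this post, only the general purpose ones offer Azure Files.
KINDS_WITH_AZURE_FILES = {GENERAL_PURPOSE_V1, GENERAL_PURPOSE_V2}


def can_host_azure_files(kind: str) -> bool:
    """Return True if an account of this kind can host Azure Files shares."""
    return kind in KINDS_WITH_AZURE_FILES


def choose_account_kind(needs_blob_tiering: bool) -> str:
    """Prefer GPv1 (lower transaction costs, per this post) unless Blob Tiering is needed."""
    return GENERAL_PURPOSE_V2 if needs_blob_tiering else GENERAL_PURPOSE_V1


if __name__ == "__main__":
    # Azure File Sync stores data in Azure Files, so Blob Tiering does not apply here.
    kind = choose_account_kind(needs_blob_tiering=False)
    print(kind, can_host_azure_files(kind))
```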
Azure Files Shares
The Azure Files service is used to store data in the general purpose storage account. An Azure Files share is created for each folder or sync group that you synchronize to Azure.
Note that the (current) maximum size of a single share is 5 TiB. That doesn’t mean you can only synchronize 5 TiB of data to Azure; you can have many sync groups, each synchronizing a folder of data to Azure. Also, an Azure Files share doesn’t necessarily correspond to a share on a file server. An Azure Files share could be:
- Paired with a file server folder that contains many shared folders
- Paired with a folder within a file share
- Simply a synchronized copy of a shared folder on a file server
In the above example, each file share on the file servers will map to a share in Azure Files.
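Because each sync group maps to one Azure Files share, a folder larger than the per-share limit has to be split across several sync groups. The sketch below is a hypothetical planning helper (`plan_sync_groups` is not part of any Azure tool): it syncs a folder as one group if it fits, otherwise it proposes one sync group per immediate subfolder. It assumes all data lives in those subfolders and uses the 5 TiB limit quoted above, which you should check against the current Azure Files limits.

```python
TIB = 1024 ** 4
# Per-share limit quoted in this post at the time of writing.
MAX_SHARE_BYTES = 5 * TIB


def plan_sync_groups(folder: str, subfolder_sizes: dict[str, int],
                     limit: int = MAX_SHARE_BYTES) -> list[str]:
    """Return the folders to use as sync groups (one Azure Files share each).

    If the whole folder fits in one share, sync it as a single group; otherwise
    sync each immediate subfolder as its own group. Hypothetical planning helper
    that assumes all data lives in the immediate subfolders.
    """
    if sum(subfolder_sizes.values()) <= limit:
        return [folder]
    too_big = sorted(name for name, size in subfolder_sizes.items() if size > limit)
    if too_big:
        raise ValueError(f"subfolders exceed the per-share limit: {too_big}")
    return [f"{folder}\\{name}" for name in sorted(subfolder_sizes)]


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    print(plan_sync_groups(r"D:\Accounting", {"Payroll": 1 * TIB, "Invoices": 2 * TIB}))
    print(plan_sync_groups(r"D:\Sales", {"2016": 3 * TIB, "2017": 4 * TIB}))
```

The first call keeps D:\Accounting as a single sync group (3 TiB in total); the second splits D:\Sales (7 TiB) into one sync group per year folder.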
Storage Sync Service
The Storage Sync Service is the resource that you deploy in Azure to create an Azure File Sync solution; it is where you manage Azure File Sync.
File Sync Agent
Azure File Sync is a non-disruptive service. A solution such as StorSimple requires that you move data to a new storage system. Azure File Sync, however, does not require you to move data; instead, you will deploy an agent to the existing file server(s) and the agent will add cloud functionality to the file server.
The File Sync Agent is installed on each of the file servers in Dublin, Stockholm, and Frankfurt.
After you install the File Sync Agent, you register the file server(s) with the Storage Sync Service. Once this authenticated registration is complete, the Storage Sync Service can start to manage storage on the file server.
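The order of operations (install the agent, then register, then manage) can be modelled in a few lines. This is a minimal in-memory sketch under my own assumptions: `StorageSyncService`, `register_server`, and `can_manage` are hypothetical names and do not represent the Azure SDK or the agent's PowerShell cmdlets.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSyncService:
    """Hypothetical in-memory model of the Storage Sync Service; not an Azure SDK class."""

    name: str
    registered_servers: set[str] = field(default_factory=set)

    def register_server(self, server: str, agent_installed: bool) -> None:
        # Registration comes after the File Sync Agent has been installed on the server.
        if not agent_installed:
            raise RuntimeError(f"install the File Sync Agent on {server} first")
        self.registered_servers.add(server)

    def can_manage(self, server: str) -> bool:
        # Only registered servers can have their storage managed by the service.
        return server in self.registered_servers


if __name__ == "__main__":
    service = StorageSyncService("example-sync-service")  # hypothetical name
    for server in ("Dublin", "Stockholm", "Frankfurt"):
        service.register_server(server, agent_installed=True)
    print(sorted(service.registered_servers))
```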
Sync Groups
A sync group allows you to select a folder that will be synchronized to Azure. The sync group has one cloud endpoint (synchronizing with Azure Files) and one or more server endpoints (folders on file servers).
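In my example, I will have three sync groups, each replicating a folder from the file servers:
- Accounting: Dublin, Stockholm, and Frankfurt
- Sales: Dublin and Frankfurt
- Marketing: Stockholm and Frankfurt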
Cloud Endpoints
When you create a sync group, a cloud endpoint is created automatically for you. Each cloud endpoint is associated with a share in Azure Files. For example, the Accounting sync group will have a cloud endpoint backed by an Azure Files share called Accounting.
When you synchronize a folder to Azure, the cloud endpoint becomes the master copy. From then on, you should consider your file servers to be hot local replicas of the master copy in Azure.
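The relationship between a sync group and its endpoints can be sketched as a small data model. All class and method names here are hypothetical and only mirror the behaviour described above: the cloud endpoint is created together with the sync group, and server endpoints are added afterwards as replicas. The share name is lowercased because Azure Files share names must be lowercase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CloudEndpoint:
    share_name: str  # Azure Files share holding the master copy


@dataclass(frozen=True)
class ServerEndpoint:
    server: str
    path: str  # local folder, a hot replica of the master copy


@dataclass
class SyncGroup:
    """Hypothetical model: one cloud endpoint plus one or more server endpoints."""

    name: str
    cloud_endpoint: CloudEndpoint = field(init=False)
    server_endpoints: list[ServerEndpoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The cloud endpoint is created with the group, backed by a share named after it.
        # Azure Files share names must be lowercase.
        self.cloud_endpoint = CloudEndpoint(share_name=self.name.lower())

    def add_server_endpoint(self, server: str, path: str) -> ServerEndpoint:
        # Each added server endpoint replicates to and from the cloud endpoint.
        endpoint = ServerEndpoint(server, path)
        self.server_endpoints.append(endpoint)
        return endpoint


if __name__ == "__main__":
    accounting = SyncGroup("Accounting")
    accounting.add_server_endpoint("Dublin", r"D:\Accounting")
    print(accounting.cloud_endpoint.share_name, len(accounting.server_endpoints))
```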
Server Endpoints
Each server/folder combination that you add to a sync group is referred to as a server endpoint. If I add the existing D:\Accounting folder from the Dublin file server to the Accounting sync group, then this is my first server endpoint. I can add more server endpoints to the Accounting sync group; each added server endpoint then replicates to and from the master copy in the cloud endpoint.
Note that server endpoints don’t require matching file paths, and each file server can have a different set of server endpoints. For example:
- The Dublin file server can sync D:\Accounting with D:\Accounting in Stockholm and F:\Accounting in Frankfurt.
- The Stockholm file server can sync D:\Marketing with F:\Marketing in Frankfurt, with no replica in Dublin.
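The two examples above can be written down as a simple layout and checked. This is a hypothetical sketch (`SYNC_GROUPS`, `validate`, and `replica_sites` are illustrative names): each sync group maps server names to local paths, the paths are free to differ, and the only rule enforced is the one stated earlier, that a sync group needs at least one server endpoint. Keying the inner dictionary by server means this model allows one endpoint per server in each group, which is a simplification of my own.

```python
# Hypothetical sync group layout for the examples above: group -> {server: local path}.
SYNC_GROUPS: dict[str, dict[str, str]] = {
    "Accounting": {
        "Dublin": r"D:\Accounting",
        "Stockholm": r"D:\Accounting",
        "Frankfurt": r"F:\Accounting",
    },
    "Marketing": {
        "Stockholm": r"D:\Marketing",
        "Frankfurt": r"F:\Marketing",
    },
}


def validate(groups: dict[str, dict[str, str]]) -> None:
    """Each sync group needs at least one server endpoint; paths may differ per server."""
    for name, endpoints in groups.items():
        if not endpoints:
            raise ValueError(f"sync group {name} has no server endpoints")


def replica_sites(groups: dict[str, dict[str, str]], group: str) -> set[str]:
    """Return the servers that hold a replica of the given sync group."""
    return set(groups[group])


if __name__ == "__main__":
    validate(SYNC_GROUPS)
    for group, endpoints in SYNC_GROUPS.items():
        distinct_paths = len(set(endpoints.values()))
        print(f"{group}: {sorted(endpoints)}, distinct paths: {distinct_paths}")
```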
Cloud Tiering
You can optionally enable a separate tiering policy for each server endpoint. Once you enable a policy, cold files are tiered: their contents are removed from the server endpoint but still reside in Azure Files. Each tiered file is replaced by metadata with the offline attribute (O). Other than this attribute and a different icon, users will not notice that the files are stored in the cloud; the location of the files in the file system remains the same, permissions are unchanged, and names are not modified. With this system you can:
- Configure no tiering at all
- Enable tiering policies for some server endpoints to use very little local storage
- Configure other endpoints to use lots of local storage
In other words, each server endpoint can be configured based on the usage and hardware traits of each share/server/office combination.
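To illustrate how a per-endpoint policy might decide what to tier, here is a sketch of a free-space-based policy that tiers the least recently accessed files first until a target percentage of the volume is free. This is an assumption for illustration: `LocalFile` and `apply_free_space_policy` are hypothetical, and the real Azure File Sync heuristics are not described in this post. It also ignores the small amount of space used by the metadata left behind for each tiered file.

```python
from dataclasses import dataclass


@dataclass
class LocalFile:
    path: str
    size: int           # bytes
    last_access: float  # e.g. a POSIX timestamp; larger means more recently used
    tiered: bool = False


def apply_free_space_policy(files: list[LocalFile], volume_bytes: int,
                            used_bytes: int, free_space_percent: float) -> list[str]:
    """Tier the least recently accessed files until the free-space target is met.

    Hypothetical policy model; returns the paths tiered in this pass.
    """
    target_free = volume_bytes * free_space_percent / 100
    free = volume_bytes - used_bytes
    newly_tiered: list[str] = []
    # Coldest (least recently accessed) files are considered first.
    candidates = sorted((x for x in files if not x.tiered), key=lambda x: x.last_access)
    for f in candidates:
        if free >= target_free:
            break
        f.tiered = True  # content now only in Azure Files; offline-attribute metadata stays
        free += f.size
        newly_tiered.append(f.path)
    return newly_tiered


if __name__ == "__main__":
    files = [
        LocalFile("budget.xlsx", 150, 1.0),
        LocalFile("plan.docx", 100, 3.0),
        LocalFile("deck.pptx", 200, 2.0),
    ]
    print(apply_free_space_policy(files, volume_bytes=1000, used_bytes=900, free_space_percent=30))
```

With a 30 percent target on this toy volume, the two coldest files are tiered and the most recently used one stays local. A server endpoint short on local storage would use a higher percentage, while an endpoint with lots of local storage could use a low one or no policy at all.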
File Server Antivirus
It is normal to want to run anti-malware on your file servers. However, you have to be careful because some anti-malware products will download tiered files in order to scan them. Make sure that your antivirus scanner skips files with the offline attribute. Microsoft has identified the following products as able to do this:
- Symantec Endpoint Protection
- McAfee EndPoint Security
- Kaspersky Anti-Virus
- Sophos Endpoint Protection
- Trend Micro OfficeScan
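The check that a scanner needs to make is a simple bit test on the file attributes. The sketch below uses Python's real `stat.FILE_ATTRIBUTE_OFFLINE` constant (0x1000) and the Windows-only `st_file_attributes` field of `os.stat()`; the function names are hypothetical, and this is not how any of the products above are implemented. It only reads file attributes and never opens file contents.

```python
import os
import stat


def should_scan(file_attributes: int) -> bool:
    """Return False for tiered files, i.e. those with FILE_ATTRIBUTE_OFFLINE set."""
    return not (file_attributes & stat.FILE_ATTRIBUTE_OFFLINE)


def files_to_scan(root: str) -> list[str]:
    """Walk a server endpoint and list files that are not tiered."""
    candidates: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path, follow_symlinks=False)
            # st_file_attributes exists only on Windows; elsewhere treat attributes as 0.
            attributes = getattr(st, "st_file_attributes", 0)
            if should_scan(attributes):
                candidates.append(path)
    return sorted(candidates)
```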
Snapshots and Azure Backup
The final piece of the puzzle is backup. If you are using tiered storage, a local backup solution will download tiered files back to the file server. One of the benefits of Azure File Sync is that you can move the backup process to Azure.
Every file in a sync group is stored in Azure, so you can back up all of your files centrally in the cloud. You can take snapshots of the Azure Files shares in the storage account, and Azure Backup will later add functionality to orchestrate the process.
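The cost of a local backup on a tiered server endpoint can be estimated with a simple sum. This sketch assumes a file-level local backup that reads the full contents of every file, so each tiered file must first be recalled from Azure; `EndpointFile` and `local_backup_recall_bytes` are hypothetical names, and the sizes are made up for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class EndpointFile:
    path: str
    size: int     # bytes
    tiered: bool  # True if only metadata is kept on the file server


def local_backup_recall_bytes(files: list[EndpointFile]) -> int:
    """Bytes a full file-level local backup would pull back from Azure to read tiered files."""
    return sum(f.size for f in files if f.tiered)


if __name__ == "__main__":
    files = [
        EndpointFile("budget.xlsx", 10 * GIB, tiered=True),
        EndpointFile("plan.docx", 5 * GIB, tiered=False),
        EndpointFile("deck.pptx", 7 * GIB, tiered=True),
    ]
    print(local_backup_recall_bytes(files) // GIB, "GiB recalled")
```

A snapshot of the Azure Files share works against the master copy in Azure instead, so it does not require recalling anything to the file server.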
It sounds like a lot of pieces, but once you use Azure File Sync, you’ll find that they fall together naturally. It is difficult to balance simplicity and flexibility, but in my opinion, Microsoft has managed to accomplish this.