This post will explain some of the best practices for creating, configuring, and associating network security groups (NSGs) in Azure Resource Manager or CSP.
I have written two previous posts that explain what NSGs are and how to deploy them.
In summary, a network security group is a policy that contains a collection of rules that block or allow traffic to and from a virtual NIC or a virtual network subnet in Azure. Each rule can specify network addresses, locations, a protocol (TCP, UDP, or all), and port numbers. Using a priority value, you can stack rules; the lower the number, the higher the priority. For example, a generic rule with a low priority (a high number) can block or allow everything, and more specific rules can override it, thanks to a higher priority (a lower number).
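To make the stacking idea concrete, here is a minimal Python sketch of first-match rule evaluation. It is not Azure code: the `Rule` class, the `evaluate` function, and the rule names are all hypothetical, and real NSG rules also match on addresses and direction, which I have left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified stand-in for an NSG rule."""
    name: str
    priority: int          # lower number = higher priority
    protocol: str          # "Tcp", "Udp", or "*" for all protocols
    port: Optional[int]    # None means any port
    allow: bool


def evaluate(rules: List[Rule], protocol: str, port: int) -> bool:
    """Return True if the traffic is allowed.

    Rules are checked from the lowest priority number upward and the
    first rule that matches decides the outcome. No match means deny.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.protocol in ("*", protocol) and rule.port in (None, port):
            return rule.allow
    return False


rules = [
    # Generic rule: low priority (high number), blocks everything.
    Rule("deny-everything", 4000, "*", None, allow=False),
    # Specific rule: higher priority (lower number), overrides the block.
    Rule("allow-https", 100, "Tcp", 443, allow=True),
]

https_allowed = evaluate(rules, "Tcp", 443)
rdp_allowed = evaluate(rules, "Tcp", 3389)
print(https_allowed, rdp_allowed)
```

HTTPS is matched by the specific rule first, while RDP falls through to the generic block.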
Starting with a plan can be a difficult conversation, especially when developers are granted the ability to deploy machines for themselves. There are those who think that devs are geniuses. In my experience, devs (and thus DevOps) are clueless when it comes to planning, security, and so on. But try, we must!
You need to plan out the infrastructure that you are going to deploy. This will impact your network design and your network security groups design. Document:
Are you planning on implementing a DMZ or a perimeter network? There are a number of ways to accomplish this, all of which leverage NSGs to block or allow traffic. Some designs rely entirely on NSGs to filter traffic based on protocol and port. Others add a next-generation firewall for application layer inspection/policies, and this can be extended with multiple virtual network subnets and user-defined routing.
You can associate an NSG with a virtual NIC or with a virtual network subnet.
The best practice is to create one NSG per subnet and associate that NSG with only that subnet. The rule set within the subnet should affect the entire subnet and not just specific machines. Here is why:
If you do decide to be silly and create complexity, then you need to ensure that one rule type doesn’t block another. For example, if you block TCP 80 into a subnet, but allow it into a NIC, the NIC rule will never get a chance to be applied.
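Inbound traffic is evaluated in this order: first the NSG that is associated with the subnet, and then the NSG that is associated with the NIC.
Outbound traffic is evaluated in this order: first the NSG that is associated with the NIC, and then the NSG that is associated with the subnet.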
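The effect of that ordering can be sketched in a few lines of Python. This assumes the documented Azure behavior that inbound traffic hits the subnet NSG before the NIC NSG and outbound traffic does the reverse. The function and NSG names are hypothetical, and each "NSG" is reduced to a function that takes a port and returns allow or deny.

```python
from typing import Callable, Optional

# Simplification: an NSG is modeled as port -> True (allow) / False (deny).
Nsg = Callable[[int], bool]


def evaluate_path(direction: str, port: int,
                  subnet_nsg: Optional[Nsg], nic_nsg: Optional[Nsg]) -> str:
    """Walk the NSGs in evaluation order and report where traffic stops."""
    stages = [("subnet", subnet_nsg), ("nic", nic_nsg)]   # inbound order
    if direction == "outbound":
        stages.reverse()                                  # NIC first, then subnet
    for name, nsg in stages:
        if nsg is not None and not nsg(port):
            return f"dropped at {name} NSG"
    return "allowed"


def block_80(port: int) -> bool:
    return port != 80


def allow_all(port: int) -> bool:
    return True


# The subnet blocks TCP 80, so the NIC's allow rule is never consulted.
inbound = evaluate_path("inbound", 80, subnet_nsg=block_80, nic_nsg=allow_all)
# Outbound, the NIC NSG is the first to see (and drop) the traffic.
outbound = evaluate_path("outbound", 80, subnet_nsg=block_80, nic_nsg=block_80)
print(inbound)
print(outbound)
```

Traffic must pass both NSGs to get through, which is why a block at the first stage can never be undone by an allow at the second.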
It is unlikely that you will think of every possibility when you are still in the planning stage. This next piece of advice isn’t constrained to the Microsoft world; you should increment the priority values of your rules by 100. For example, give your first three rules priorities of 100, 200, and 300.
Then when I test my service and realize that I need a rule between the first two rules (priorities 100 and 200), I can give it a priority of 150. This leaves me room to add further rules before and after the new rule. Note that custom rules can use priority values from 100 to 4096.
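Here is a small Python sketch of that numbering scheme, assuming the 100 to 4096 range that Azure Resource Manager accepts for custom rules. The helper functions and rule names are hypothetical; the point is only to show how gaps of 100 leave room for a later insert.

```python
from typing import Dict, List

MIN_PRIORITY = 100    # lowest number allowed for a custom rule
MAX_PRIORITY = 4096   # highest number allowed for a custom rule
STEP = 100            # gap left between planned rules


def initial_priorities(rule_names: List[str]) -> Dict[str, int]:
    """Assign 100, 200, 300, ... to the rules in their planned order."""
    priorities = {}
    for index, name in enumerate(rule_names):
        value = MIN_PRIORITY + index * STEP
        if value > MAX_PRIORITY:
            raise ValueError("too many rules for the available range")
        priorities[name] = value
    return priorities


def priority_between(lower: int, upper: int) -> int:
    """Pick the midpoint between two existing priorities."""
    if upper - lower < 2:
        raise ValueError("no free priority between the two rules")
    return (lower + upper) // 2


# Hypothetical rule names, in the order I planned them.
plan = initial_priorities(["allow-https", "allow-sql", "deny-rdp"])

# Testing reveals that a rule is needed between the first two.
plan["allow-monitoring"] = priority_between(plan["allow-https"], plan["allow-sql"])
print(sorted(plan.items(), key=lambda item: item[1]))
```

Taking the midpoint each time means a gap of 100 can be split several times before you run out of numbers.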
If you are creating custom rules that are generic, such as block or allow all, you might want to start near the end of the range, such as 4000, and work your way back toward the higher priorities in units of 100 (3900, 3800, and so on).
This is another not-specific-to-Microsoft piece of advice: Do not modify the default rules in your NSG. Think of these as a way to get back to the factory defaults. I find that it’s always best to create a new rule that overrides the low-priority default rules.
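The following Python sketch shows why an override works without touching the defaults. The three default outbound rules and their priorities are the ones that Azure adds to every NSG; everything else is an assumption for illustration. In particular, `first_match` is a hypothetical helper, `deny-internet` is a hypothetical custom rule, and matching a destination by tag name alone is a simplification of how Azure matches addresses.

```python
from typing import List, Tuple

# (name, priority, destination tag, action)
RuleRow = Tuple[str, int, str, str]

# Default outbound rules present in every NSG.
DEFAULT_OUTBOUND: List[RuleRow] = [
    ("AllowVnetOutBound", 65000, "VirtualNetwork", "Allow"),
    ("AllowInternetOutBound", 65001, "Internet", "Allow"),
    ("DenyAllOutBound", 65500, "*", "Deny"),
]


def first_match(rules: List[RuleRow], destination_tag: str) -> Tuple[str, str]:
    """Return (rule name, action) of the first rule that matches the tag."""
    for name, _priority, tag, action in sorted(rules, key=lambda r: r[1]):
        if tag in ("*", destination_tag):
            return name, action
    raise LookupError("no rule matched")


# Factory defaults: Internet-bound traffic is allowed.
before = first_match(DEFAULT_OUTBOUND, "Internet")

# A custom rule at 4000 wins because 4000 is evaluated before 65001.
custom: List[RuleRow] = [("deny-internet", 4000, "Internet", "Deny")]
after = first_match(custom + DEFAULT_OUTBOUND, "Internet")

print(before)
print(after)
```

Deleting the custom rule puts you straight back on the defaults, which is the "factory reset" behavior you want.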
Denying all traffic to the Internet is a sledgehammer that can secure your network from data leakage or malware, but it can break things. A downside is that Azure virtual machines can intermittently require access to Azure IP addresses (which fall under the Internet tag) for essential services. If this traffic is blocked, then bad things can happen. Microsoft’s Keith Mayer has documented a solution for this.
You’ll probably find yourself in a situation where you need to troubleshoot how your NSG is impacting (or not) your traffic. If so, you should enable diagnostics. To do this, open the settings of the NSG, click Diagnostics, set the status to On, and select a storage account to save the logs to. You can then inspect the logs later if there is an error or if you manage to recreate one.
Do not apply NSGs to your gateway subnet. There is simply no need to do this, doing it will break things, and it is unsupported.