Following up on my last post (here) about what Network Access Control Lists (NACLs) and Security Groups (SGs) are, I will now look at where and how I think you should use them to help keep your network secure.
To discuss the options, I'll use a basic scenario: a VPC (10.0.0.0/16) split into two public subnets with access to the internet (10.0.0.0/24 and 10.0.1.0/24) and two private subnets with no route to the internet (10.0.10.0/24 and 10.0.11.0/24). The application runs on two EC2 instances behind an Application Load Balancer.
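Before writing any rules it is worth confirming that the address plan itself is sound. The following is a minimal sketch using Python's standard `ipaddress` module; the subnet names are hypothetical, and the second private subnet is taken as 10.0.11.0/24 (an assumption). A check like this also catches a typo such as reusing 10.0.1.0/24 for a private subnet, which would overlap a public one.

```python
import ipaddress
from itertools import combinations

# Address plan for the example scenario. Subnet names are hypothetical;
# the second private subnet is assumed to be 10.0.11.0/24.
VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public-a": ipaddress.ip_network("10.0.0.0/24"),
    "public-b": ipaddress.ip_network("10.0.1.0/24"),
    "private-a": ipaddress.ip_network("10.0.10.0/24"),
    "private-b": ipaddress.ip_network("10.0.11.0/24"),
}

def validate_layout(vpc, subnets):
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    problems = []
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    # Compare every pair of subnets once.
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems

print(validate_layout(VPC, SUBNETS))  # → []
```

An empty list means every subnet sits inside the VPC and no two subnets share addresses.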
For a more detailed implementation with multiple tiers and controls, take a look at my post on Creating a Well-Architected VPC.
So let's start by looking at the security you would get if you just used the defaults provided when you create a VPC.
By default, you get a single security group and a single NACL.
The NACL has a rule allowing any port from anywhere, often referred to as an ANY-ANY rule, for both inbound and outbound traffic. This means that all traffic is allowed, which is the same as not having an NACL at all.
All subnets not explicitly associated with another NACL have the default NACL applied to them. One of the challenges with the default NACL is removing the initial rule when creating custom rules. While this is easy in the console, it can be tricky in code and often leads to the rule not being removed. This is one reason I recommend always creating a custom NACL with the rules you need and associating it with the required subnets. This way the default Allow ALL rule is never created: a new custom NACL denies all traffic until you add rules to it.
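A toy model makes the difference between the default and a custom NACL concrete. This is a simplified sketch, not AWS's implementation: it assumes rules are checked in ascending rule-number order with the first match winning and a final implicit deny (the `*` rule), and it ignores protocol and source CIDR. The names `NaclRule` and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NaclRule:
    number: int               # lower rule numbers are evaluated first
    action: str               # "allow" or "deny"
    ports: Optional[range]    # None means all ports (the "ANY" in ANY-ANY)

def evaluate(rules, port):
    """Return the action of the first matching rule, or "deny" if none match (the * rule)."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.ports is None or port in rule.ports:
            return rule.action
    return "deny"

# Default NACL: rule 100 allows everything, so nothing ever reaches the * deny.
DEFAULT_NACL_INBOUND = [NaclRule(100, "allow", None)]
# New custom NACL: no rules yet, only the implicit * deny.
CUSTOM_NACL_INBOUND = []
# Custom NACL with one rule added. AWS port ranges are inclusive, so 443..443
# is written as range(443, 444) in Python.
CUSTOM_HTTPS_INBOUND = [NaclRule(100, "allow", range(443, 444))]

print(evaluate(DEFAULT_NACL_INBOUND, 22))    # → allow
print(evaluate(CUSTOM_NACL_INBOUND, 443))    # → deny
print(evaluate(CUSTOM_HTTPS_INBOUND, 443))   # → allow
print(evaluate(CUSTOM_HTTPS_INBOUND, 22))    # → deny
```

Because the custom NACL never contains the allow-all rule, there is nothing to forget to remove: only what you explicitly add is permitted.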
Default Security Group
The Security Group is slightly more restrictive. It has an inbound rule that only allows traffic from other components with the same security group (note that the source is the security group ID), but the outbound rule allows all traffic to leave the instance.
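Security Groups are usually chosen when creating a resource, so using the default security group is normally a conscious decision, although some API calls (for example, launching an EC2 instance without specifying a group) fall back to the default security group. However, if it is selected, at least only other AWS resources with the default security group can talk to the resource.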
As with the default NACL, removing the default rules is a little complicated in code, so again, create a custom security group with the rules you need, even if it is a replica of the default security group.
Effect on the Application
So if we just used the default NACL and Security Group for our little application, what would happen?
From a NACL perspective, nothing would be blocked, so all traffic could enter and exit the subnets and VPC as long as routing was in place. The resources would receive any packets sent to them, and the security group would decide what to do with them.
From a Security Group perspective, the load balancer would be able to talk to the application and receive a response, as they are both in the same security group. The application instances would also be able to talk to each other for the same reason. However, external users would not be able to reach the load balancer, as its security group blocks traffic that does not come from resources in the same security group.
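The outcome above can be reproduced with a small simulation. This sketch assumes the load balancer and both instances use only the default security group, and it models only the default group's self-referencing inbound rule; the outbound rule allows everything, and security groups are stateful, so replies are not modeled. `Resource`, `inbound_allowed` and the `sg-default` ID are hypothetical placeholders.

```python
from dataclasses import dataclass, field

# Placeholder ID standing in for the VPC's default security group.
DEFAULT_SG = "sg-default"

@dataclass
class Resource:
    name: str
    security_groups: set = field(default_factory=set)

def inbound_allowed(source, target):
    """Default-SG inbound rule: allow only if the source shares the target's group."""
    return bool(source.security_groups & target.security_groups)

user = Resource("internet-user")        # traffic from the internet carries no security group
alb = Resource("alb", {DEFAULT_SG})
app1 = Resource("app-1", {DEFAULT_SG})
app2 = Resource("app-2", {DEFAULT_SG})

for src, dst in [(alb, app1), (app1, app2), (user, alb)]:
    verdict = "allowed" if inbound_allowed(src, dst) else "blocked"
    print(f"{src.name} -> {dst.name}: {verdict}")
# alb -> app-1: allowed
# app-1 -> app-2: allowed
# internet-user -> alb: blocked
```

The last line is the problem: with only the default security group, the application works internally but nobody outside can reach it.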
So what could we do with the defaults to make our application work while also being more secure?
So what might we want to do at the network layer?
Let's change only the inbound ANY-ANY rule so that it allows just HTTPS (port 443) from anywhere.
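This means that any traffic entering a subnet has to be on port 443, but it can come from any source IP address. Leaving the outbound rule as the default ANY-ANY means reply traffic can always leave a subnet. Because NACLs are stateless, though, replies coming back into a subnet are checked against the inbound rules too: the application's responses returning to the load balancer in the public subnet arrive on an ephemeral port (1024-65535), so with only 443 allowed inbound they would be dropped. In practice the inbound rules also need to allow the ephemeral port range.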
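Tracing each hop of a request through this single NACL shows where stateless filtering bites. This is a simplified sketch assuming the load balancer nodes sit in the public subnets, the instances in the private subnets, and the same NACL (inbound TCP 443 only, outbound allow-all) applied to every subnet. The ephemeral port 50000 and the function name `nacl_allows` are hypothetical.

```python
# Single NACL after the change: inbound allows only TCP 443, outbound allows all.
# NACLs are stateless, so every packet, including replies, is checked on its own.
INBOUND_ALLOWED_PORTS = range(443, 444)   # AWS's inclusive 443..443 as a Python range
EPHEMERAL_PORT = 50000                     # hypothetical port chosen by the connecting side

def nacl_allows(direction, dst_port):
    """Check one packet crossing a subnet boundary against the modified NACL."""
    if direction == "outbound":
        return True                            # allow-all outbound rule
    return dst_port in INBOUND_ALLOWED_PORTS   # anything else hits the implicit deny

# (description, direction relative to the subnet being crossed, destination port)
hops = [
    ("user request enters public subnet", "inbound", 443),
    ("ALB reply leaves public subnet", "outbound", EPHEMERAL_PORT),
    ("ALB request leaves public subnet", "outbound", 443),
    ("ALB request enters private subnet", "inbound", 443),
    ("app reply leaves private subnet", "outbound", EPHEMERAL_PORT),
    ("app reply enters public subnet", "inbound", EPHEMERAL_PORT),
]

for description, direction, port in hops:
    print(f"{description:38} {'allow' if nacl_allows(direction, port) else 'deny'}")
```

Only the final hop is denied: the application's reply is addressed to the load balancer's ephemeral port, which the inbound rule does not cover. This is why the custom NACLs described later in the post include 1024-65535 alongside 443.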
So what can we do with a single security group?
First, we need to allow HTTPS from anywhere so that users can reach the load balancer. Then we remove the default self-referencing rule so that only port 443 is allowed into the resources.
For now, we can leave the default outbound rule in place. This means the instances can reach remote resources over HTTPS, for things such as application updates, provided routing and the NACL allow it.
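If you make these two changes in code, the requests might look like the following. This is a sketch that only builds the parameter dictionaries for the boto3 EC2 client's `authorize_security_group_ingress` and `revoke_security_group_ingress` calls; nothing is sent to AWS, and `sg-0123456789abcdef0` is a placeholder for your default security group ID.

```python
# Placeholder for the default security group's ID.
SG_ID = "sg-0123456789abcdef0"

# 1. Allow HTTPS (TCP 443) from anywhere so users can reach the load balancer.
allow_https = {
    "GroupId": SG_ID,
    "IpPermissions": [{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
    }],
}

# 2. Remove the default inbound rule: all protocols ("-1") from the group itself.
remove_self_reference = {
    "GroupId": SG_ID,
    "IpPermissions": [{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": SG_ID}],
    }],
}

# With credentials configured, these would be applied with:
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(**allow_https)
#   ec2.revoke_security_group_ingress(**remove_self_reference)
```

The revoke has to describe the existing rule exactly, which is part of why editing default rules in code is fiddlier than creating your own group.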
So while we can fix the default NACL and Security Group, there is a better way.
What we should really be doing is creating custom NACLs and Security Groups rather than using the defaults. There are several reasons for this.
Firstly, if you use the defaults there is no segregation or differentiation between resources. All VPC subnets have the same NACL and all resources the same Security Group. By creating new NACLs and Security Groups, we can apply different rules to different areas of the architecture.
For example, we can have a Public Subnet NACL that allows only the ports required from the internet (443 in, 1024-65535 out) and only the ports required to the private subnets (443 out, 1024-65535 in). We can then have a Private Subnet NACL that allows only the required ports from the public subnets (443 in, 1024-65535 out).
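Those rule sets could be expressed as parameters for the boto3 EC2 client's `create_network_acl_entry` call. This sketch only builds the dictionaries; the helper `entry`, the `acl-...-placeholder` IDs and the second private subnet CIDR (10.0.11.0/24) are assumptions. Protocol `"6"` is TCP, and rule numbers only need to be unique per direction within each NACL.

```python
def entry(acl_id, number, egress, cidr, from_port, to_port):
    """Hypothetical helper: parameters for one allow rule (TCP, inclusive port range)."""
    return {
        "NetworkAclId": acl_id,
        "RuleNumber": number,
        "Protocol": "6",          # TCP
        "RuleAction": "allow",
        "Egress": egress,         # True = outbound, False = inbound
        "CidrBlock": cidr,
        "PortRange": {"From": from_port, "To": to_port},
    }

PUBLIC_ACL = "acl-public-placeholder"
PRIVATE_ACL = "acl-private-placeholder"
INTERNET = "0.0.0.0/0"
PUBLIC_SUBNETS = ["10.0.0.0/24", "10.0.1.0/24"]
PRIVATE_SUBNETS = ["10.0.10.0/24", "10.0.11.0/24"]   # second CIDR assumed

# Public NACL: HTTPS in from the internet, ephemeral replies back out.
public_rules = [
    entry(PUBLIC_ACL, 100, False, INTERNET, 443, 443),
    entry(PUBLIC_ACL, 100, True, INTERNET, 1024, 65535),
]
# Public NACL: HTTPS out to each private subnet, ephemeral replies back in.
for i, cidr in enumerate(PRIVATE_SUBNETS):
    public_rules.append(entry(PUBLIC_ACL, 110 + 10 * i, True, cidr, 443, 443))
    public_rules.append(entry(PUBLIC_ACL, 110 + 10 * i, False, cidr, 1024, 65535))

# Private NACL: HTTPS in from each public subnet, ephemeral replies back out.
private_rules = []
for i, cidr in enumerate(PUBLIC_SUBNETS):
    private_rules.append(entry(PRIVATE_ACL, 100 + 10 * i, False, cidr, 443, 443))
    private_rules.append(entry(PRIVATE_ACL, 100 + 10 * i, True, cidr, 1024, 65535))

# Applying them would look like:
#   import boto3
#   ec2 = boto3.client("ec2")
#   for params in public_rules + private_rules:
#       ec2.create_network_acl_entry(**params)
```

Anything not listed falls through to each NACL's implicit deny, so the private subnets never accept traffic directly from the internet.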
We can do the same for Security Groups: create an ALB Security Group with inbound from the internet on port 443 and outbound to a new Application Security Group. In the Application Security Group, applied to the instances, we then only allow inbound on port 443 from the ALB Security Group.
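Chaining the two groups relies on referencing one security group from another instead of an IP range. A minimal sketch of the three rules as parameters for the boto3 EC2 client's `authorize_security_group_ingress` and `authorize_security_group_egress` calls; the group IDs and the helper `https_rule` are hypothetical placeholders, and nothing is sent to AWS.

```python
# Placeholder IDs; in IaC these would come from the groups you create.
ALB_SG = "sg-alb-placeholder"
APP_SG = "sg-app-placeholder"

def https_rule(**source):
    """One TCP 443 permission; source is IpRanges=[...] or UserIdGroupPairs=[...]."""
    return {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, **source}

# ALB group: HTTPS in from the internet.
alb_ingress = {"GroupId": ALB_SG,
               "IpPermissions": [https_rule(IpRanges=[{"CidrIp": "0.0.0.0/0"}])]}
# ALB group: HTTPS out only to members of the Application group.
alb_egress = {"GroupId": ALB_SG,
              "IpPermissions": [https_rule(UserIdGroupPairs=[{"GroupId": APP_SG}])]}
# Application group: HTTPS in only from members of the ALB group.
app_ingress = {"GroupId": APP_SG,
               "IpPermissions": [https_rule(UserIdGroupPairs=[{"GroupId": ALB_SG}])]}

# With boto3 these map to:
#   ec2.authorize_security_group_ingress(**alb_ingress)
#   ec2.authorize_security_group_egress(**alb_egress)
#   ec2.authorize_security_group_ingress(**app_ingress)
```

Note that a newly created VPC security group comes with an allow-all outbound rule, so to really limit the ALB group's outbound traffic to the Application Security Group you would also remove that rule (for example with `revoke_security_group_egress`).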
Secondly, creating new groups makes infrastructure as code (IaC) easier, as we don't need complicated lookups of the default components. Because we create the resources ourselves, we can easily refer to them in code. This means that as our application changes over time, we can update the rules as needed.
So hopefully this gives you some idea of where and why to apply security features to your VPC. While many find this problematic, doing it improves both your security and your performance.
Why performance as well? If you reduce the traffic hitting your resources by using NACLs, and block ports on the resources using Security Groups, your systems do not have to work as hard processing network traffic. Ideally, you block as close to the source as possible using NACLs, as this moves rule processing from the resource to the network devices.
As always, I'd love your thoughts on this post.
In the next post I'll take a look at how to troubleshoot when traffic is blocked by NACLs and Security Groups.