How to Add DNS Filtering to Your NAT Instance with Squid

Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources on a virtual private network that you've defined. On an Amazon VPC, network address translation (NAT) instances, and more recently NAT gateways, are commonly used to enable instances in a private subnet to initiate outbound traffic to the Internet, but prevent the instances from receiving inbound traffic initiated by someone on the Internet.

For security and compliance purposes, you might have to filter the requests initiated by these instances. Using iptables rules, you could restrict outbound traffic with your NAT instance based on a predefined destination port or IP address. However, you may need to enforce more complex policies, such as allowing requests to AWS endpoints only, which cannot be achieved easily by using iptables rules.

In this post, I discuss and give an example of how Squid, a leading open-source proxy, can restrict both HTTP and HTTPS outbound traffic to a given set of Internet domains, while being fully transparent for instances in the private subnet. First, I explain briefly how to create the infrastructure resources required for this approach. Then, I provide step-by-step instructions to install, configure, and test Squid as a transparent proxy.

Keep in mind that a possible alternative is to deploy an explicitly configured proxy (also known as a forward proxy) in your Amazon VPC. However, a major drawback is that the proxy must be explicitly configured on every instance in the private subnet. This could cause connectivity issues that would be difficult to troubleshoot, if not impossible to remediate, when an application does not support proxy usage.

Deploying the example

The following steps are for manually creating and configuring the required resources. Alternatively, you could use AWS CloudFormation to automate this procedure. Click Create a Stack to open the CloudFormation console and create an AWS CloudFormation stack from the template I developed. Follow the on-screen instructions and go directly to the “Testing the deployment” section later in this blog post when the stack creation has completed (it can take up to 20 minutes).

Note: If you need to allow requests to S3 buckets in the same region as your VPC, you could use a VPC endpoint for Amazon S3 instead (see VPC Endpoints). It enables you to create a private connection between your VPC and another AWS service without requiring access over the Internet. It also has a policy that controls the use of the endpoint to access Amazon S3 resources.

Set up a VPC and create 2 Amazon EC2 instances

The following steps take you through the manual creation and configuration of a VPC and two EC2 instances: one in a public subnet for deploying Squid, and another in a private subnet for testing the configuration. See the NAT Instances documentation for further details, because the prerequisites are similar.

To create and configure a VPC and 2 EC2 instances:

  1. Create a VPC (see Creating a VPC, if you need help with this step).
  2. Create two subnets (see Creating a Subnet): one called “Public Subnet” and another called “Private Subnet.”
  3. Create and attach an Internet Gateway to the VPC (see Attaching an Internet Gateway).
  4. Add a rule to the default route table that sends traffic destined outside the VPC (0.0.0.0/0) to the Internet Gateway (see Adding and Removing Routes from a Route Table).
  5. Add a rule to the VPC default security group that allows ingress SSH traffic (TCP 22) from 0.0.0.0/0 or, preferably, from your own IP address range only (see Adding Rules to a Security Group).
  6. Launch an Amazon Linux t2.micro instance called “Squid Instance” in the Public Subnet (make sure to use the Amazon Linux AMI, not the NAT instance AMI). Enable Auto-assign Public IP, choose the VPC default security group as the security group to attach, and select a valid key pair. Leave all other parameters as default (see Launching an Instance).
  7. After the instance starts running, disable the source/destination check (see Disable Source/Destination Checks).
  8. Create a new route table in the VPC. Add a rule that sends traffic destined outside the VPC (0.0.0.0/0) to the Squid instance and associate this new route table with the Private Subnet.
  9. Create an instance role called “Testing Instance Role” and attach the managed policy AmazonEC2ReadOnlyAccess (see detailed instructions).
  10. Finally, launch another Amazon Linux t2.micro instance called “Testing Instance” in the Private Subnet (make sure to use the Amazon Linux AMI, not the NAT instance AMI). Select Testing Instance Role as the instance role, attach the VPC default security group, and select the same key pair. Leave all other parameters as default.
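
If you prefer the AWS CLI to the console, steps 7 and 8 could be scripted roughly as follows. This is a minimal sketch, not the procedure used in this post: the three resource IDs are hypothetical placeholders, the `run` helper and the `DRY_RUN` variable are my own additions, and the sketch assumes that the new route table already exists. By default the commands are only printed, so you can review them before running anything.

```shell
# Hypothetical placeholder IDs; replace them with the IDs from your own account.
SQUID_INSTANCE_ID="${SQUID_INSTANCE_ID:-i-0123456789abcdef0}"
PRIVATE_ROUTE_TABLE_ID="${PRIVATE_ROUTE_TABLE_ID:-rtb-0123456789abcdef0}"
PRIVATE_SUBNET_ID="${PRIVATE_SUBNET_ID:-subnet-0123456789abcdef0}"

# Print each command by default; set DRY_RUN=0 to execute it for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

configure_private_routing() {
  # Step 7: disable the source/destination check on the Squid instance.
  run aws ec2 modify-instance-attribute \
    --instance-id "$SQUID_INSTANCE_ID" --no-source-dest-check
  # Step 8: send Internet-bound traffic from the private route table to Squid...
  run aws ec2 create-route --route-table-id "$PRIVATE_ROUTE_TABLE_ID" \
    --destination-cidr-block 0.0.0.0/0 --instance-id "$SQUID_INSTANCE_ID"
  # ...and associate that route table with the Private Subnet.
  run aws ec2 associate-route-table --route-table-id "$PRIVATE_ROUTE_TABLE_ID" \
    --subnet-id "$PRIVATE_SUBNET_ID"
}

configure_private_routing
```

Once the printed commands look right, rerun with DRY_RUN=0 and your real IDs exported in the environment.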

The following diagram illustrates how the components in this process interact with each other. Squid Instance intercepts HTTP/S requests sent by Testing Instance. Squid Instance then initiates a connection with the destination host on behalf of Testing Instance, which goes through the Internet gateway.

Installing Squid

Squid intercepts the requested domain before applying the filtering policy:

  • For HTTP, Squid retrieves the Host header field included in all HTTP/1.1 request messages, which specifies the Internet host being requested.
  • For HTTPS, the HTTP traffic is encapsulated in a TLS connection between the instance in the private subnet and the remote host. Squid cannot retrieve the Host header field because the header is encrypted. A feature called SslBump would allow Squid to decrypt the traffic, but this would not be transparent for the client because the certificate would be considered invalid in most cases. The feature we use instead, called SslPeekAndSplice, retrieves the Server Name Indication (SNI) from the start of the TLS handshake (the ClientHello message), which contains the requested Internet host. As a result, Squid can make filtering decisions without decrypting the HTTPS traffic.

Note: Some older client-side software stacks do not support SNI. These are the minimum versions of some important stacks and programming languages that support SNI: Python 2.7.9 and 3.2, Java 7 JSSE, wget 1.14, OpenSSL 0.9.8j, and cURL 7.18.1.
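
To check a client against these minimums, you could compare version strings on the instance. The following is a minimal sketch for cURL only. The `version_ge` helper is my own and assumes that `sort -V` (version sort, provided by GNU coreutils) is available.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), which GNU coreutils provides.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Installed cURL version, taken from the first line of "curl --version".
curl_version=$(curl --version 2>/dev/null | awk 'NR == 1 { print $2 }')

if [ -n "$curl_version" ] && version_ge "$curl_version" 7.18.1; then
  echo "cURL $curl_version meets the SNI minimum (7.18.1)"
else
  echo "cURL is missing or older than 7.18.1"
fi
```

The same helper can be reused for the other tools in the list by swapping in their version strings and minimums.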

The SslPeekAndSplice feature was introduced in Squid 3.5. However, when this post was written, the Amazon Linux repository included Squid 3.1.10 (you can check whether Squid 3.5 is available by running the command yum info squid). For the purpose of this post, I will compile and install Squid 3.5 from the official source code.

Note: This Squid installation includes the minimum features required for this example and is not intended for production purposes. You may prefer to adapt it to your own needs and install Squid from your own Red Hat Package Manager (RPM) package or from unofficial RPM packages for CentOS 6.

Squid installation instructions

To manually compile, install, and configure Squid on the Squid instance:

  1. Connect to your Squid instance using SSH with the user ec2-user.
  2. Install the prerequisite packages.
sudo yum update -y
sudo yum install -y perl gcc autoconf automake make sudo wget gcc-c++ libxml2-devel libcap-devel libtool libtool-ltdl-devel openssl openssl-devel
  3. Go to this Squid page and retrieve the link to the tar.gz source code archive for the latest release (3.5.13 was the last release when this post was written). Use this link to download and extract the archive on the Squid instance.
SQUID_ARCHIVE=http://www.squid-cache.org/Versions/v3/3.5/squid-3.5.13.tar.gz
cd /tmp
wget $SQUID_ARCHIVE
tar xvf squid*.tar.gz
cd $(basename squid*.tar.gz .tar.gz)
  4. Compile and install Squid with the minimum required options. This may take up to 15 minutes.
sudo ./configure --prefix=/usr --exec-prefix=/usr --libexecdir=/usr/lib64/squid --sysconfdir=/etc/squid --sharedstatedir=/var/lib --localstatedir=/var --libdir=/usr/lib64 --datadir=/usr/share/squid --with-logdir=/var/log/squid --with-pidfile=/var/run/squid.pid --with-default-user=squid --disable-dependency-tracking --enable-linux-netfilter --with-openssl --without-nettle

sudo make
sudo make install
  5. Complete the Squid installation.
sudo adduser -M squid
sudo chown -R squid:squid /var/log/squid /var/cache/squid
sudo chmod 750 /var/log/squid /var/cache/squid
sudo touch /etc/squid/squid.conf
sudo chown -R root:squid /etc/squid/squid.conf
sudo chmod 640 /etc/squid/squid.conf
sudo tee /etc/init.d/squid <<'EOF'
#!/bin/sh
# chkconfig: - 90 25
echo -n 'Squid service'
case "$1" in
start)
/usr/sbin/squid
;;
stop)
/usr/sbin/squid -k shutdown
;;
reload)
/usr/sbin/squid -k reconfigure
;;
*)
echo "Usage: `basename $0` {start|stop|reload}"
;;
esac
EOF
sudo chmod +x /etc/init.d/squid
sudo chkconfig squid on

Note: If you have installed Squid from an RPM package, you are not required to follow the previous instructions for installing Squid before proceeding to the next steps, because your Squid instance already has the required configuration.
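
Whichever way you installed Squid, it can be worth confirming that the binary was built with the two options this example depends on. The command squid -v prints the configure options used for the build. The sketch below filters that output. The helper name is my own, and it assumes the options appear under the same names as in the configure command shown earlier.

```shell
# Reads "squid -v" output on stdin and succeeds only if both build options
# used in this post appear in it.
has_intercept_tls_support() (
  output=$(cat)
  for flag in --with-openssl --enable-linux-netfilter; do
    printf '%s\n' "$output" | grep -qF -e "$flag" || exit 1
  done
)

# Only runs the check when a squid binary is on the PATH.
if command -v squid >/dev/null 2>&1; then
  if squid -v | has_intercept_tls_support; then
    echo "Squid build includes OpenSSL and Linux netfilter support"
  else
    echo "Squid build is missing --with-openssl or --enable-linux-netfilter"
  fi
fi
```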

Configuring and starting Squid

The SslPeekAndSplice feature is implemented in the same Squid module as SslBump. To enable this module, Squid requires that we provide a certificate, though it will not be used to decode HTTPS traffic. I create a certificate using OpenSSL.

sudo mkdir /etc/squid/ssl
cd /etc/squid/ssl
sudo openssl genrsa -out squid.key 2048
sudo openssl req -new -key squid.key -out squid.csr -subj "/C=XX/ST=XX/L=squid/O=squid/CN=squid"
sudo openssl x509 -req -days 3650 -in squid.csr -signkey squid.key -out squid.crt
sudo cat squid.key squid.crt | sudo tee squid.pem > /dev/null
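
Before pointing Squid at the combined file, you may want to confirm that it contains both parts. The following is a minimal sketch: the `pem_is_complete` helper and the `PEM_FILE` variable are my own names, and the check only looks for the PEM block markers rather than validating the key or the certificate.

```shell
# Succeeds if the file holds both a private key block and a certificate block.
pem_is_complete() {
  grep -q -e 'PRIVATE KEY-----' "$1" &&
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Path used in this post; override PEM_FILE to inspect another bundle.
PEM_FILE="${PEM_FILE:-/etc/squid/ssl/squid.pem}"

if [ -r "$PEM_FILE" ]; then
  if pem_is_complete "$PEM_FILE"; then
    # Show the subject and expiry date of the certificate inside the bundle.
    openssl x509 -in "$PEM_FILE" -noout -subject -enddate
  else
    echo "$PEM_FILE is missing the key or the certificate"
  fi
fi
```

If the file is not readable by your user, nothing is printed. In that case, run the check from a root shell.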

Next, configure Squid to allow requests to *.amazonaws.com, which corresponds to AWS endpoints. Note that you can restrict access to a defined set of AWS services only. See AWS Regions and Endpoints for a detailed list of endpoints.

For HTTPS traffic, note the ssl_bump directives instructing Squid to “peek” (retrieve the SNI) and then “splice” (become a TCP tunnel without decoding) or “terminate” the connection depending on the requested host.

sudo tee /etc/squid/squid.conf <<'EOF'
visible_hostname squid

#Handling HTTP requests
http_port 3129 intercept
acl allowed_http_sites dstdomain .amazonaws.com
#acl allowed_http_sites dstdomain [you can add other domains to permit]
http_access allow allowed_http_sites

#Handling HTTPS requests
https_port 3130 cert=/etc/squid/ssl/squid.pem ssl-bump intercept
acl SSL_port port 443
http_access allow SSL_port
acl allowed_https_sites ssl::server_name .amazonaws.com
#acl allowed_https_sites ssl::server_name [you can add other domains to permit]
acl step1 at_step SslBump1
acl step2 at_step SslBump2
acl step3 at_step SslBump3
ssl_bump peek step1 all
ssl_bump peek step2 allowed_https_sites
ssl_bump splice step3 allowed_https_sites
ssl_bump terminate step2 all

http_access deny all
EOF
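
The leading dot in .amazonaws.com matters: in a dstdomain ACL it matches the domain itself and every subdomain, whereas a name without the dot matches only that exact host. The following sketch mimics that rule in plain shell so you can try candidate host names before editing the configuration. It is an illustration only, not Squid's implementation, and it ignores details such as letter case.

```shell
# Succeeds when host $1 matches dstdomain-style pattern $2.
# A leading dot matches the domain itself and any subdomain;
# without a leading dot only an exact match counts.
matches_domain() (
  host=$1
  pattern=$2
  case "$pattern" in
    .*)
      case "$host" in
        "${pattern#.}" | *"$pattern") exit 0 ;;
      esac
      ;;
    *)
      [ "$host" = "$pattern" ] && exit 0
      ;;
  esac
  exit 1
)

for host in calculator.s3.amazonaws.com amazonaws.com www.amazon.com; do
  if matches_domain "$host" .amazonaws.com; then
    echo "$host: allowed"
  else
    echo "$host: denied"
  fi
done
```

To check the syntax of the real configuration file, you can run sudo squid -k parse before starting Squid.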

You may have noticed that Squid listens on port 3129 for HTTP traffic and 3130 for HTTPS. Because Squid cannot directly listen on ports 80 and 443, we have to redirect the incoming requests from instances in the private subnet to the Squid ports using iptables. You do not have to enable IP forwarding or add any FORWARD rule, as you would do with a standard NAT instance.

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3129
sudo iptables -t nat -A PREROUTING -p tcp --dport 443 -j REDIRECT --to-port 3130
sudo service iptables save
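
To confirm that both redirections are in place, you could list the PREROUTING rules of the nat table and look for the two port pairs. The following is a minimal sketch. The helper name is my own, and the pattern assumes the rule format that iptables -S normally prints, for example "--dport 80 -j REDIRECT --to-ports 3129".

```shell
# Reads "iptables -t nat -S PREROUTING" output on stdin and succeeds only if
# both REDIRECT rules from this post are present.
has_squid_redirects() (
  rules=$(cat)
  for pair in 80:3129 443:3130; do
    dport=${pair%%:*}
    squid_port=${pair##*:}
    printf '%s\n' "$rules" |
      grep -Eq -e "--dport $dport .*-j REDIRECT .*$squid_port" || exit 1
  done
)

# Usage on the Squid instance (requires root):
#   sudo iptables -t nat -S PREROUTING | has_squid_redirects && echo "redirects in place"
```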

You can now start Squid.

sudo service squid start

Testing the deployment

The Testing Instance that was launched earlier can be used to test the configuration. Because this instance is not accessible from the Internet, you have to jump onto the Squid instance to log on to Testing Instance.

Because both instances were launched with the same key pair, you can connect to Squid Instance using the -A argument, which forwards your SSH agent (and the key loaded into it) so that you can then connect to Testing Instance.

ssh-add [key]
ssh -A -i [key] ec2-user@[public IP of the Squid instance]
ssh ec2-user@[private IP of the Testing Instance]

You can test the transparent proxy instance with the following commands. Only the last three requests should return a valid response, because Squid allows traffic to *.amazonaws.com only.

curl http://www.amazon.com
curl https://www.amazon.com
curl http://calculator.s3.amazonaws.com/index.html
curl https://calculator.s3.amazonaws.com/index.html
aws ec2 describe-regions --region us-east-1
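
When you repeat these checks often, it can be easier to print only the HTTP status of each request. The following is a minimal sketch with two helper functions of my own, built on standard curl options. With the configuration from this post, a denied HTTP request should typically show 403 (Squid's error page), and a status of 000 means that curl received no HTTP response at all, which is what I would expect when Squid terminates a TLS connection.

```shell
# Prints "<status> <url>". A status of 000 means curl received no HTTP
# response, for example because the connection was closed.
probe() (
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1") || true
  echo "${code:-000} $1"
)

# Runs the four curl checks from this post in one pass.
probe_all() {
  for url in \
    http://www.amazon.com \
    https://www.amazon.com \
    http://calculator.s3.amazonaws.com/index.html \
    https://calculator.s3.amazonaws.com/index.html
  do
    probe "$url"
  done
}
```

Run probe_all on Testing Instance after defining both functions in your shell.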

You can now clean up the resources you just created.

Summary

In this blog post, I have shown how you can use Squid to filter outgoing traffic to the Internet and help meet your security and compliance needs, while being fully transparent for the back-end instances in your VPC.

I invite you to adapt this example to your own requirements. For example, you may implement a high-availability solution, similar to the solution described in High Availability for Amazon VPC NAT Instances: An Example; centralize Squid metrics and access logs, similar to the solution described in Using Squid Proxy Instances for Web Service Access in Amazon VPC: Another Example with AWS CodeDeploy and Amazon CloudWatch; or leverage other Squid features (logging, caching, etc.) for further visibility and control over outbound traffic.
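
As a small first step toward that visibility, you could summarize which requests Squid denied. The sketch below assumes Squid's native access log format, in which the fourth field is the result code (such as TCP_DENIED/403) and the seventh is the requested URL, and the log location that follows from the configure options used in this post. The `summarize_denied` helper and the `LOG_FILE` variable are my own names.

```shell
# Counts denied requests per URL in a Squid access log (native log format),
# most frequent first. Field 4 is the result code and field 7 is the URL.
summarize_denied() {
  awk '$4 ~ /^TCP_DENIED/ { count[$7]++ }
       END { for (url in count) print count[url], url }' "$@" | sort -rn
}

# Log path resulting from --with-logdir=/var/log/squid; override if needed.
LOG_FILE="${LOG_FILE:-/var/log/squid/access.log}"
if [ -r "$LOG_FILE" ]; then
  summarize_denied "$LOG_FILE"
fi
```

The log directory created earlier is readable only by the squid user and group, so you will probably need to run this from a root shell on the Squid instance.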

If you have any questions or suggestions, please leave a comment below or on the IAM forum.

- Nicolas
