AWS Security Blog

High-Availability IAM Design Patterns

Today Will Kruse, Senior Security Engineer on the AWS Identity and Access Management (IAM) team, provides a tutorial on how to enable resiliency against authentication and authorization failures in an application deployed on Amazon EC2 using a high availability design pattern based on IAM roles.


Background

Many of you invest significant effort to ensure that a single change does not impact your entire application. You plan for single machine outages, data center outages, network hiccups, and more by deploying high-availability architectures to address these risks. Your use of IAM should be no exception: administrators can make mistakes, such as incorrectly modifying an IAM policy or inadvertently disabling an access key. If you design AWS applications, you should apply the same high availability concepts to your use of IAM so that the impact of these events can be minimized or avoided completely.

In general, you can partition the authentication elements (e.g., an IAM user or role) and authorization elements (e.g., IAM policies) for a single application into multiple sets. That way, if a change is made to the authentication or authorization configuration for a single set, any negative impact is confined to that set rather than affecting the entire application.

For example, you might take the application instances running behind a load balancer and designate a small set that you can use for production validation as you can see in Figure 1 below. That way, in order to make sure a change you’ve tested in pre-production doesn’t affect the whole cluster, you can deploy a new IAM policy to just this small set. After you’ve validated that the change hasn’t introduced any regressions, you can deploy the change to the remaining instances with confidence.

Figure 1: Application instances running behind a load balancer, with a small set of instances that you can use for production validation
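To make the idea of confining a change to one set concrete, here is a minimal sketch in Python. It is only a model of the pattern, not an AWS API: the PartitionSet class, the instance names, and the affected_instances helper are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionSet:
    """A group of instances that shares one copy of the IAM configuration."""
    name: str
    instances: list
    allowed_actions: set = field(default_factory=set)


def affected_instances(sets, required_action):
    """Return the instances whose set no longer allows required_action."""
    return [
        instance
        for s in sets
        if required_action not in s.allowed_actions
        for instance in s.instances
    ]


baseline = {"dynamodb:GetItem", "dynamodb:PutItem"}

# Each set gets its own copy of the permissions, so edits do not leak across sets.
validation = PartitionSet("validation", ["i-val-1"], set(baseline))
main = PartitionSet("main", ["i-main-1", "i-main-2", "i-main-3"], set(baseline))

# Simulate a faulty policy change that is deployed only to the validation set.
validation.allowed_actions.discard("dynamodb:GetItem")

broken = affected_instances([validation, main], "dynamodb:GetItem")
print(broken)
```

Only the instance in the validation set loses access in this model; the main set keeps serving traffic because it holds a separate copy of the permissions.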

Next, let’s walk through a tutorial that demonstrates how to apply this to your applications.

Isolating Failures to an Availability Zone

In the tutorial below, you’ll use an application deployed on EC2 that is also using an IAM role (which we’ve discussed in a previous blog post). Since you’re deploying on EC2, you can take advantage of the logical grouping provided by EC2 Availability Zones and create isolated IAM authentication and authorization information for each availability zone in which your application runs.

When you are evaluating your high availability needs, don’t lose sight of the “keep it simple” principle. For example, if you have a highly available application and you’re willing to tolerate a loss of 50% of your capacity, you’ll only need two groups, no more.
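The sizing rule above can be written as simple arithmetic. The sketch below assumes that capacity is split evenly across groups and that a bad change hits at most one group at a time; groups_needed is a hypothetical helper, not part of any AWS tool.

```python
def groups_needed(tolerable_loss_percent: int) -> int:
    """Smallest number of equal-sized groups such that losing one group
    costs no more than tolerable_loss_percent of total capacity."""
    if not 0 < tolerable_loss_percent <= 100:
        raise ValueError("tolerable_loss_percent must be in the range 1-100")
    # Ceiling division using integers only: ceil(100 / percent).
    return -(-100 // tolerable_loss_percent)


for percent in (50, 34, 25):
    print(f"tolerate {percent}% loss -> {groups_needed(percent)} group(s)")
```

With a 50% tolerance the helper returns two groups, which matches the example above. Tighter tolerances need more groups, and each extra group adds roles and policies to maintain, so it is worth stopping at the smallest number that meets your requirement.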

Prerequisites

Before you get started, here are the prerequisites:

  • An application running in EC2 that is deployed to several availability zones. The availability zones we’ll use in the example are us-east-1c and us-east-1d.  The example app is called “XYZApp” (clever, I know!) and uses Amazon DynamoDB (which is resilient against failures in an availability zone).
  • A role and an EC2 instance profile for each availability zone. I’ll walk you through creating these.
  • An access policy that defines what our application is allowed to do.

When set up, the application will look like the following illustration:

Diagram illustrating the application after it has been set up

You’ll need to use an IAM user who has permission to create roles, configure roles, and start EC2 instances.  Finally, you’ll also need a properly configured installation of the AWS command line interface (CLI).

Now that we’ve discussed the prerequisites, let’s walk through the first step.

Create the Role’s Trust Policy

Roles require two policies: a trust policy that defines who can assume the role and an access policy that defines what the assumer can do. For more background on roles and role policies, see the AWS Identity and Access Management documentation.

I will be using the following trust policy, saved as role_trust_policy.json in your current working directory:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

If you’ve created a role for EC2 before, you’ll recognize this as the standard trust policy for IAM Roles for EC2.  It says that ec2.amazonaws.com (which represents the EC2 service) is allowed to call the Security Token Service (STS) to get temporary security credentials. Our application, running on EC2 instances, will use these credentials to call other AWS services.
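If you would rather generate the file than type it by hand, a small script can write the trust policy and confirm that it parses as valid JSON. This is a sketch using only the Python standard library; the write_trust_policy helper is hypothetical, and the example writes to a temporary directory rather than your working directory so that it has no side effects.

```python
import json
import tempfile
from pathlib import Path

# The same trust policy shown above, expressed as a Python dictionary.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def write_trust_policy(directory):
    """Write role_trust_policy.json into directory and return its path."""
    path = Path(directory) / "role_trust_policy.json"
    path.write_text(json.dumps(TRUST_POLICY, indent=2) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    policy_path = write_trust_policy(tmp)
    # Reading the file back catches syntax mistakes before the CLI sees them.
    loaded = json.loads(policy_path.read_text())

statement = loaded["Statement"][0]
print(statement["Principal"]["Service"], statement["Action"])
```

To produce the file used in the rest of this tutorial, you could call write_trust_policy(".") from the directory where you run the CLI commands.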

Create and Configure a Role for Each Availability Zone

The way you are going to achieve availability zone isolation for authentication (authorization will be addressed later) is to create a separate role for each availability zone.

You create and configure the role as follows, using the AWS CLI’s create-role command:

aws iam create-role --role-name XYZApp-us-east-1c --assume-role-policy-document file://role_trust_policy.json

As you can see, you’ve given the role a name that combines the application name (XYZApp) and the intended availability zone (us-east-1c).  You’ve also attached the aforementioned trust policy so that EC2 can assume the role.  You’ll do something very similar for the second availability zone:

aws iam create-role --role-name XYZApp-us-east-1d --assume-role-policy-document file://role_trust_policy.json

A successful call to create-role returns a response like the following:

{
    "Role": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "sts:AssumeRole",
                    "Principal": {
                        "Service": "ec2.amazonaws.com"
                    },
                    "Effect": "Allow",
                    "Sid": ""
                }
            ]
        },
        "RoleId": "AROWWWWWWWWWWWWWWWWWW",
        "CreateDate": "2013-10-11T06:52:44.057Z",
        "RoleName": "XYZApp-us-east-1d",
        "Path": "/",
        "Arn": "arn:aws:iam::123456789012:role/XYZApp-us-east-1d"
    }
}

In this response, you can see the role name you chose, the trust policy you used, and the generated ARN for the role.  The ARN is useful if you launch EC2 instances using the CLI or APIs (as opposed to the console).  In the next step we’ll authorize access by adding an access policy.
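Because the two create-role calls differ only in the availability zone, you may prefer to generate them from a list of zones so that the naming stays consistent. The following is a dry-run sketch: it builds and prints the commands but does not execute them. The create_role_command helper and the APP and ZONES constants are hypothetical names; the CLI flags are the ones used above.

```python
import shlex

APP = "XYZApp"
ZONES = ["us-east-1c", "us-east-1d"]


def create_role_command(app, zone, trust_policy_file="role_trust_policy.json"):
    """Build the argument list for 'aws iam create-role' for one zone."""
    return [
        "aws", "iam", "create-role",
        "--role-name", f"{app}-{zone}",
        "--assume-role-policy-document", f"file://{trust_policy_file}",
    ]


commands = [create_role_command(APP, zone) for zone in ZONES]

# Dry run: print each command instead of executing it.
for command in commands:
    print(shlex.join(command))
```

Once you are satisfied with the printed commands, you could pass each argument list to subprocess.run(command, check=True) to run it, assuming the AWS CLI is installed and configured.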

Allow the Roles to Access Resources

Right now you have two roles but they can’t do anything until you attach an access policy to them.  The following policy assigns least-privilege permissions to the role so that whoever assumes the role is allowed to read and write items in a single DynamoDB table (named xyz) and nothing else.  In our example, it is saved as role_access_policy.json in your current working directory:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "dynamodb:BatchGetItem",
        "dynamodb:BatchWriteItem",
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query",
        "dynamodb:Scan",
        "dynamodb:UpdateItem"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/xyz"
    }
  ]
}

(Make sure that you substitute your 12-digit AWS account ID for 123456789012.)

You assign permissions to the two roles using the AWS CLI’s put-role-policy command:

aws iam put-role-policy --role-name XYZApp-us-east-1c --policy-name XYZAppPolicy --policy-document file://role_access_policy.json

and:

aws iam put-role-policy --role-name XYZApp-us-east-1d --policy-name XYZAppPolicy --policy-document file://role_access_policy.json

You can see that you attached the same access policy to the role for each availability zone and gave it a name that describes its purpose.  In doing so, you’ve achieved availability zone isolation for authorization because a separate copy of this policy document is attached to each role; modifying one policy does not affect the other.
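Substituting the account ID by hand is an easy place to slip, so here is a sketch that builds the access policy from parameters and rejects an ID that is not 12 digits. The build_access_policy helper and its default arguments are hypothetical. The sketch also sets the Version element to 2012-10-17 to match the trust policy, which is an assumption on my part rather than a requirement of this tutorial.

```python
import json
import re

DYNAMODB_ACTIONS = [
    "dynamodb:BatchGetItem",
    "dynamodb:BatchWriteItem",
    "dynamodb:DeleteItem",
    "dynamodb:GetItem",
    "dynamodb:PutItem",
    "dynamodb:Query",
    "dynamodb:Scan",
    "dynamodb:UpdateItem",
]


def build_access_policy(account_id, table="xyz", region="us-east-1"):
    """Return the access policy as a dictionary, scoped to a single table."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("an AWS account ID is exactly 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": DYNAMODB_ACTIONS,
                "Effect": "Allow",
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table}",
            }
        ],
    }


policy = build_access_policy("123456789012")
print(json.dumps(policy, indent=2))
```

You could write the printed JSON to role_access_policy.json and pass it to put-role-policy once per role. Each call stores its own copy of the document on the role, so the isolation described above is preserved.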

Enable the Roles for Use with EC2

To allow the roles to be used with EC2, you need to create and configure instance profiles.  Create an instance profile for each role (and therefore availability zone) as follows:

aws iam create-instance-profile --instance-profile-name XYZApp-us-east-1c

and:

aws iam create-instance-profile --instance-profile-name XYZApp-us-east-1d

A successful call to create-instance-profile returns a response like the following:

{
    "InstanceProfile": {
        "InstanceProfileId": "AIXXXXXXXXXXXXXXXXXXX",
        "Roles": [],
        "CreateDate": "2013-10-10T22:06:31.993Z",
        "InstanceProfileName": "XYZApp-us-east-1d",
        "Path": "/",
        "Arn": "arn:aws:iam::123456789012:instance-profile/XYZApp-us-east-1d"
    }
}

Next, you need to add each role to its corresponding instance profile as follows:

aws iam add-role-to-instance-profile --instance-profile-name XYZApp-us-east-1c --role-name XYZApp-us-east-1c

and:

aws iam add-role-to-instance-profile --instance-profile-name XYZApp-us-east-1d --role-name XYZApp-us-east-1d

You can inspect the instance profiles using list-instance-profiles:

aws iam list-instance-profiles | grep RoleName

(Note that the command line above is Unix/Linux-specific as it uses a pipe and the grep command.)

The results look like the following:

"RoleName": "XYZApp-us-east-1c",
"RoleName": "XYZApp-us-east-1d",
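If grep is not available, or if you want to confirm that each instance profile contains the matching role rather than just listing role names, you can parse the JSON that list-instance-profiles returns. The sketch below works on a trimmed, illustrative sample of that output (real responses contain more fields); roles_by_profile and mismatched_profiles are hypothetical helpers, and the check relies on this tutorial’s convention that a profile and its role share the same name.

```python
import json

# Trimmed, illustrative sample of 'aws iam list-instance-profiles' output.
SAMPLE_OUTPUT = """
{
  "InstanceProfiles": [
    {"InstanceProfileName": "XYZApp-us-east-1c",
     "Roles": [{"RoleName": "XYZApp-us-east-1c"}]},
    {"InstanceProfileName": "XYZApp-us-east-1d",
     "Roles": [{"RoleName": "XYZApp-us-east-1d"}]}
  ]
}
"""


def roles_by_profile(cli_output):
    """Map each instance profile name to the list of role names it holds."""
    data = json.loads(cli_output)
    return {
        profile["InstanceProfileName"]: [
            role["RoleName"] for role in profile.get("Roles", [])
        ]
        for profile in data.get("InstanceProfiles", [])
    }


def mismatched_profiles(mapping):
    """Return profiles that do not contain exactly the same-named role."""
    return [name for name, roles in mapping.items() if roles != [name]]


mapping = roles_by_profile(SAMPLE_OUTPUT)
print(mapping)
print("mismatched:", mismatched_profiles(mapping))
```

An empty mismatched list means every profile is linked to the role for its own availability zone, which is the property the isolation scheme depends on.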

Now that our configuration is complete, we can launch some EC2 instances.

Start Your EC2 Instances!

Now you just follow the standard steps to start EC2 instances with IAM roles, specifying the instance profile that matches each availability zone. Use a command like the following one (substitute appropriate image IDs and instance types):

aws ec2 run-instances --key-name work --image-id ami-83e4bcea --count 1 --instance-type t1.micro --security-groups default --placement AvailabilityZone=us-east-1c --iam-instance-profile Name=XYZApp-us-east-1c --region us-east-1

and:

aws ec2 run-instances --key-name work --image-id ami-83e4bcea --count 1 --instance-type t1.micro --security-groups default --placement AvailabilityZone=us-east-1d --iam-instance-profile Name=XYZApp-us-east-1d --region us-east-1
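One mistake that would quietly undo the isolation is launching an instance in one availability zone with the instance profile of another. A way to guard against that is to derive both values from a single zone argument, as in the dry-run sketch below. The run_instances_command helper is hypothetical, the flags are the ones used above, and the region is derived by dropping the trailing letter of the zone name, which is an assumption that holds for zones named like us-east-1c.

```python
import shlex


def run_instances_command(app, zone, image_id, instance_type, key_name,
                          security_group="default"):
    """Build 'aws ec2 run-instances' arguments with a matching zone and profile."""
    region = zone[:-1]  # assumption: zone name is the region plus one letter
    return [
        "aws", "ec2", "run-instances",
        "--key-name", key_name,
        "--image-id", image_id,
        "--count", "1",
        "--instance-type", instance_type,
        "--security-groups", security_group,
        "--placement", f"AvailabilityZone={zone}",
        "--iam-instance-profile", f"Name={app}-{zone}",
        "--region", region,
    ]


launch_commands = [
    run_instances_command("XYZApp", zone, "ami-83e4bcea", "t1.micro", "work")
    for zone in ("us-east-1c", "us-east-1d")
]

# Dry run: print the commands for review instead of executing them.
for command in launch_commands:
    print(shlex.join(command))
```

As with the earlier sketch, nothing is launched here; the printed commands are meant for review before you run them yourself.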

Results

Having a role for each availability zone makes your application resilient in the face of unexpected authentication and authorization configuration changes.  If someone accidentally modifies or deletes the role, the role’s trust policy, the role’s access policy, the instance profiles, or any of the linkages between these pieces, the application failure is confined to that availability zone.  At that point, all the (other) fault tolerance you’ve built into your application seamlessly shifts your customers’ traffic to the unaffected availability zone.  The final product looks like the following figure:

Diagram illustrating the final product

Remember, you don’t have to segment your application’s IAM resources by availability zone.  Another partitioning scheme may make more sense, especially if your application isn’t running on EC2. For example, if you have two data centers, one on the west coast of the U.S. and one on the east coast, you may want the application running in your west coast data center to use a dedicated IAM user (and associated policies). The same application deployed in your east coast data center (for geographic failure isolation, perhaps) would then use its own separate, dedicated IAM user (and associated policies).

No matter which partitioning scheme you choose, keep these key deployment and configuration management best practices in mind:

  • Minimize the impact of a change by starting small and deploying to more systems as you gain confidence in your change. For example, you might deploy first to a few instances, then to the availability zone, and then to the remainder of the application.
  • Actively test the changes after each deployment, both for the presence of the expected change and the absence of regressions.  And, of course, have a roll-back plan.
  • Promote the exact configuration that you tested to the next, larger set of instances.
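The practices above can be combined into a simple staged rollout loop: apply the change to one stage, test it, and either continue or roll back everything touched so far. The sketch below models this with an in-memory dictionary instead of real IAM calls; staged_rollout, the stage names, and the version labels are all hypothetical.

```python
def staged_rollout(stages, apply_change, validate, roll_back):
    """Apply a change stage by stage, rolling back on the first failed check.

    Returns (succeeded, stages_that_were_touched).
    """
    touched = []
    for stage in stages:
        apply_change(stage)
        touched.append(stage)
        if not validate(stage):
            # Undo in reverse order so the most recent change is reverted first.
            for done in reversed(touched):
                roll_back(done)
            return False, touched
    return True, touched


# In-memory stand-in for the policy version deployed to each stage.
state = {"canary instances": "v1", "us-east-1c": "v1", "us-east-1d": "v1"}
stages = list(state)


def apply_change(stage):
    state[stage] = "v2"


def roll_back(stage):
    state[stage] = "v1"


def validate(stage):
    # Pretend a regression only shows up at the availability zone stage.
    return stage != "us-east-1c"


succeeded, touched = staged_rollout(stages, apply_change, validate, roll_back)
print(succeeded, touched, state)
```

In this run the simulated regression appears at the second stage, so the last stage is never modified and the first two are returned to their previous configuration.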

Put these tools to work for you by segmenting your application’s IAM resources into at least two groups, based on your anticipated failure scenarios and your capacity requirements during those scenarios.  You’ll be able to deploy with more confidence and you’ll sleep better at night knowing that your application is up and healthy, even if an unexpected change is made to its IAM configuration.

If you have any questions or need assistance in setting up isolation, please visit us at the IAM forum.

– Ben