AWS Developer Tools Blog

Leveraging the s3 and s3api Commands

Have you ever run aws help on the command line or browsed the AWS CLI Reference Documentation and noticed that there are two sets of Amazon S3 commands to choose from: s3 and s3api? If you are completely unfamiliar with the s3 or s3api commands, you can read about them in the AWS CLI User Guide. In this post, I am going to go into detail about the two sets of commands and provide a few examples of how to leverage them to your advantage.

s3api

Most of the commands in the AWS CLI are generated from JSON models, which directly model the APIs of the various AWS services. This allows the CLI to generate commands that are a near one-to-one mapping of the service’s API. The s3api commands fall into this category. They are entirely driven by these JSON models and closely mirror the API of S3, hence the name s3api. Each command operation, e.g. s3api list-objects or s3api create-bucket, shares a similar operation name, a similar input, and a similar output with the corresponding operation in S3’s API. As a result, the s3api commands give you a very granular level of control over the requests you make to S3 using the CLI.
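
For instance, S3’s ListObjects API operation corresponds to the s3api list-objects command, and the API’s request parameters map directly to command-line options. A quick illustration against a hypothetical bucket might look like this:

# "mybucket" is a placeholder bucket name
$ aws s3api list-objects --bucket mybucket --prefix logs/ --max-keys 10

By default, the output is the operation’s response rendered as JSON, much as you would receive it from the API itself.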

s3

The s3 commands are a custom set of commands specifically designed to make it even easier for you to manage your S3 files using the CLI. The main difference between the s3 and s3api commands is that the s3 commands are not solely driven by the JSON models. Rather, the s3 commands are built on top of the operations found in the s3api commands. As a result, these commands allow for higher-level features that are not provided by the s3api commands. This includes, but is not limited to, the ability to synchronize local directories and S3 buckets, transfer multiple files in parallel, stream files, and automatically handle multipart transfers. In short, these commands further simplify and speed up transferring files to, from, and within S3.
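
For example, a single higher-level sync command can mirror a local directory to a bucket, parallelizing uploads and handling multipart transfers for large files behind the scenes (the directory and bucket names below are placeholders):

# "./localdir" and "mybucket" are placeholders
$ aws s3 sync ./localdir s3://mybucket/backup/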

s3 and s3api Examples

Both sets of S3 commands have a lot to offer. With this wide array of commands to choose from, it is important to be able to identify what commands you need for your specific use case. For example, if you want to upload a set of files on your local machine to your S3 bucket, you would probably want to use the s3 commands via the cp or sync command operations. On the other hand, if you wanted to set a bucket policy, you would use the s3api commands via the put-bucket-policy command operation.
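
The latter case might look like the following, where the policy document is read from a local JSON file (the bucket and file names are placeholders):

# "mybucket" and "policy.json" are placeholders
$ aws s3api put-bucket-policy --bucket mybucket --policy file://policy.json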

However, your choice of S3 commands should not be limited to strictly deciding whether you need the s3 commands or the s3api commands. Sometimes you can use both sets of commands in conjunction to satisfy your use case. Oftentimes this proves to be even more powerful, as you are able to leverage the low-level, granular control of the s3api commands together with the higher-level simplicity and speed of the s3 commands. Here are a few examples of how you can work with both sets of S3 commands for your specific use case.

Bucket Regions

When you create an S3 bucket, the bucket is created in a specific region. Knowing the region that your bucket is in is essential for a variety of use cases, such as transferring files across buckets located in different regions and making requests that require Signature Version 4 signing. However, you may not know or remember where your bucket is located. Fortunately, by using the s3api commands, you can determine your bucket’s region.

For example, if I make a bucket located in the Frankfurt region using the s3 commands:

$ aws s3 mb s3://myeucentral1bucket --region eu-central-1
make_bucket: s3://myeucentral1bucket/

I can then use s3api get-bucket-location to determine the region of my newly created bucket:

$ aws s3api get-bucket-location --bucket myeucentral1bucket
{
    "LocationConstraint": "eu-central-1"
}

As shown above, the value of the LocationConstraint member in the output JSON is the expected region of the bucket, eu-central-1. Note that for buckets created in the US Standard region, us-east-1, the value of LocationConstraint will be null. As a quick reference to how location constraints correspond to regions, refer to the AWS Regions and Endpoints Guide.
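
If you want to resolve the region in a script, one approach is to request the location constraint as text output, in which a null value is rendered as None, and fall back to us-east-1 yourself. This is just a sketch, and the bucket name is a placeholder:

# "mybucket" is a placeholder; text output renders a null constraint as None
$ region=$(aws s3api get-bucket-location --bucket mybucket \
     --query LocationConstraint --output text)
$ if [ "$region" = "None" ]; then region=us-east-1; fi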

Once you have learned the region of your bucket, you can provide it using the --region parameter, by setting it in your config file or in a profile, or by setting the AWS_DEFAULT_REGION environment variable. You can read more about how to set a region in the AWS CLI User Guide. This ensures that subsequent requests you make to your bucket via the s3 and s3api commands are sent to the correct region.
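
For example, using the bucket from earlier, I could pass the region explicitly on a follow-up request or export it for the rest of my shell session:

$ aws s3 ls s3://myeucentral1bucket --region eu-central-1
$ export AWS_DEFAULT_REGION=eu-central-1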

Deleting a Set of Buckets

For this example, suppose that I have a lot of buckets that I was using for testing and they are no longer needed. But, I have other buckets, too, and they need to stick around:

$ aws s3 ls
2014-12-02 13:36:17 awsclitest-123
2014-12-02 13:36:24 awsclitest-234
2014-12-02 13:36:51 awsclitest-345
2014-11-21 16:47:14 mybucketfoo

The buckets beginning with awsclitest- are test buckets that I want to get rid of. An obvious way would be to delete each bucket using aws s3 rb, one at a time. This becomes tedious, though, if I have a lot of these test buckets or if the bucket names are longer and more complicated. I am going to go step by step through building a single command that will delete all of the buckets that begin with awsclitest-.
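
One at a time, that would mean running something like the following for each test bucket, where the --force flag first deletes any objects remaining in the bucket:

$ aws s3 rb s3://awsclitest-123 --force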

Instead of using the s3 ls command to list my buckets, I am going to use the s3api list-buckets command to list them:

$ aws s3api list-buckets
{
    "Owner": {
        "DisplayName": "mydisplayname",
        "ID": "myid"
    },
    "Buckets": [
        {
            "CreationDate": "2014-12-02T21:36:17.000Z",
            "Name": "awsclitest-123"
        },
        {
            "CreationDate": "2014-12-02T21:36:24.000Z",
            "Name": "awsclitest-234"
        },
        {
            "CreationDate": "2014-12-02T21:36:51.000Z",
            "Name": "awsclitest-345"
        },
        {
            "CreationDate": "2014-11-22T00:47:14.000Z",
            "Name": "mybucketfoo"
        }
    ]
}

At first glance, it does not make much sense to use s3api list-buckets instead of s3 ls, because all of the bucket names are embedded in the JSON output of the command. However, we can take advantage of the command’s --query parameter to perform JMESPath queries for specific members and values in the JSON output:

$ aws s3api list-buckets \
   --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].Name'
[
    "awsclitest-123",
    "awsclitest-234",
    "awsclitest-345"
]

If you are unfamiliar with the --query parameter, you can read about it in the AWS CLI User Guide. For this specific query, I am asking for the names of all of the buckets that begin with awsclitest-. However, the output is still a little difficult to parse if we hope to use that as input to the s3 rb command. To make the names easier to parse out, we can modify our query slightly and specify text for the --output parameter:

$ aws s3api list-buckets \
   --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].[Name]' \
   --output text
awsclitest-123
awsclitest-234
awsclitest-345

We can now pipe this output to xargs to perform a forced bucket delete on all of the buckets whose names start with awsclitest-:

$ aws s3api list-buckets \
   --query 'Buckets[?starts_with(Name, `awsclitest-`) == `true`].[Name]' \
   --output text | xargs -I {} aws s3 rb s3://{} --force
delete: s3://awsclitest-123/test
remove_bucket: s3://awsclitest-123/
delete: s3://awsclitest-234/test
remove_bucket: s3://awsclitest-234/
delete: s3://awsclitest-345/test
remove_bucket: s3://awsclitest-345/

As shown in the output, all of the desired buckets, along with any files inside of them, were deleted. To verify that it worked, I can then list all of my buckets:

$ aws s3 ls
2014-11-21 16:47:14 mybucketfoo

Aggregating S3 Server Access Logs

In this final example, I will show you how you can use the s3 and s3api commands together in order to aggregate your S3 server access logs. These logs are used to track requests for access to your S3 bucket. If you are unfamiliar with server access logs, you can read about them in the Amazon S3 Developer Guide.

Server access logs follow the naming convention TargetPrefixYYYY-mm-DD-HH-MM-SS-UniqueString, where YYYY, mm, DD, HH, MM, and SS are the digits of the year, month, day, hour, minute, and second, respectively, of when the log file was delivered. However, the number of logs delivered for a specific period of time, and which log records end up inside a specific log file, are somewhat unpredictable. As a result, it is convenient to aggregate all of the logs for a specific period of time into one file in an S3 bucket.

For this example, I am going to aggregate all of the logs that were delivered on October 31, 2014 from 11 a.m. to 12 p.m. to the file 2014-10-31-11.log in my bucket. To begin, I will use s3api list-objects to list all of the objects in my bucket beginning with logs/2014-10-31-11:

$ aws s3api list-objects --bucket myclilogs --output text \
   --prefix logs/2014-10-31-11 --query Contents[].[Key]
logs/2014-10-31-11-19-03-D7E3D44429C236C9
logs/2014-10-31-11-19-05-9FCEDD1393C9319F
logs/2014-10-31-11-19-26-01DE8498F22E8EB6
logs/2014-10-31-11-20-03-1B26CD31AE5BFEEF
logs/2014-10-31-11-21-34-757D6904963C22A6
logs/2014-10-31-11-21-35-27B909408B88017B
logs/2014-10-31-11-21-50-1967E793B8865384

.......  Continuing to the end ...........

logs/2014-10-31-11-42-44-F8AD38626A24E288
logs/2014-10-31-11-43-47-160D794F4D713F24

Using both the --query and --output parameters, I was able to list the logs in a format that can easily be used as input for the s3 commands. Now that I have identified all of the logs that I want to aggregate, I am going to take advantage of the s3 cp command’s streaming capability to actually aggregate them.

When using s3 cp to stream, you have two options: upload a stream from standard input to an S3 object, or download an S3 object as a stream to standard output. You do so by specifying - as the first path parameter to the cp command if you want to upload a stream, or - as the second path parameter if you want to download an object as a stream.
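
In isolation, the two streaming directions look like this; the bucket and key names here are placeholders:

# "mybucket" and "hello.txt" are placeholders
$ echo "hello world" | aws s3 cp - s3://mybucket/hello.txt
$ aws s3 cp s3://mybucket/hello.txt -

For my use case, I am going to stream in both directions: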

$ aws s3api list-objects --bucket myclilogs \
   --output text --prefix logs/2014-10-31-11 \
   --query Contents[].[Key] | 
   xargs -I {} aws s3 cp s3://myclilogs/{} - | 
   aws s3 cp - s3://myclilogs/aggregatedlogs/2014-10-31-11.log

The workflow for this command is as follows. First, I stream each desired log, one by one, to standard output. Then I pipe that stream to the standard input of a second s3 cp command, which uploads it to the desired location in my bucket.

If you want to speed up this process, you can use the GNU parallel shell tool to run the s3 cp commands that download each log as a stream in parallel with each other:

$ aws s3api list-objects --bucket myclilogs \
   --output text --prefix logs/2014-10-31-11 \
   --query Contents[].[Key] | 
   parallel -j5 aws s3 cp s3://myclilogs/{} - | 
   aws s3 cp - s3://myclilogs/aggregatedlogs/2014-10-31-11.log

By specifying the -j5 parameter in the command above, I am telling GNU parallel to run up to five of the s3 cp streaming downloads at a time. Also, note that GNU parallel may not be installed on your machine by default; it can be installed with tools such as brew or apt-get.
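
For example, on OS X with Homebrew or on a Debian-based Linux distribution, installing it typically looks like this:

$ brew install parallel
$ sudo apt-get install parallel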

Once the command finishes, I can then verify that my aggregated log exists:

$ aws s3 ls s3://myclilogs/aggregatedlogs/
2014-12-03 10:43:49     269956 2014-10-31-11.log

Conclusion

I hope that the descriptions and examples in this post will help you further leverage both the s3 and s3api commands to your advantage. However, do not limit yourself to just these examples. Go ahead and try to figure out other ways to use the s3 and s3api commands together today!

You can follow us on Twitter @AWSCLI and let us know what you’d like to read about next! If you have any questions about the CLI or any feature requests, do not hesitate to get in touch with us via our GitHub repository.

Stay tuned for our next blog post!