AWS Developer Tools Blog

Announcing the Amazon S3 Managed Uploader in the AWS SDK for JavaScript

Today’s release of the AWS SDK for JavaScript (v2.1.0) contains support for a new uploading abstraction in the AWS.S3 service that allows large buffers, blobs, or streams to be uploaded more easily and efficiently, both in Node.js and in the browser. We’re excited to share some details on this new feature in this post.

The new AWS.S3.upload() function intelligently detects when a buffer or stream can be split up into multiple parts and sent to S3 as a multipart upload. This provides a number of benefits:

  1. It enables much more robust upload operations: if a single part fails to upload, that part can be retried on its own without resending the whole payload.
  2. Multiple parts can be queued and sent in parallel, allowing for much faster uploads when enough bandwidth is available.
  3. Most importantly, due to the way the managed uploader buffers data in memory using multiple parts, this abstraction does not need to know the full size of the stream, even though Amazon S3 typically requires a content length when uploading data. This enables many common Node.js streaming workflows (like piping a file through a compression stream) to be used natively with the SDK.
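
For example, an in-memory buffer can be uploaded with a single call, and the uploader decides whether to send it as one request or to split it into parts. Here is a minimal sketch; the bucket and key names are placeholders:

// Upload a Buffer; upload() chooses between a single PUT and a multipart upload
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
s3.upload({
  Bucket: 'myBucket',                 // placeholder bucket name
  Key: 'myBufferKey',                 // placeholder key
  Body: new Buffer(1024 * 1024 * 20)  // a 20 MB in-memory payload
}, function(err, data) {
  if (err) console.log('Upload failed:', err);
  else console.log('Uploaded to', data.Location);
});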

A Node.js Example

Let’s see how you can leverage the managed upload abstraction by uploading a stream of unknown size. In this case, we will pipe the contents of a file on disk (bigfile) through a gzip compression stream, effectively sending compressed bytes to S3. This can be done easily in the SDK using upload():

// Load the SDK and build the compressed stream
var AWS = require('aws-sdk');
var fs = require('fs'), zlib = require('zlib');
var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());

// Upload the stream
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body}, function(err, data) {
  if (err) {
    console.log("An error occurred", err);
  } else {
    console.log("Uploaded the file at", data.Location);
  }
});

A Browser Example

Important Note: In order to support large file uploads in the browser, you must ensure that your CORS configuration exposes the ETag header; otherwise, your multipart uploads will not succeed. See the guide for more information on how to expose this header.
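
For reference, the ExposeHeader rule is configured once on the bucket (for example, from a Node.js script or the S3 console); here is a sketch using the SDK's putBucketCors operation, with placeholder bucket and origin values:

// Allow browser uploads from a specific origin and expose the ETag header
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
s3.putBucketCors({
  Bucket: 'myBucket',                              // placeholder bucket name
  CORSConfiguration: {
    CORSRules: [{
      AllowedOrigins: ['https://www.example.com'], // placeholder origin
      AllowedMethods: ['PUT', 'POST', 'GET'],
      AllowedHeaders: ['*'],
      ExposeHeaders: ['ETag']                      // needed for multipart uploads
    }]
  }
}, function(err, data) {
  if (err) console.log('Could not set the CORS configuration', err);
});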

All of this works in the browser too! The only difference here is that, in the browser, we will probably be dealing with File objects instead:

// Get our File object
var file = $('#file-chooser')[0].files[0];

// Upload the File
var bucket = new AWS.S3({params: {Bucket: 'myBucket'}});
var params = {Key: file.name, ContentType: file.type, Body: file};
bucket.upload(params, function (err, data) {
  $('#results').html(err ? 'ERROR!' : 'UPLOADED.');
});

Tracking Total Progress

In addition to simply uploading files, the managed uploader can also keep track of total progress across all parts. This is done by listening to the httpUploadProgress event, similar to the way you would do it with normal request objects:

var AWS = require('aws-sdk');
var fs = require('fs');
var zlib = require('zlib');

var body = fs.createReadStream('bigfile').pipe(zlib.createGzip());
var s3obj = new AWS.S3({params: {Bucket: 'myBucket', Key: 'myKey'}});
s3obj.upload({Body: body}).
  on('httpUploadProgress', function(evt) {
    console.log('Progress:', evt.loaded, '/', evt.total);
  }).
  send(function(err, data) { console.log(err, data); });

Note that evt.total might be undefined for streams of unknown size until the entire stream has been chunked and the last parts are being queued.
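
If you want to display a percentage, a small sketch like the following computes one only once evt.total is known:

s3obj.upload({Body: body}).
  on('httpUploadProgress', function(evt) {
    if (evt.total) {
      console.log('Progress:', Math.round(evt.loaded / evt.total * 100) + '%');
    } else {
      console.log('Uploaded', evt.loaded, 'bytes so far');
    }
  }).
  send(function(err, data) { console.log(err, data); });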

Configuring Concurrency and Part Size

Since the uploader manages concurrency and part buffering, these values can also be tuned for performance. To do this, pass an options map containing queueSize (the number of parts uploaded concurrently) and partSize (the size of each part, in bytes). For example, if you wanted to buffer 10 megabyte parts and reduce concurrency to 2, you could specify it as follows:

var opts = {queueSize: 2, partSize: 1024 * 1024 * 10};
s3obj.upload({Body: body}, opts).send(callback);
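
Note that Amazon S3 requires every part except the last to be at least 5 MB, so partSize cannot be set below that limit. The options map can also be passed together with a callback instead of calling send():

s3obj.upload({Body: body}, opts, function(err, data) { console.log(err, data); });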

You can read more about controlling these values in the API documentation.

Handling Failures

Finally, you can also control how parts are cleaned up when a failure occurs by using the leavePartsOnError option. By default, the managed uploader attempts to abort the multipart upload if any individual part fails to upload, but if you would prefer to handle the failure manually (for example, to attempt a more aggressive recovery), you can set leavePartsOnError to true:

s3obj.upload({Body: body}, {leavePartsOnError: true}).send(callback);
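
If you do take over cleanup yourself, the low-level multipart operations are available for that purpose. The following rough sketch (not part of the managed uploader) lists any incomplete multipart uploads left on the bucket and aborts them:

// Clean up leftover multipart uploads with the low-level S3 operations
var s3 = new AWS.S3();
s3.listMultipartUploads({Bucket: 'myBucket'}, function(err, data) {
  if (err) return console.log('Could not list multipart uploads', err);
  data.Uploads.forEach(function(upload) {
    s3.abortMultipartUpload({
      Bucket: 'myBucket',
      Key: upload.Key,
      UploadId: upload.UploadId
    }, function(err) {
      if (err) console.log('Failed to abort', upload.Key, err);
    });
  });
});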

This option is also documented in the API documentation.

Give It a Try!

We would love to see what you think about this new feature, so give the new AWS SDK for JavaScript v2.1.0 a try and provide your feedback in the comments below or on GitHub!