Uploading large files to S3 using streams

It often happens that large files need to be uploaded to an S3 bucket programmatically. This can be done in several ways, and in this post I'll discuss some of them.

Let’s see a few variations for uploading files to an AWS S3 bucket.

1. Pre-requisites

I’ll use my admin credentials to run the code from my computer. The minimum permission you’ll need is s3:PutObject, which grants write access to the bucket.

I’ll create a new bucket for this exercise, but any existing bucket will do the job.
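If you'd rather not use admin credentials, a minimal least-privilege policy could look something like this - a sketch only, assuming YOUR_BUCKET_NAME is the target bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}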

First, the AWS SDK needs to be installed in the project folder:

cd PROJECT_FOLDER
npm init -y # if the folder is not set up yet
npm install aws-sdk

2. Use the putObject method

This is probably one of the most often used S3 methods in the SDK.

It allows the user to upload files up to 5 GB in a single operation, but AWS recommends using multipart upload (see below) for files larger than 100 MB.

2.1. Upload files as strings

The file I’ll upload to S3 is called small-file.json and it doesn’t contain much - but it has everything that is important:

{
  "name": "James Bond"
}

For the sake of simplicity, I have saved this (and every other) data file to the same folder from where the code is run.

In a file called upload-file.js, the code can look like this:

import * as AWS from 'aws-sdk';

import smallFile from './small-file.json';

const s3 = new AWS.S3({
  apiVersion: '2006-03-01',
});

const params = {
  Bucket: 'YOUR_BUCKET_NAME',
  Key: 'small-file.json',
  Body: JSON.stringify(smallFile),
};

s3.putObject(params)
  .promise()
  .then(console.log)
  .catch(console.error);

It’s a good idea to lock the API version so that the methods keep working even if AWS introduces a new version, although the latest S3 API version hasn’t changed for a while.

The params object contains the minimum required parameters. There are others that can define encryption, storage class, tags and many more options (see the sketch below), but for now let’s continue with the basics.

Bucket is the name of the bucket and Key is the name of the object in S3. The Body can’t be a plain JavaScript object - it has to be a string, a Buffer or a stream - so in this case, I’ll stringify the JSON.
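Just to illustrate a few of the optional parameters, the input could also specify server-side encryption, a storage class and tags - the values below are examples only:

// a sketch with some optional parameters - the values are examples, not requirements
const paramsWithOptions = {
  Bucket: 'YOUR_BUCKET_NAME',
  Key: 'small-file.json',
  Body: JSON.stringify(smallFile),
  ServerSideEncryption: 'AES256', // SSE-S3 managed keys
  StorageClass: 'STANDARD_IA', // for infrequently accessed objects
  Tagging: 'project=s3-upload-demo', // URL-encoded key=value pairs
};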

The putObject method can also be used with Promises. All we have to do is call the promise method on the request object that the putObject call returns.
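The same call can also be awaited inside an async function - a quick sketch, where uploadSmallFile is just an illustrative name:

const uploadSmallFile = async () => {
  try {
    // putObject returns a request object whose promise method gives us a Promise
    const result = await s3.putObject(params).promise();
    console.log(result); // contains the ETag of the uploaded object
  } catch (error) {
    console.error(error);
  }
};

uploadSmallFile();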

We can now run the method from the terminal:

node upload-file.js

If everything goes well, the ETag will be written to the console and the S3 bucket will have the small-file.json object:

aws s3api list-objects-v2 --bucket YOUR_BUCKET_NAME

# response
{
  "Contents": [
    {
      "Key": "small-file.json",
      "LastModified": "2020-07-02T22:42:23.000Z",
      "ETag": "\"b4d20ab799ad6902344c8cf388f93e24\"",
      "Size": 21,
      "StorageClass": "STANDARD"
    }
  ]
}

2.2. Upload files using streams

Streams are an essential part of Node.js, and putObject supports them too.

Streams come in handy when we need to work with large files.

Say that the file we want to upload to S3 using streams is called large-file.json.

In this case we’ll need to import the fs and path modules (both are native to Node.js):

import path from 'path';
import fs from 'fs';

const readStream = fs.createReadStream(path.join(__dirname, 'large-file.json'));

const paramsWithStream = {
  Bucket: 'YOUR_BUCKET_NAME',
  Key: 'large-file.json',
  Body: readStream,
};

s3.putObject(paramsWithStream)
  .promise()
  .then(console.log)
  .catch(console.error);

The key difference is that we create a readable stream using the createReadStream method from the File System module. The method’s first, mandatory parameter is the path to the file, and the join method from the Path module can build that path for us (a quick comparison with path.resolve follows below). This readable stream will then be the value of the Body property in the putObject input.
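As a quick illustration of the difference between the two methods - the /home/user/project path is made up for this example:

// assuming the script lives in /home/user/project and is started from /home/user
path.join(__dirname, 'large-file.json'); // '/home/user/project/large-file.json'
path.resolve('large-file.json'); // '/home/user/large-file.json' - resolved from the working directory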

From here, putObject can be used in the same way as with the string body.

2.3. Tapping into the upload stream

The request that putObject returns can be converted into a readable stream with its createReadStream method. Since Node.js streams are event emitters, we can tap into the upload and log when it has finished:

const uploadReadable = s3.putObject(paramsWithStream).createReadStream();

uploadReadable.on('end', () => {
  console.log('Upload has finished');
});

uploadReadable.on('error', (error) => {
  console.log('Error while uploading object: ', error);
});

3. Use the upload method for multipart upload

The upload method on the S3 class breaks the large object up into multiple parts and uploads them to S3 separately. After all parts have arrived, S3 assembles them into a single object again.

We can use streams here as well, so let’s use the same stream-based paramsWithStream input as in the previous section.

With the upload method, we can specify the part size and the number of parts uploaded concurrently (the queue size) as the second argument. For example, we can upload up to five 10 MB parts at a time:

const options = {
  partSize: 10 * 1024 * 1024,
  queueSize: 5,
};

s3.upload(paramsWithStream, options)
  .promise()
  .then(console.log)
  .catch(console.error);

This way we can upload large files quickly and efficiently, without holding them in memory in their entirety.
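upload returns a managed upload object that also emits progress events, so we can log how many bytes have been transferred. Here is a sketch that assumes the same options as above and a fresh read stream in paramsWithStream:

const managedUpload = s3.upload(paramsWithStream, options);

managedUpload.on('httpUploadProgress', (progress) => {
  // progress.loaded is the number of bytes transferred so far,
  // progress.total is the full size when it's known upfront
  console.log(`Uploaded ${progress.loaded} of ${progress.total} bytes`);
});

managedUpload
  .promise()
  .then(console.log)
  .catch(console.error);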

4. Summary

putObject and upload are two ways of uploading files to S3. Both can handle large files, but AWS recommends using multipart upload above a certain file size.

Both methods support the use of streams, which are useful when handling large files.

Thanks for reading and see you next time.