Using streams when getting objects from S3
1. The classic problem
Say we are facing the classic problem: we have a Lambda function that programmatically receives objects from S3 with the AWS SDK in Node.js. The application uses the getObject method to receive the object from the bucket.
2. Changes
But when we upgrade to version 3 of the SDK (or write a new application with that version), we will experience some changes in the method signature.
Version 3 is modular, so we only have to install what we need in the application. This reduces the package size, which improves deployment time, so everything sounds good.
We should only install the @aws-sdk/client-s3 module instead of the whole aws-sdk package. The module contains the getObject method that helps us receive objects from the bucket. The S3 constructor is still available in the module, so there is nothing new up to this point.
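For instance, creating the client from the modular package can look like this (the region value is only a placeholder):

// Install the modular package first, for example: npm install @aws-sdk/client-s3
import { S3 } from '@aws-sdk/client-s3'

// The S3 constructor works as it did in v2; the region here is an example
const s3 = new S3({ region: 'us-east-1' })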
2.1. No promise() method
The first change is that the getObject method will return a Promise.
In version 2, the getObject method returned an object, and we had to call the promise() method, which resolves to the S3 response. Because we always want to use the async/await syntax instead of callbacks, the promise() method has been part of our development life.
The good news is that AWS has simplified the signature in version 3, and the getObject method already returns a Promise. Therefore, we don't have to call the promise() method when we want to await the resolved value.
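As a quick sketch, the difference looks like this (s3v2 stands for a hypothetical v2 AWS.S3 client instance, s3 for the v3 client, and params holds the Bucket and Key):

// SDK v2: getObject returns a request object that we convert with promise()
const v2Response = await s3v2.getObject(params).promise()

// SDK v3: getObject itself returns a Promise, so we can await it directly
const v3Response = await s3.getObject(params)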
2.2 Readable streams instead of Buffer
The Promise the S3 getObject method returns resolves to an object that extends the GetObjectOutput type. This object has the same properties as in SDK v2 but contains a breaking change. In version 3, the Body property of the resolved S3 response object is a readable stream instead of a Buffer. This modification implies that we should change how the application handles the object.
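We can see this in Node.js, where the Body property is a Readable stream. This is only an illustrative sketch and assumes params holds the Bucket and Key of an existing object:

import { Readable } from 'stream'

const { Body } = await s3.getObject(params)
// In Node.js, the v3 Body is a Readable stream, not a Buffer
console.log(Body instanceof Readable) // true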
3. Some TypeScript code
Readable streams implement the Symbol.asyncIterator method, so the streams are also async iterables. We can therefore use the for await...of construct to iterate over the readable stream and get the chunks the stream provides.
In the following example, we will return the object we have downloaded from S3. The code that handles the getObject request can look like this:
async function getObject(params) {
  // The Body of the v3 response is a readable stream
  const s3ResponseStream = (await s3.getObject(params)).Body

  // Readable streams are async iterables, so we can collect the chunks
  const chunks = []
  for await (const chunk of s3ResponseStream) {
    chunks.push(chunk)
  }

  // Each chunk is a Buffer; join them and parse the JSON content
  const responseBuffer = Buffer.concat(chunks)
  return JSON.parse(responseBuffer.toString())
}
Each chunk is a Buffer. After we have received the last chunk of the S3 object, we can concatenate the chunks, convert the result to a string, and finally parse it into a JavaScript object.
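As a side note, newer releases of the v3 SDK also mix helper methods into the Body stream, for example transformToString, which can replace the manual chunk collection. A minimal sketch, assuming an SDK version that ships this helper:

const { Body } = await s3.getObject(params)
// transformToString collects the stream and decodes it (UTF-8 by default)
const bodyString = await Body.transformToString()
const s3Object = JSON.parse(bodyString)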
The Lambda handler can look like this:
import { S3 } from '@aws-sdk/client-s3'

const s3 = new S3({ region: 'us-east-1' })

export async function handler(event) {
  try {
    const s3Object = await getObject({
      Bucket: 'NAME OF THE BUCKET',
      Key: 'NAME OF THE OBJECT TO FETCH',
    })
    return s3Object
  } catch (error) {
    console.error('Error while downloading object from S3', error.message)
    throw error
  }
}
We can wrap the stream-handling logic in a function called getObject and use it in a try/catch block, as we usually do in the Lambda handler.
Please note that we still store the whole S3 object in memory in the above example. The real benefit of streams is that we can process the chunks as they arrive. Use cases like transforming the data, saving it to a database, or returning the response as a stream are not part of this post, and I might cover them another time.
4. Summary
The getObject method's signature has changed in SDK version 3. The Body property of the response is now a readable stream instead of a Buffer.
We can use the core Node.js stream logic to handle the return value in our Lambda functions.