Safe error handling in Lambda functions
1. Ways of generating errors in Lambda
When a Lambda function runs, it can end in an error in two different ways. Either we throw an error somewhere in the function and execution reaches that piece of code (code-related errors), or an error occurs while the function is being invoked (non-code-related errors).
The latter group includes errors originating from the Lambda invocation itself or from a downstream service like a database.
This post focuses on the first case, when we throw an error ourselves somewhere in the code.
I’m talking about this piece of code:
```javascript
if (!entity.id) {
  throw new Error('Id doesn\'t exist');
}
```
Throwing errors like this can cause problems, depending on how the function is invoked.
2. Invocation types
The Lambda function invocation model determines how we can successfully handle errors.
2.1. Synchronous invocations
This is the simpler case.
Some services (API Gateway, for example) invoke Lambda synchronously. This means that Lambda runs the function and waits for the response. As soon as the response (either success or error) is available, Lambda returns it to the service.
When we have an API endpoint with a Lambda function handler, we most likely want to return these errors with a relevant status code. If the request doesn’t have the `id` property, we can’t run our logic, so we want to respond with a `400 Bad Request` status code.
The client can then try again with the necessary properties in the request.
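Here is a minimal sketch of what this can look like. It assumes an API Gateway Lambda proxy integration, where the function returns an object with `statusCode`, `headers` and `body`, and a JSON request body. `jsonResponse` is a hypothetical helper and the messages are illustrative. The handler does not throw. It returns the 400 response itself, so the client gets a meaningful status code instead of a generic server error.

```javascript
// Hypothetical helper that builds a proxy-integration response object.
const jsonResponse = (statusCode, payload) => ({
  statusCode,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload),
});

// In a real function this would be exported as the Lambda handler.
const handler = async (event) => {
  let entity;
  try {
    entity = JSON.parse(event.body ?? '{}');
  } catch {
    // Malformed JSON is a client error too; retrying the same request won't help.
    return jsonResponse(400, { message: 'Request body is not valid JSON' });
  }
  if (!entity?.id) {
    // Return the validation error instead of throwing it.
    return jsonResponse(400, { message: "Id doesn't exist" });
  }
  // The real business logic would run here.
  return jsonResponse(200, { id: entity.id });
};

handler({ body: '{}' }).then((response) => console.log(response.statusCode)); // 400
```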
2.2. Asynchronous invocations
Some services (S3 and SNS, for example) invoke Lambda functions asynchronously, which is very common in event-driven architectures.
In this case, the invoking service doesn’t wait for the response from Lambda. Lambda places the invocation event in an internal queue and responds immediately, so the service never receives the result our function produces from the input.
When the function throws an error, Lambda may try to process the event again, depending on the function’s asynchronous invocation settings. By default, Lambda retries twice, so a failing event is processed up to three times in total.
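To make the retry arithmetic concrete, here is a toy model of an asynchronous invocation with the default of two retries. It is not Lambda’s actual implementation, and `invokeAsyncWithRetries` and `validateEntity` are hypothetical names. A deterministic error fails on every attempt, so the same event is processed three times with no chance of success.

```javascript
// Toy model only: one initial attempt plus up to `maxRetries` retries.
const invokeAsyncWithRetries = async (fn, event, maxRetries = 2) => {
  const errors = [];
  for (let attempt = 1; attempt <= maxRetries + 1; attempt += 1) {
    try {
      const result = await fn(event);
      return { succeeded: true, attempts: attempt, result, errors };
    } catch (error) {
      // Each failed attempt is recorded; the event itself never changes.
      errors.push(error.message);
    }
  }
  return { succeeded: false, attempts: maxRetries + 1, errors };
};

// A function that fails deterministically when the id is missing.
const validateEntity = async (entity) => {
  if (!entity.id) {
    throw new Error("Id doesn't exist");
  }
  return entity.id;
};

invokeAsyncWithRetries(validateEntity, {}).then(({ succeeded, attempts }) => {
  console.log(`succeeded: ${succeeded}, attempts: ${attempts}`); // succeeded: false, attempts: 3
});
```

The real service also waits between attempts. Once the retries are used up, it discards the event or sends it to a configured destination or dead-letter queue.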
2.3. Poll-based invocations
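Lastly, with some services (SQS is a good example), Lambda polls the service for new records and invokes the function with them.
The default retry behaviour depends on how long the data stays in the source. Unless we override the settings, Lambda keeps trying to process a failed message until it expires. With SQS, a failed message becomes visible in the queue again and keeps being retried until the queue’s retention period ends or a redrive policy moves it to a dead-letter queue.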
3. The problem with handling errors when Lambda tries again
When we have automated workflows consisting of multiple stages, where Lambda is invoked asynchronously or through polling, additional invocations with the same payload can cause issues. When a lot of data suddenly flows into the pipeline, poorly thought-out error handling can lead to a blocked pipeline or throttled invocations.
Think about it: if the `id` property necessary for our logic doesn’t exist, will it exist when Lambda tries to process the object again? Since we have built a fully automated solution, most probably not: the function will get invoked with the exact same payload. It will result in the same error, the message will go back to the queue for another try, and so on until the retry settings stop the process. Bugs like this can make messages quickly accumulate in an SQS queue, blocking the pipeline for healthy messages.
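Here is a toy model of that loop. It assumes an SQS queue with a redrive policy whose `maxReceiveCount` is 5. Without a redrive policy, the message keeps coming back until it expires. `drainQueue` and `transform` are hypothetical names. The model handles one message at a time and ignores batching, visibility timeouts and concurrency. With one poison message among two healthy ones, the poison message alone accounts for five of the seven invocations.

```javascript
// Toy model of a poll-based source (NOT how SQS or Lambda are implemented).
// Failed messages return to the queue; after `maxReceiveCount` failed receives
// they are moved aside, mimicking an SQS redrive policy.
const drainQueue = (messages, processMessage, maxReceiveCount = 5) => {
  const queue = messages.map((body) => ({ body, receiveCount: 0 }));
  const processed = [];
  const movedToDlq = [];
  let invocations = 0;

  while (queue.length > 0) {
    const message = queue.shift();
    message.receiveCount += 1;
    invocations += 1;
    try {
      processed.push(processMessage(message.body));
    } catch {
      if (message.receiveCount >= maxReceiveCount) {
        movedToDlq.push(message.body); // the modeled redrive policy moves it aside
      } else {
        queue.push(message); // back to the queue for another try
      }
    }
  }
  return { processed, movedToDlq, invocations };
};

// Deterministic failure: retrying the same payload can never succeed.
const transform = (body) => {
  const entity = JSON.parse(body);
  if (!entity.id) throw new Error("Id doesn't exist");
  return entity.id;
};

const result = drainQueue(['{"id":"a"}', '{}', '{"id":"b"}'], transform);
console.log(result.invocations); // 7: two for the healthy messages, five for the poison one
```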
4. A solution
A solution to the problem above is careful error handling. We want to keep the retry settings because we would like Lambda to invoke the function again on some errors (throttling, `5xx` responses). So setting the number of retries to zero might not be the best option.
Let’s say that the function needs the `id` property to complete the logic on the input.
The steps for custom error handling are the following.
4.1. Create a custom error object
We can create a custom error object, which functions as a flag for error handling. It can look like this:
```javascript
// validationError.js
export class ValidationError extends Error {
  constructor (message) {
    super(`ValidationError: ${message}`);
    this.name = 'ValidationError';
  }
}
```
We will refer to the `name` property of the instance when catching the error at the top level.
`ValidationError` is a reusable component. We can throw it at any point in our code where, because of the type of error, we don’t want Lambda to try again.
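The snippet below shows how the class behaves. It repeats `ValidationError` so it runs on its own, and `isValidationError` is a hypothetical helper. The constructor prefixes the message, so `error.message` already starts with `ValidationError:`. Checking `name` instead of using `instanceof` is one option that keeps working even if two copies of the class end up loaded, for example through duplicated bundles, because `instanceof` compares class identity.

```javascript
// Same class as in validationError.js, repeated so this snippet is self-contained.
class ValidationError extends Error {
  constructor(message) {
    super(`ValidationError: ${message}`);
    this.name = 'ValidationError';
  }
}

// Hypothetical helper: true when the error was thrown as a ValidationError.
const isValidationError = (error) => error?.name === 'ValidationError';

const validationError = new ValidationError("Id doesn't exist");
console.log(validationError.name);                             // ValidationError
console.log(validationError.message);                          // ValidationError: Id doesn't exist
console.log(validationError instanceof Error);                 // true
console.log(isValidationError(validationError));               // true
console.log(isValidationError(new Error('Network timeout')));  // false
```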
4.2. Throw the custom error where needed
Let’s assume that the function code is very complex, and we validate the `id` property in a separate module called `logic.js`.
When the required property (`id`) is not present in the `entity` object, we’ll throw the `ValidationError`:
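```javascript
// logic.js
import { ValidationError } from './validationError.js';

// doSomeLogic and performSomeOtherLogic stand for the function's real business logic.
export const transformEntity = async (entity) => {
  if (!entity.id) {
    throw new ValidationError('Entity cannot be processed: Id doesn\'t exist');
  }
  const entityAfterLogic1 = await doSomeLogic(entity);
  return performSomeOtherLogic(entityAfterLogic1);
};
```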
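Because `transformEntity` is `async`, the `throw` does not escape synchronously. Callers receive a rejected promise instead. The self-contained sketch below makes that visible. It uses hypothetical stand-ins for `doSomeLogic` and `performSomeOtherLogic`, which simply tag the entity.

```javascript
class ValidationError extends Error {
  constructor(message) {
    super(`ValidationError: ${message}`);
    this.name = 'ValidationError';
  }
}

// Hypothetical stand-ins for the real business logic.
const doSomeLogic = async (entity) => ({ ...entity, enriched: true });
const performSomeOtherLogic = (entity) => ({ ...entity, transformed: true });

const transformEntity = async (entity) => {
  if (!entity.id) {
    // Retrying with the same payload can't help, so flag it as non-retryable.
    throw new ValidationError("Entity cannot be processed: Id doesn't exist");
  }
  const entityAfterLogic1 = await doSomeLogic(entity);
  return performSomeOtherLogic(entityAfterLogic1);
};

// Calling it does not throw synchronously; the returned promise rejects instead.
const pending = transformEntity({});
console.log(pending instanceof Promise);             // true
pending.catch((error) => console.log(error.name));   // ValidationError
```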
We can now make this module available in the Lambda handler.
4.3. Catch the error at the top level and add custom logic
Let’s see a part of the Lambda handler itself. In this case, the function gets triggered by an SQS queue:
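```javascript
// handler.js
import { transformEntity } from './logic.js';

// sendMessageToDeadLetterQueue and dlqUrl are not shown in this post: assume a helper
// that sends messageBody to the dead-letter queue located at dlqUrl.
const entityHandler = async ({ body: messageBody }) => {
  const entity = JSON.parse(messageBody);
  try {
    // `await` is required here: without it, a rejected promise would bypass this catch block.
    return await transformEntity(entity);
  } catch (error) {
    if (error.name === 'ValidationError') {
      console.log(`We won't try to process the message again. Instead, we'll send it to the dead-letter queue. Error: ${error.message}`);
      await sendMessageToDeadLetterQueue(dlqUrl, messageBody);
      return;
    }
    console.log(`We will try to process the message again. Error: ${error.message}`);
    throw error;
  }
};

// If any record rejects, Promise.all rejects and the whole batch returns to the queue.
export const handler = async (event) => {
  const records = event.Records;
  return Promise.all(records.map(entityHandler));
};
```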
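One detail in handlers like this is easy to miss. Inside an `async` function, `try/catch` only sees a rejected promise if it is awaited. The minimal demonstration below uses a hypothetical `failingTransform`. Returning the promise directly lets the rejection bypass the `catch` block. In an SQS handler, that would send a `ValidationError` into a retry instead of into the dead-letter queue logic.

```javascript
// Hypothetical async function that always rejects, like transformEntity on bad input.
const failingTransform = async () => {
  throw new Error('boom');
};

// Without await, the rejected promise is returned before the catch block can see it.
const withoutAwait = async () => {
  try {
    return failingTransform();
  } catch {
    return 'caught';
  }
};

// With await, the rejection is thrown inside the try block and caught.
const withAwait = async () => {
  try {
    return await failingTransform();
  } catch {
    return 'caught';
  }
};

withoutAwait().then(
  (value) => console.log('withoutAwait resolved:', value),
  (error) => console.log('withoutAwait rejected:', error.message), // this branch runs: "boom"
);
withAwait().then((value) => console.log('withAwait resolved:', value)); // "caught"
```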
One way of handling messages that we don’t want to process again is to send them to a dead-letter queue. A dead-letter queue is a separate SQS queue where such messages are isolated for special handling. We can inspect the messages there to investigate why some objects fail to process.
If transformEntity
throws a ValidationError
, it means that we don’t want to process the message again because the input to the Lambda function wouldn’t be different for the second time. We need to investigate why the input object isn’t in the right format. So we catch
the error, send the message to the dead-letter queue and simply return
. If we threw an exception here, it would trigger a retry, so if we don’t want it to happen, we need to stop the function execution here.
In other cases, when the `entity` object has the correct format, `transformEntity` can throw an exception for a different reason. For example, something may have gone wrong in the `doSomeLogic` or `performSomeOtherLogic` functions, such as a downstream service returning an error. In these cases, we want Lambda to invoke the function again, so we re-throw the error that the `try/catch` block received. A throttling or network error might not occur the second time.
The key is to separate cases when it makes sense to retry from those when it doesn’t.
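The handler in 4.3 uses `Promise.all`, so a single retryable failure sends the whole batch back to the queue, including records that were already processed successfully. One possible variant, sketched here under stated assumptions, combines `Promise.allSettled` with Lambda’s partial batch responses. When `ReportBatchItemFailures` is enabled on the SQS event source mapping, the function can return the `messageId` values of the failed records in `batchItemFailures`, and only those records are retried. Without that setting, a normal return marks the whole batch as successful. `transformEntity` and `sendMessageToDeadLetterQueue` below are hypothetical in-memory stand-ins, and the `'flaky'` id simulates a downstream error.

```javascript
class ValidationError extends Error {
  constructor(message) {
    super(`ValidationError: ${message}`);
    this.name = 'ValidationError';
  }
}

// Hypothetical stand-in: 'flaky' simulates a retryable downstream failure.
const transformEntity = async (entity) => {
  if (!entity.id) throw new ValidationError("Id doesn't exist");
  if (entity.id === 'flaky') throw new Error('Downstream service returned 503');
  return entity.id;
};

// Hypothetical stand-in for sending a message to the dead-letter queue.
const deadLetterMessages = [];
const sendMessageToDeadLetterQueue = async (body) => {
  deadLetterMessages.push(body);
};

const processRecord = async ({ body }) => {
  try {
    return await transformEntity(JSON.parse(body));
  } catch (error) {
    if (error.name === 'ValidationError') {
      await sendMessageToDeadLetterQueue(body);
      return undefined; // handled: this record must not be retried
    }
    throw error; // retryable: report it as a batch item failure
  }
};

// Assumes ReportBatchItemFailures is enabled on the event source mapping.
const handler = async (event) => {
  const results = await Promise.allSettled(event.Records.map(processRecord));
  const batchItemFailures = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      batchItemFailures.push({ itemIdentifier: event.Records[index].messageId });
    }
  });
  return { batchItemFailures };
};
```

With three records (one valid, one missing its `id`, one hitting the simulated 503), only the third record’s `messageId` is reported for retry. The second record goes to the dead-letter queue stand-in.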
5. Summary
If we are not careful and throw errors everywhere in our Lambda function, we can easily end up with a blocked pipeline in the case of asynchronous or poll-based invocations.
Throwing custom errors and catching them at the top level is one way of handling errors selectively and carefully, which prevents the issues described above. This way, we can make sure that our serverless pipeline performs at the level we want.
Thanks for reading and see you next time.