Debugging generic API Gateway errors with access logs

We can make debugging API Gateway errors easier if we enable access logging and add error messages to the default log object. This little trick will often be helpful when we receive a generic error message.

1. Problem statement

I was playing around with the Amazon API Gateway HTTP API the other day (more on that in a future post) and added a Lambda integration to one of the routes.

The route should have had the /pets/{proxy+} format.

When I invoked the API from Postman, I received a generic 500 Internal server error.

Like some other AWS error messages, this one wasn’t much help.

2. Debugging process

The request hadn’t invoked the Lambda integration: nothing showed up in the function’s CloudWatch log group.

Something must have happened inside API Gateway (unlikely) or between the gateway and the function.

2.1. Create a log group

It’s a good idea to turn on access logging for the API, especially when we are in the development phase.

First we’ll need a log group, so let’s create one in CloudWatch to which API Gateway will send its access logs.
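For those who prefer scripting this step, here is a minimal sketch using boto3. The function names are mine, and the region, account ID, and log group name are whatever your own setup uses. The first helper only builds the ARN that the next step asks for; the actual `create_log_group` call sits in a separate function, so nothing touches AWS until you call it.

```python
def log_group_arn(region: str, account_id: str, name: str) -> str:
    """Build the ARN of a CloudWatch Logs log group."""
    return f"arn:aws:logs:{region}:{account_id}:log-group:{name}"


def create_access_log_group(name: str, region: str) -> None:
    """Create the log group. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the ARN helper works without boto3

    logs = boto3.client("logs", region_name=region)
    logs.create_log_group(logGroupName=name)
```

Calling `log_group_arn("us-east-1", "123456789012", "/demo/access-logs")` with those placeholder values returns the ARN to paste into the access logging settings.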

2.2. Turn on access logging

We can turn on access logging at the bottom of the left menu in the AWS Console.

Access logging

After switching on Access logging with the slider, we should add the ARN of the log group we created above. Out of the available log formats, select JSON.

Log object

API Gateway will log the following object to CloudWatch:

{
  "requestId": "$context.requestId",
  "ip": "$context.identity.sourceIp",
  "requestTime": "$context.requestTime",
  "httpMethod": "$context.httpMethod",
  "routeKey": "$context.routeKey",
  "status": "$context.status",
  "protocol": "$context.protocol",
  "responseLength": "$context.responseLength"
}

These are the default logging variables.
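The same switch can be flipped from a script. The sketch below assumes boto3's `apigatewayv2` client and its `update_stage` call; the function names are mine, and the API ID, stage name, and log group ARN are whatever your setup uses. The `Format` field has to be a single line, which is what `json.dumps` produces when no indent is given.

```python
import json

# The console's default JSON log format, expressed as a Python dict.
DEFAULT_LOG_FIELDS = {
    "requestId": "$context.requestId",
    "ip": "$context.identity.sourceIp",
    "requestTime": "$context.requestTime",
    "httpMethod": "$context.httpMethod",
    "routeKey": "$context.routeKey",
    "status": "$context.status",
    "protocol": "$context.protocol",
    "responseLength": "$context.responseLength",
}


def access_log_settings(destination_arn: str, fields: dict) -> dict:
    """Build the AccessLogSettings structure for a stage."""
    # json.dumps without indent yields one line, as the Format field requires.
    return {"DestinationArn": destination_arn, "Format": json.dumps(fields)}


def enable_access_logging(api_id: str, stage_name: str, destination_arn: str) -> None:
    """Apply the settings to a stage. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("apigatewayv2")
    client.update_stage(
        ApiId=api_id,
        StageName=stage_name,
        AccessLogSettings=access_log_settings(destination_arn, DEFAULT_LOG_FIELDS),
    )
```

Keeping the format as a dictionary makes it easy to add or remove fields later without hand-editing a JSON string.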

2.3. Add error messages

This object only provides basic information, most of which we already know.

But the context object contains a lot more properties. I have found two of them particularly useful.

The $context.integration.error property returns the error message from the integration, in this case, from Lambda. Although the request didn’t trigger a function invocation, the Lambda service can still return an error, and API Gateway will log that error message here.

The second property I added is $context.error.message, which returns an error message from API Gateway itself.

So I ended up adding these properties to the log JSON:

{
  "integrationError": "$context.integration.error",
  "apiGatewayError": "$context.error.message"
}
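If the log format lives in code, adding the two fields is a small dictionary merge. A minimal sketch follows; the function name and the shortened base dictionary are illustrative, and the key names `integrationError` and `apiGatewayError` are simply the ones chosen above.

```python
import json

# Two of the default fields, shortened here for readability.
base_fields = {
    "requestId": "$context.requestId",
    "status": "$context.status",
}

# The two error-related context variables discussed above.
error_fields = {
    "integrationError": "$context.integration.error",
    "apiGatewayError": "$context.error.message",
}


def with_error_fields(fields: dict) -> str:
    """Return a single-line JSON log format that includes the error fields."""
    merged = {**fields, **error_fields}
    return json.dumps(merged)


print(with_error_fields(base_fields))
```

The printed string can be pasted into the log format box in the console as it is.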

Indeed, the next invocation revealed the issue.

2.4. Create the entire route in one go

apiGatewayError returned Internal Server Error, which was not new to me.

But the integrationError property showed me the real problem:

The IAM role configured on the integration or API Gateway doesn't have
permission to call the integration. Check the permissions and try again.

At first, the error message seemed odd because when you create a Lambda integration for an HTTP API route, API Gateway generates the permission needed to invoke the function. More precisely, it adds the following statement to the Lambda function’s resource-based policy:

{
  "Effect": "Allow",
  "Principal": {
    "Service": "apigateway.amazonaws.com"
  },
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:us-east-1:ACCOUNT_ID:function:FUNCTION_NAME",
  "Condition": {
    "ArnLike": {
      "AWS:SourceArn": "arn:aws:execute-api:us-east-1:ACCOUNT_ID:API_ID/*/*/pets"
    }
  }
}

The problem is in the Condition block: /{proxy+} is missing from the end of the ARN. So when I invoked /pets/SOMETHING, the request came from a route the condition didn’t cover, and Lambda responded with an error.
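To see why the condition fails, it helps to compare the pattern with the ARN of the calling route. The sketch below approximates `ArnLike` with Python's `fnmatch`, which is a simplification of IAM's real matching rules. The account ID, API ID, stage, and method are made-up placeholders, and I am assuming that the source ARN API Gateway presents ends with the route's path, `/pets/{proxy+}`.

```python
from fnmatch import fnmatchcase

# Placeholder values for illustration only.
PREFIX = "arn:aws:execute-api:us-east-1:123456789012:abc123"

# Assumed source ARN of a request that hits the /pets/{proxy+} route.
source_arn = f"{PREFIX}/$default/GET/pets/{{proxy+}}"

# Pattern left behind from the original /pets route.
stale_pattern = f"{PREFIX}/*/*/pets"
# Pattern that covers the edited route.
fixed_pattern = f"{PREFIX}/*/*/pets/{{proxy+}}"


def arn_like(pattern: str, arn: str) -> bool:
    """Rough stand-in for ArnLike: '*' and '?' act as wildcards."""
    return fnmatchcase(arn, pattern)


print(arn_like(stale_pattern, source_arn))  # False
print(arn_like(fixed_pattern, source_arn))  # True
```

The stale pattern has to end right after `pets`, so anything with a trailing path segment falls outside it.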

The reason was that I had created the integration for the /pets route first, so API Gateway added the matching permission to the resource-based policy. Then I realized it wouldn’t work for me this way, so I edited the route and appended /{proxy+} to it. Of course, API Gateway didn’t update the permission.

The solution is to add the missing bit to the Lambda permission manually, create the route in its final form in API Gateway, or use Infrastructure as Code that creates and updates the permissions.
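For the manual option, one approach is to add a second permission statement that covers the proxy path. This is a sketch using boto3's Lambda `add_permission` call; the function names and the statement ID are placeholders of mine. It does not remove the stale /pets statement, which would need a separate `remove_permission` call.

```python
def invoke_permission(function_name: str, region: str, account_id: str, api_id: str) -> dict:
    """Build add_permission arguments that allow the /pets/{proxy+} route."""
    source_arn = f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*/*/pets/{{proxy+}}"
    return {
        "FunctionName": function_name,
        "StatementId": "apigw-pets-proxy",  # hypothetical; must be unique within the policy
        "Action": "lambda:InvokeFunction",
        "Principal": "apigateway.amazonaws.com",
        "SourceArn": source_arn,
    }


def grant_invoke(function_name: str, region: str, account_id: str, api_id: str) -> None:
    """Add the statement to the function's policy. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("lambda", region_name=region)
    client.add_permission(**invoke_permission(function_name, region, account_id, api_id))
```

Building the arguments separately makes it possible to inspect the source ARN before anything is changed in the account.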

Finding the root cause was much easier with access logs. They can uncover other issues too, and they save us time we would otherwise spend hunting for the cause of a problem. It’s worth giving them a go!

3. Summary

API Gateway access logging can be helpful when we have to debug generic error messages.

The default log object contains only a fraction of the available properties. We can add error messages to the access log by extracting them from the context object.

4. Further reading

Customizing HTTP API access logs - More available logging variables

The Missing Guide to AWS API Gateway Access Logs - Alex DeBrie’s excellent guide on access logs in REST API