Using Step Functions to handle feature flags
1. The scenario
Bob’s company has a popular application and wants to release a new feature. They want to test it thoroughly in the development environment first. But the team is required to push changes to production continually, so the unfinished feature must not block the deployment pipeline.
One way to manage this problem is to use feature flags. AWS AppConfig, part of the Systems Manager ecosystem, is a service that allows us to manage feature flags and configuration objects for our application.
One way to incorporate them into the code is to use conditional statements that check whether we have enabled the feature flag for the given environment. But because Bob wants to minimize code changes and reduce code complexity (and because it’s fun), he decided to use Step Functions instead of if statements.
He created a separate Lambda function with the new feature (NewFeature), which exists in parallel with the existing code (ExistingFeature).
Let’s see how this experiment worked out.
2. AppConfig concepts
When getting a feature flag from AppConfig, we must provide three parameters.
An application is a namespace or a folder that contains configurations, feature flags and environments for the given application.
The environment is the target for the feature flag. We can name it as we like. In this example, we’ll have two environments, dev and prod. We enable the feature flag in dev, which runs the new code. We keep the existing code in prod.
The last element is the configuration profile, which can be a feature flag or a freeform configuration. This example will use a feature flag.
I won’t describe how to create applications, environments and configuration profiles in AppConfig. I’ll provide a link that explains the process at the end of the post.
3. Getting the feature flag
First, we fetch the feature flag state (enabled or disabled) for the given environment from AppConfig.
3.1. Lambda extension
Luckily, we (and Bob) build serverless applications and use Lambda functions. AWS provides an extension that we can integrate with our function as a layer.
If we use SAM templates to create the resources, we can add the extension like this:
AppConfigFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: nodejs20.x
    Layers:
      - 'arn:aws:lambda:eu-central-1:066940009817:layer:AWS-AppConfig-Extension-Arm64:49'
    # more Lambda properties
The layer ARN is different for each region and Lambda function architecture, so you need to find the right one for your scenario.
3.2. The code
We can now call the extension from the GetFeatureFlag function. The code can look like this:
import axios from 'axios';

const {
  // Fall back to the extension's default port when the variable is not set
  AWS_APPCONFIG_EXTENSION_HTTP_PORT = '2772',
  APPCONFIG_APPLICATION_NAME,
  APPCONFIG_ENVIRONMENT_NAME,
  APPCONFIG_CONFIGURATION_NAME,
} = process.env;

// The extension serves configurations on a local HTTP endpoint
const client = axios.create({
  baseURL: `http://localhost:${AWS_APPCONFIG_EXTENSION_HTTP_PORT}`,
  timeout: 5000,
});

export const handler = async () => {
  // 1. Fetch the feature flag from AppConfig
  // The path stays on one line: a line break inside the template literal would end up in the URL
  const config = await client.get(
    `/applications/${APPCONFIG_APPLICATION_NAME}/environments/${APPCONFIG_ENVIRONMENT_NAME}/configurations/${APPCONFIG_CONFIGURATION_NAME}`,
  );

  // 2. Return the feature flag as the value of the config property
  return {
    config: config.data,
  };
};
AWS_APPCONFIG_EXTENSION_HTTP_PORT defaults to 2772, which we can leave as is.
We can have an environment variable for each mandatory AppConfig parameter: application, environment (dev or prod in this case) and configuration profile (1). This way, when we deploy the resources to multiple environments, the function will fetch the feature flag state for the given environment.
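Since the request path depends on all three variables, it can help to fail fast when one of them is missing. The following is a minimal sketch, not part of Bob’s original handler; buildConfigPath is a hypothetical helper name, and the variable names are the ones used in the handler above.

```javascript
// Hypothetical helper: builds the extension request path from the three
// mandatory AppConfig parameters and fails fast when one is missing.
const buildConfigPath = (env) => {
  const required = [
    'APPCONFIG_APPLICATION_NAME',
    'APPCONFIG_ENVIRONMENT_NAME',
    'APPCONFIG_CONFIGURATION_NAME',
  ];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Encode each segment so unusual characters cannot break the path
  const [application, environment, configuration] = required.map((name) =>
    encodeURIComponent(env[name]),
  );
  return `/applications/${application}/environments/${environment}/configurations/${configuration}`;
};
```

In the handler, the path could then come from buildConfigPath(process.env) instead of an inline template literal.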
The extension’s response (config.data in the handler) will be similar to the following:
{
  "isAllowed": {
    "enabled": true
  }
}
As we can see, AppConfig returns an object of feature flag objects. isAllowed is the feature flag’s very creative name. The value shown comes from the dev environment because the flag is enabled there. The value would be enabled: false in prod. We encapsulate the feature flag value in the config property of the returned object (2).
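If some code needs to read a flag from this response directly, a defensive read avoids surprises when the flag is missing. This is only a sketch; isFlagEnabled is a hypothetical helper, and it assumes the response shape shown above.

```javascript
// Hypothetical helper: reads one flag from the extension's response and
// treats anything other than an explicit `enabled: true` as disabled.
const isFlagEnabled = (flags, name) => flags?.[name]?.enabled === true;
```

Treating a missing flag as disabled means an unknown flag name falls back to the existing code path.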
3.3. Permissions
The function’s execution role must allow the appconfig:StartConfigurationSession and appconfig:GetLatestConfiguration permissions.
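As an illustration, the corresponding IAM policy document could be built like this. It is a sketch written as a JavaScript object to stay in the post’s language; buildAppConfigPolicy is a hypothetical helper, and the resource ARN is left to the caller because scoping depends on your setup.

```javascript
// Sketch of the IAM policy for the function's execution role.
// `configurationArn` is supplied by the caller; the default "*" is only a
// placeholder, and narrowing it to one configuration is usually preferable.
const buildAppConfigPolicy = (configurationArn = '*') => ({
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: [
        'appconfig:StartConfigurationSession',
        'appconfig:GetLatestConfiguration',
      ],
      Resource: configurationArn,
    },
  ],
});
```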
4. Using Step Functions
GetFeatureFlag is part of the state machine, so its return value (the feature flag name and its state) will be the input of the next state.
In this case, it’s a Choice state, where we decide whether to call the existing function or the one with the new feature.
The state’s definition can look like this:
"IsFeatureFlagEnabled": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.config.isAllowed.enabled",
      "BooleanEquals": true,
      "Next": "NewFeature"
    }
  ],
  "Default": "ExistingFeature"
}
When the feature flag’s value is enabled: true, Step Functions will call the NewFeature function. Otherwise, it will invoke ExistingFeature. From this point, the flow can continue as usual.
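To put the Choice state in context, here is a sketch of how the whole workflow might be defined, written as a JavaScript object that can be serialized with JSON.stringify. The function ARNs are placeholders, lambdaTask is a hypothetical helper, and the use of the lambda:invoke integration with OutputPath is my assumption about how the tasks are wired.

```javascript
// Hypothetical helper: a Task state that invokes a Lambda function and
// passes the function's return value (the Payload) to the next state.
const lambdaTask = (functionArn, transition) => ({
  Type: 'Task',
  Resource: 'arn:aws:states:::lambda:invoke',
  Parameters: { FunctionName: functionArn, 'Payload.$': '$' },
  OutputPath: '$.Payload',
  ...transition,
});

// Sketch of the state machine definition; the ARNs are placeholders.
const definition = {
  StartAt: 'GetFeatureFlag',
  States: {
    GetFeatureFlag: lambdaTask('<GetFeatureFlag function ARN>', {
      Next: 'IsFeatureFlagEnabled',
    }),
    IsFeatureFlagEnabled: {
      Type: 'Choice',
      Choices: [
        {
          Variable: '$.config.isAllowed.enabled',
          BooleanEquals: true,
          Next: 'NewFeature',
        },
      ],
      Default: 'ExistingFeature',
    },
    NewFeature: lambdaTask('<NewFeature function ARN>', { End: true }),
    ExistingFeature: lambdaTask('<ExistingFeature function ARN>', { End: true }),
  },
};
```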
We have successfully eliminated the if block from the code!
5. AppConfig interactions with Step Functions
What if we wanted to remove the GetFeatureFlag Lambda function and make Step Functions interact with AppConfig directly? We can do that, but there are a few things to consider.
5.1. What’s going on in the background?
The call in the function handler (1) takes only a few lines of code, but the AppConfig Lambda extension does a complex job in the background.
First, it calls the StartConfigurationSession API endpoint, which sends back an InitialConfigurationToken. Then, it invokes the GetLatestConfiguration endpoint, which returns the feature flag object seen above.
It then calls GetLatestConfiguration at a configured interval (defaults to 45 seconds) and caches the result.
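The flow can be sketched in a few lines. This is not the extension’s actual implementation: it refreshes lazily on read instead of polling in the background, and startSession and getLatest are hypothetical functions that wrap the two API calls so the sketch runs without AWS access.

```javascript
// Sketch of the session, poll and cache flow.
// `startSession` and `getLatest` are hypothetical wrappers around
// StartConfigurationSession and GetLatestConfiguration.
const createFlagCache = ({ startSession, getLatest, now = Date.now }) => {
  let token; // token to send with the next GetLatestConfiguration call
  let cached; // last parsed configuration
  let nextPollAt = 0; // earliest time (in ms) for the next poll

  return async () => {
    if (cached !== undefined && now() < nextPollAt) {
      return cached; // still fresh: serve from the cache
    }
    if (token === undefined) {
      // First call: open a session to get the initial token
      ({ InitialConfigurationToken: token } = await startSession());
    }
    const res = await getLatest(token);
    token = res.NextPollConfigurationToken; // each token is used only once
    nextPollAt = now() + res.NextPollIntervalInSeconds * 1000;
    // An empty Configuration means nothing changed since the last poll
    if (res.Configuration) {
      cached = JSON.parse(res.Configuration);
    }
    return cached;
  };
};
```

The returned function can be called on every invocation; it only reaches AppConfig when the poll interval has elapsed.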
5.2. Doing the same with Step Functions
We can remove this Lambda function from the architecture and delegate the AppConfig API calls to Step Functions. But in this case, we have to manage everything that the AppConfig extension does for us.
Only the beginning of the workflow changes: the AppConfig API calls replace the GetFeatureFlag function. The Choice state and everything after it will remain the same.
Step Functions integrates with 10,000+ AWS APIs, including StartConfigurationSession and GetLatestConfiguration.
The StartConfigurationSession state requires the mandatory AppConfig parameters we used in the HTTP call inside the Lambda handler. The state’s API parameters section can look like this:
{
  "ApplicationIdentifier.$": "$.ApplicationIdentifier",
  "ConfigurationProfileIdentifier.$": "$.ConfigurationProfileIdentifier",
  "EnvironmentIdentifier.$": "$.EnvironmentIdentifier"
}
We assume the state’s input contains the ApplicationIdentifier, ConfigurationProfileIdentifier and EnvironmentIdentifier properties.
The state’s output (InitialConfigurationToken) will be the input of the following state, GetLatestConfiguration. This state needs one mandatory parameter called ConfigurationToken. The relevant part of the definition can look like this:
{
  "ConfigurationToken.$": "$.InitialConfigurationToken"
}
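Put together, the two states might look like the sketch below, again written as a JavaScript object. The Resource ARNs are an assumption: they follow the usual AWS SDK integration pattern (aws-sdk, then service, then action), so it is worth confirming them in Workflow Studio before relying on them.

```javascript
// Sketch of the two Task states that would replace the GetFeatureFlag function.
// Assumption: the Resource ARNs follow the pattern
// arn:aws:states:::aws-sdk:<service>:<action>.
const appConfigStates = {
  StartConfigurationSession: {
    Type: 'Task',
    Resource: 'arn:aws:states:::aws-sdk:appconfigdata:startConfigurationSession',
    Parameters: {
      'ApplicationIdentifier.$': '$.ApplicationIdentifier',
      'ConfigurationProfileIdentifier.$': '$.ConfigurationProfileIdentifier',
      'EnvironmentIdentifier.$': '$.EnvironmentIdentifier',
    },
    Next: 'GetLatestConfiguration',
  },
  GetLatestConfiguration: {
    Type: 'Task',
    Resource: 'arn:aws:states:::aws-sdk:appconfigdata:getLatestConfiguration',
    Parameters: {
      'ConfigurationToken.$': '$.InitialConfigurationToken',
    },
    // The existing Choice state, possibly after a step that reshapes the output
    Next: 'IsFeatureFlagEnabled',
  },
};
```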
The output will be similar to this:
{
  "Configuration": "{\"isAllowed\":{\"enabled\":true}}",
  "ContentType": "application/json",
  "NextPollConfigurationToken": "TOKEN",
  "NextPollIntervalInSeconds": 60
}
As we can see, the Configuration property contains the feature flag as expected.
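Note that Configuration is a JSON string here, while the Choice state reads $.config.isAllowed.enabled. One way to bridge the gap is a Pass state that uses the States.StringToJson intrinsic function. The sketch below shows such a state (the state and its placement are my assumption) together with the same transformation in plain JavaScript.

```javascript
// A Pass state that could sit between GetLatestConfiguration and the Choice
// state. It parses the JSON string and exposes it as the `config` property.
const parseConfigurationState = {
  Type: 'Pass',
  Parameters: {
    'config.$': 'States.StringToJson($.Configuration)',
  },
  Next: 'IsFeatureFlagEnabled',
};

// The same transformation in plain JavaScript, for illustration only.
const parseConfiguration = (output) => ({
  config: JSON.parse(output.Configuration),
});
```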
5.3. It might not be a good idea
But there’s something else here.
The GetLatestConfiguration call returns a token in the NextPollConfigurationToken property. AWS recommends that clients use it for subsequent calls to the endpoint.
The documentation also recommends caching the feature flag instead of continually fetching it from AppConfig. We should take this advice because AWS charges for the GetLatestConfiguration calls. So we want to reduce the number of invocations!
This means that the client that calls the state machine should also provide the current token in the input. The first state could check if the request contains the token. If it does, the state machine could jump to the GetLatestConfiguration state. If the client can’t provide the token (for example, because it’s the first call), the state machine could call StartConfigurationSession.
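That first check could be another Choice state that uses the IsPresent comparison. In the sketch below, the ConfigurationToken input property is a hypothetical name, and the GetLatestConfiguration state would have to read the token from that property instead of $.InitialConfigurationToken.

```javascript
// Sketch of a first state that skips the session call when the client
// already sends a token. `ConfigurationToken` is a hypothetical input field.
const hasTokenState = {
  Type: 'Choice',
  Choices: [
    {
      Variable: '$.ConfigurationToken',
      IsPresent: true,
      Next: 'GetLatestConfiguration',
    },
  ],
  Default: 'StartConfigurationSession',
};

// The same decision in plain JavaScript, for illustration only.
const firstStateFor = (input) =>
  input !== null && typeof input === 'object' && 'ConfigurationToken' in input
    ? 'GetLatestConfiguration'
    : 'StartConfigurationSession';
```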
Alternatively, the state machine could store the token somewhere externally, for example, in a DynamoDB table. But this solution would add at least two extra API calls (read and update token) to the flow.
All of these would increase complexity. For this reason, I would keep the Lambda function with the AppConfig extension.
6. Considerations
It’s not only feature flags that we can configure in AppConfig. It’s possible to store more complex configuration objects, too.
As said above, we can have multiple feature flags for the same application and environment. If this is the case, we’ll need a more complex Choice state configuration, which can lead to harder-to-manage states. Alternatively, Bob can write multiple if statements in the code, one for each feature flag.
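One way to keep such a Choice state manageable is to generate it from a map of flag names to target states. This is a sketch under the assumption that every flag lives under the config property, as in the earlier example; buildFlagChoiceState and the flag and state names in the usage note are hypothetical.

```javascript
// Hypothetical helper: builds a Choice state with one rule per feature flag.
// Step Functions evaluates the rules in order, so the first enabled flag wins.
const buildFlagChoiceState = (flagToState, defaultState) => ({
  Type: 'Choice',
  Choices: Object.entries(flagToState).map(([flag, next]) => ({
    Variable: `$.config.${flag}.enabled`,
    BooleanEquals: true,
    Next: next,
  })),
  Default: defaultState,
});
```

For example, buildFlagChoiceState({ isAllowed: 'NewFeature', betaSearch: 'BetaSearch' }, 'ExistingFeature') yields two rules and falls back to ExistingFeature when neither flag is enabled.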
7. Summary
AppConfig can store feature flags and other configurations we can use in our applications. With the help of the AppConfig Agent or the Lambda extension, we can fetch the feature flag from AppConfig. The extension follows the AWS-recommended flow of API calls and caches the feature flag.
We can use Step Functions to route requests to different code versions based on the feature flag value.
8. Further reading
Creating feature flags and free form configuration data in AWS AppConfig - Guide to create applications, environments and configuration profiles
AWS AppConfig workshop - Get your hands dirty
Getting started with Lambda - How to create a Lambda function
Input and Output Processing in Step Functions - Data flow manipulation