
Going Serverless: How and Why to Fetch Configuration Data from SSM

The Serverless Framework is a popular development tool that allows developers to quickly and easily create applications without worrying about infrastructure provisioning and network management.

Enterprise organisations typically develop multiple serverless applications in a "microservices" approach, combining them to create a unified product. For example, an online store might have a handful of applications that come together to form one system:

  1. A customer-facing, frontend website displaying various products for purchase
  2. A backend "products" serverless application for managing products, tags, variants, prices and similar information
  3. A backend "checkout" serverless application for managing shopping baskets, and payment gateway integrations
  4. A backend "shipping" serverless application for managing product shipments, delivery service integrations and notifications

Indeed, it is possible to create one extensive backend system that manages all of the products, checkout and shipping concerns in the above example. However, this approach would lose all of the benefits of iteratively developing smaller pieces of software independently in focussed and specialised teams.

In a scenario like the one above, it is typical for each of the serverless applications to share most of the serverless configuration, including:

  • Security groups and subnets
  • RBAC configurations (e.g. for API Gateways)
  • IAM user names or shared policies
  • Database connection details (e.g. AWS Neptune connection strings)
  • Various custom environment variables specific to your organisation

In particular, it is typical for "network infrastructure" like the VPC to be created outside of the serverless projects entirely (e.g. with Terraform) and used by the serverless projects.

So what is the best way to get externally created, shared configuration data into our multiple serverless projects without tedious repetition and the possibility of human error?

Hardcoding Configuration

The first approach is to hardcode everything, everywhere. This has the benefit of being the easiest and fastest to implement, but the downside of scaling poorly across multiple deployment environments and numerous projects.

For example, our serverless.yml file would have the following snippet for VPC configuration:

...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - sg-0123456789
    subnetIds:
      - subnet-0000000000
...

Of course, this would not work for a dev deployment, test deployment and live deployment, as the ID values would change between these environments.

Nonetheless, this would be entirely acceptable for a proof-of-concept project or the first iteration of a system.
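
One way to stretch the hardcoding approach slightly further, before reaching for CI/CD variables, is to keep a per-stage map inside serverless.yml itself and select values with the framework's variable syntax (the IDs below are placeholders):

```yaml
custom:
  vpcConfig:
    dev:
      securityGroupId: sg-0123456789   # placeholder IDs
      subnetId: subnet-0000000000
    live:
      securityGroupId: sg-9876543210
      subnetId: subnet-1111111111

provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${self:custom.vpcConfig.${opt:stage}.securityGroupId}
    subnetIds:
      - ${self:custom.vpcConfig.${opt:stage}.subnetId}
```

The values are still hardcoded in the repository, however, so every infrastructure change means a commit to every project that uses them.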

CI/CD Variables

With CI/CD systems like GitLab, you would typically refactor the above code into:

...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${env:SECURITY_GROUP_ID_1}
    subnetIds:
      - ${env:SUBNET_ID_1}
...

This would allow the CI/CD provider to inject the correct values per environment during the deployment.

For example, with GitLab, you would have a .gitlab-ci.yml file with the following structure:

image: node:latest

stages:
  - deploy

.deploy: &deploy
  - export SECURITY_GROUP_ID_1=$SECURITY_GROUP_ID_1 # from GitLab CI/CD variables
  - export SUBNET_ID_1=$SUBNET_ID_1 # from GitLab CI/CD variables
  - yarn install --frozen-lockfile
  - yarn global add serverless
  - serverless deploy --stage $CI_ENVIRONMENT_NAME --verbose

deploy dev:
  stage: deploy
  script:
    - *deploy
  only:
    - master
  environment:
    name: dev

deploy test:
  stage: deploy
  script:
    - *deploy
  only:
    - tags
  environment:
    name: test

deploy live:
  stage: deploy
  script:
    - *deploy
  when: manual
  only:
    - tags
  environment:
    name: live

With this setup, GitLab CI/CD injects the security group and subnet information, and the values change based on the environment.name.

However, you will more than likely have to manually copy and paste multiple configuration values into your CI/CD setup for this to work effectively across multiple projects. You can use group-scoped (shared) environment variables in GitLab, but support for this varies between CI/CD providers and plans. Depending on the configuration being injected, this approach is also error-prone and vulnerable to sudden change and failure.

Fetching Values from Parameter Store

My preferred approach is to avoid as many manual and non-code steps as possible. Ideally, my project should also be deployable no matter where the serverless deploy command is executed. For example, I like being able to do deployments locally, from time to time, without needing to look up and set up these variables on my machine, especially for debugging purposes.

This is why I opt to use AWS Systems Manager Parameter Store to store my configuration data, which can be used to fetch any values needed at the time of each serverless deploy.

These two benefits, automation and the ability to run deployments from anywhere, come from the following serverless.yml configuration:

...
provider:
  name: aws
  vpc:
    securityGroupIds:
      - ${ssm(aws:region):/${opt:stage}-serverless-security-group}
    subnetIds:
      - ${ssm(aws:region):/${opt:stage}-vpc-subnet-1}
...

With this setup, the security group ID and subnet ID values are fetched from the AWS storage for each deployment and are always guaranteed to be the latest and correct value. For VPC configuration, it is feasible that an administrator would like to move workloads between subnets or amend security groups from time to time and update the Parameter Store values, so subsequent serverless deployments simply work without any other intervention.

Note: The aws:region is provided by the framework, and the stage is interpolated to allow for a different value per environment to be fetched from AWS.
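
The same variable syntax also supports fallback values, which can be useful for optional configuration that may not exist yet in every environment. For example (the LOG_LEVEL variable and parameter name here are hypothetical, not from the project above):

```yaml
provider:
  environment:
    # Falls back to 'info' if the parameter does not exist in this account/region
    LOG_LEVEL: ${ssm:/${opt:stage}-log-level, 'info'}
```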

Automation with Terraform

When Terraform creates networking, infrastructure and other resources, we should extend the Terraform project to add the relevant outputs into Parameter Store using the aws_ssm_parameter resource, as shown in the following example:

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # various configuration options
}

resource "aws_security_group" "serverless-sg" {
  name   = "serverless-sg"
  vpc_id = module.vpc.vpc_id
  # various configuration options
}

resource "aws_ssm_parameter" "vpc-subnet-1" {
  # The leading slash matches the parameter path used in serverless.yml
  name  = "/dev-vpc-subnet-1"
  type  = "String"
  value = module.vpc.private_subnets[0]
}

resource "aws_ssm_parameter" "serverless-security-group" {
  name  = "/dev-serverless-security-group"
  type  = "String"
  value = aws_security_group.serverless-sg.id
}

This structure ensures any terraform apply saves the required information directly into Parameter Store, ready for our serverless projects to consume.
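
The hardcoded dev prefix above can also be driven by a variable, so the same Terraform code can be applied per environment (the stage variable name is an assumption, not part of the original project):

```hcl
variable "stage" {
  description = "Deployment stage, e.g. dev, test or live"
  type        = string
  default     = "dev"
}

resource "aws_ssm_parameter" "vpc-subnet-1" {
  name  = "/${var.stage}-vpc-subnet-1"
  type  = "String"
  value = module.vpc.private_subnets[0]
}
```

Running terraform apply with a different stage value then publishes the equivalent parameters for that environment.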

Considerations

This convention of storing configuration in Parameter Store is best applied when:

  1. Multiple serverless projects use the same information (e.g. VPC configuration, database connection strings, etc.)
  2. The configuration values could feasibly change in the future, and you want to avoid the manual and tedious effort to apply value updates to your deployment users and systems.
  3. You are not storing secrets this way. Secrets should instead be fetched at runtime (e.g. from AWS Secrets Manager or SSM SecureString parameters).

Other noteworthy comments:

  1. The cost for fetching data from AWS per deployment should be negligible
  2. A good convention is to prefix configuration with the environment name
  3. The deployment user will need the appropriate permissions to fetch values from AWS Parameter Store. A good convention could be to allow certain users access to all dev* parameters but not to live* parameters, depending on your organisation
  4. It is possible to save a StringList to the Parameter Store rather than multiple, individual values for arrays
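
Point 4 above can be sketched as follows: Terraform joins the subnet IDs into a single StringList parameter, which the Serverless Framework then resolves as an array (the parameter name here is illustrative):

```hcl
resource "aws_ssm_parameter" "private-subnets" {
  name  = "/dev-private-subnets"
  type  = "StringList"
  value = join(",", module.vpc.private_subnets)
}
```

In serverless.yml, subnetIds: ${ssm:/dev-private-subnets} would then resolve to the full list of subnet IDs, as the framework automatically splits StringList values on commas.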

You will thank yourself for fetching configuration this way at any organisation with even a tiny number of serverless applications to manage.