Streamline Your AWS Infrastructure: The Right Infra as Code Tool in Three Easy Steps

Luuk Rutten, Principal Cloud Engineer

Publication date: 19 July 2023

Introduction

You're currently running workloads in the cloud and aiming to improve reliability through the implementation of infrastructure as code. That's fantastic! Now, the question arises: which tool should you opt for, and why? Finding comprehensive guidance online can be quite challenging, as only a limited number of developers possess extensive expertise in this technology, and even fewer have hands-on experience with various tools in a production environment.

As an AWS consultant, I have thoroughly tested numerous tools in real-world work scenarios together with my colleagues. Through these experiences, we gained valuable insights into the factors that hindered our progress, the aspects that were truly remarkable, and the ones that initially appeared significant but proved otherwise. In this blog, I want to help you select the most suitable tool by sharing the key lessons we learned.

By following three simple steps, I hope to provide you with a clearer understanding of which tool aligns best with your requirements.

Step 1: Should I (not) use CloudFormation?

Are you considering CloudFormation, or a derivative (including CDK, SAM, Serverless Framework and more)? Based on my personal experience, I would discourage the use of CloudFormation-based tools if you find yourself in any of the following situations:

1. You generate drift from infrastructure templates

Typically, there are two major contributors to drift:

Application deployments cause infrastructural drift
Some application deployment processes update AWS infrastructure. For example, the ECS update-service API can be used for both infrastructure configuration and application deployment. CloudFormation cannot easily deal with this split responsibility as updating the ECS Service in CloudFormation will trigger a complete deployment of an (old) container image.

Operational processes are managed outside of the infra as code
We sometimes need to update our RDS database versions, make manual fixes, or do database restores and don't want to do this using infrastructure as code. Operations are automated - just not using the infrastructure as code tool. If you encounter such a scenario, CloudFormation proves to be rather inadequate in handling this type of drift, and the process of re-importing these changes can turn into a nightmare.
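
To make the idea of drift concrete, here is a minimal, tool-agnostic sketch in Python. It is not how CloudFormation or any other tool works internally; the function name find_drift, the ignored parameter, and the example attributes are all hypothetical. It compares what a template declares with what is actually running, and lets you exclude attributes that another process, such as an application deployment, is allowed to own.

```python
from typing import Any


def find_drift(
    desired: dict[str, Any],
    actual: dict[str, Any],
    ignored: frozenset[str] = frozenset(),
) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (desired, actual)} for every attribute that differs.

    Attributes listed in `ignored` are owned by another process and are skipped.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        if key in ignored:
            continue
        if desired.get(key) != actual.get(key):
            drift[key] = (desired.get(key), actual.get(key))
    return drift


# Hypothetical ECS service: the template pins an image tag, but the
# application pipeline has since deployed a newer one.
template = {"desired_count": 2, "image": "shop-api:1.4.0"}
running = {"desired_count": 2, "image": "shop-api:1.7.2"}

# Without an ignore list, the image shows up as drift. Applying the
# template as-is would roll the service back to the old image.
print(find_drift(template, running))

# Treating the image as application-owned leaves no drift to correct.
print(find_drift(template, running, ignored=frozenset({"image"})))
```

Terraform's lifecycle block with ignore_changes is one real-world example of such an ignore list. Whether a tool lets you declare this kind of ownership boundary is what this step asks you to check.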

2. You are in a non-cloud-native ecosystem (open-source and any third parties)
If you want to configure Cloudflare, Kubernetes, or other clouds using the same toolset, CloudFormation is not your friend. Yes, CDK has custom resources for Kubernetes, but NO, I would not recommend them as they make things unnecessarily complex.

You might wonder why I only exclude CloudFormation here: it is the only tool with limitations strong enough for me to rule it out from the start. Other tools also have their downsides, but none serious enough to rule them out up front.

Step 2: Should I (not) use a programming language?

Just using a programming language for a tool doesn't automatically make it superior. Let's consider an example: suppose the Cloud Development Kit (CDK) only generated basic CloudFormation resources (known as L1 constructs in CDK terminology), and you had to create all abstractions from scratch. Would it truly be an improvement over regular CloudFormation? While you would gain type safety and the ability to write a for-loop, you would also have to deal with tsconfig settings, more complex dependency management, and intricate version compatibility. In this case, you might actually be better off using Terraform's HCL.
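
To illustrate how modest the for-loop gain is, the sketch below generates basic CloudFormation resources in a loop. It uses plain Python rather than the CDK, and the build_template helper and the bucket names are made up for this example.

```python
import json


def build_template(bucket_names: list[str]) -> dict:
    """Build a CloudFormation-style template with one S3 bucket per name."""
    resources = {}
    for name in bucket_names:
        # Logical IDs must be alphanumeric, so drop the dashes.
        logical_id = "".join(part.capitalize() for part in name.split("-")) + "Bucket"
        resources[logical_id] = {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": name},
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template(["raw-data", "curated-data"])
print(json.dumps(template, indent=2))
```

The loop saves a few lines of template, but every abstraction on top of these raw resources would still be yours to write and maintain.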

All in all, these so-called 'advantages' should only be seen as minor benefits for most situations. Instead, it is advisable to first consider the following aspects and then evaluate whether using a programming language still makes sense.

Step 3: Consider the things that matter

So, what should you consider? The tool should offer sufficient benefits to outweigh the increase in complexity. My primary criterion is that it should enable fast and reliable infrastructure development, both now and in the future. Security is a prerequisite that must be met before even considering a tool. Here are a few factors that you can use as benchmarks, keeping these requirements in mind.

1. Assign a score (on a scale of 1-10) to your tool for each factor listed in each category.
2. Weight each score by the importance of its factor. I've provided my own opinion on the importance of each factor: multiply the scores for AWESOME benefits by 3, GREAT benefits by 1.5, MEDIUM benefits by 1, and MINOR benefits by 0.5. Feel free to adjust these multipliers according to your specific situation.
3. Add up the points to determine the score in each category.
4. Compare the results for different tools.

A template scorecard is provided at the bottom of the list as well.
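
The arithmetic can be sketched in a few lines of Python. The multipliers come from the text above; the function name category_score and the example scores are assumptions made purely for illustration and say nothing about any real tool.

```python
# Multipliers as stated in the text above.
WEIGHTS = {"AWESOME": 3.0, "GREAT": 1.5, "MEDIUM": 1.0, "MINOR": 0.5}


def category_score(factors: dict[str, tuple[str, int]]) -> float:
    """Sum weight * score over all factors in one category.

    `factors` maps a factor name to (importance label, score from 1 to 10).
    """
    total = 0.0
    for name, (importance, score) in factors.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name}: score must be between 1 and 10")
        total += WEIGHTS[importance] * score
    return total


# Illustrative scores only; they are not an assessment of any real tool.
example_tool = {
    "Managed abstraction": ("AWESOME", 8),
    "Language flexibility": ("MEDIUM", 6),
}

print(category_score(example_tool))  # 3.0 * 8 + 1.0 * 6 = 30.0
```

Run it once per tool and per category, then compare the totals side by side.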

Development speed

Managed Abstraction: AWESOME benefit
The CDK team is focusing a lot on providing abstractions for a wide set of use cases that are available out of the box! It makes AWS IAM and network connectivity a lot easier! But did you know that SAM and the Serverless Framework also provide abstractions purpose-built for your serverless projects? They do not include wild abstractions like CDK's ApplicationLoadBalancedFargateService class, but for serverless applications they are often exactly what you are looking for!

Pulumi Crosswalk also has some nice abstractions, but they aren't as extensive as CDK's.

Fail & debug fast: AWESOME benefit

All tools have a way to make development easier and to detect mistakes early: terraform validate / plan, cdk watch, pulumi preview, and various IDE plugins can all help. Clear errors and logging greatly help with debugging as well.

Application deployment: AWESOME benefit

CDK is good at building Lambda functions and containers and deploying them right away! Other frameworks like the Serverless Framework and SAM are also pretty good. It is one of my favorite infrastructure as code features because it makes it so easy to do an end-to-end deployment of an application from scratch.

Shallow learning curve: AWESOME benefit

AWS SAM and the Serverless Framework are SUPEREASY to get started with. It's YAML with plugins. Other tools require more thought. CDK especially requires you to learn CDK best practices, CloudFormation for debugging, and programming language best practices. This is a lot to take in from the start. The same applies to Terraform wrapper tools like Terragrunt: by introducing them, you have to learn more tools to get started.

Language flexibility: MEDIUM benefit

All tools have features or plugins built in to deal with common problems like loops or string or array manipulation. Although sometimes annoying, it is rare to find limitations in language capabilities that have a massive velocity impact.

Ecosystem integration: AWESOME benefit

Each tool is part of an ecosystem. For example, CDK has pattern libraries for quick-starting your project. But also think about static code analysis (e.g. Checkov) or available CI/CD integrations (GitHub Actions, CircleCI). Check out the awesome-cdk and awesome-terraform repositories to get an initial impression of their ecosystems.

It saves you a LOT of time if you don't have to integrate tools yourself.

Purpose-built benefits and limitations: AWESOME benefits (hopefully)

Depending on your specific use case, it is important to choose a purpose-built tool that caters to your needs. These tools offer unique benefits that are exclusive to them. Let's take the Serverless Stack (SST) as an example. SST has invested significant effort in enhancing the overall experience for full-stack developers. It provides features like local lambda debugging, increased abstraction, and type-safety for both front-end and back-end development, making it an excellent choice for serverless full-stack development.

On the flip side, if a purpose-built tool limits your capabilities for your specific use case, it should receive a lower score. For instance, when it comes to CDK, it may not excel in multi-account state sharing. In the case of SST, the only supported relational database is RDS Aurora V1, which has gained a reputation for its slow scalability.

Reliable deployments

Type safety: MINOR benefit

Type checking can offer early detection of type errors in your infrastructure. However, it's crucial to remember that you also need to create the appropriate type definitions and interfaces for your own classes. Without doing so, or if it's not deemed important in your specific scenario, the benefits of typing can diminish.

Another commonly overlooked issue is the challenge of sharing these types with your actual application, especially when they may be utilizing different runtime versions or bundlers. While it is indeed possible, it requires a willingness to invest the necessary effort to ensure compatibility and proper integration.

Good preview function: GREAT benefit

In my opinion, this is where all CloudFormation-based tools are still lacking a bit.

(Continuous) corrective changes: MEDIUM benefit

Certain tools have the capability to detect drift and make efforts to bring the system back to its desired state when changes are applied. In fact, Crossplane takes it a step further by continuously detecting and addressing such drift. This proactive approach of identifying changes occurring outside the tool's known state significantly enhances the reliability of deployments.
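
The difference between correcting on apply and correcting continuously can be sketched as follows. This is a conceptual illustration, not the implementation of Crossplane or any other tool; the reconcile function and the attribute names are hypothetical, and a real tool would call cloud APIs where this sketch updates a dictionary.

```python
from typing import Any


def reconcile(desired: dict[str, Any], actual: dict[str, Any]) -> list[str]:
    """Bring `actual` in line with `desired` and return the corrected keys."""
    corrected = []
    for key, value in desired.items():
        if actual.get(key) != value:
            actual[key] = value  # a real tool would call a cloud API here
            corrected.append(key)
    return corrected


desired_state = {"versioning": True, "public_access": False}
live_state = {"versioning": True, "public_access": True}  # changed by hand

# A run-on-apply tool performs this once per deployment. A continuously
# reconciling tool repeats it on a schedule; two passes stand in for that.
first_pass = reconcile(desired_state, live_state)
second_pass = reconcile(desired_state, live_state)

print(first_pass)   # ['public_access']
print(second_pass)  # []
```

With continuous reconciliation, the window in which a manual change stays live is bounded by the loop interval rather than by the time until the next deployment.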

End-to-end infra testing: This would be an AWESOME benefit (:cries-in-custom-scripting:)

However, no IaC tool currently offers a good test framework. I would love, for example, a Jest- or pytest-based package that integrates with any IaC tool and abstracts common test cases... Just imagine a world where I could do assertions on my infra as code directly...
Unfortunately, I can only dream about this now :(.
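
As a rough idea of what such an assertion could look like, the sketch below checks a CloudFormation-style template for S3 buckets without encryption settings. The helper unencrypted_buckets and the template are hand-written assumptions, not part of any existing package. It only inspects a generated template, so it is still far from the end-to-end testing of deployed infrastructure wished for above.

```python
def unencrypted_buckets(template: dict) -> list[str]:
    """Return logical IDs of S3 buckets without a BucketEncryption property."""
    return [
        logical_id
        for logical_id, resource in template.get("Resources", {}).items()
        if resource.get("Type") == "AWS::S3::Bucket"
        and "BucketEncryption" not in resource.get("Properties", {})
    ]


# Hand-written stand-in for a template that an IaC tool would generate.
template = {
    "Resources": {
        "LogsBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "BucketEncryption": {
                    "ServerSideEncryptionConfiguration": [
                        {"ServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
                    ]
                }
            },
        },
        "UploadsBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    }
}

# In a pytest suite this would be `assert unencrypted_buckets(template) == []`,
# which would fail here and point at the offending bucket.
print(unencrypted_buckets(template))  # ['UploadsBucket']
```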

Maintainability
For me, maintainability refers to the level of effort required to ensure that development speed and reliability are sustained over time.

Evolvable codebase: AWESOME benefit

As your application evolves and expands, it's natural for names to become less descriptive and files to become larger and harder to read. Both Terraform and Pulumi do a commendable job of letting you reorganize files with minimal disruption. In repositories based on CloudFormation, restructuring is only feasible within the confines of the same CloudFormation stack, and even there logical renames are a no-go.

Modularity (self-managed abstraction): GREAT benefit

Breaking down the codebase into modular components or functions promotes code reuse, improves readability, and makes it easier to understand and maintain specific parts of the system without affecting others. Terraform provides modules, Crossplane has Composite Resources, Pulumi has its ComponentResource and CDK uses constructs to provide this modularity. One piece of advice: DON'T OVERDO THIS! It's a neat feature, but if you need more than 2 layers of self-managed abstraction for your application, chances are you're doing it wrong.

Little housekeeping: GREAT benefit

Not having to deal with (major) package upgrades, provider updates, and state management is great.

Scorecard template
You can utilize this scorecard template to gain a comprehensive overview. By conducting focused research within each category, you can achieve a reasonably accurate comparison among different tools.

WORKLOAD DESCRIPTION:

CATEGORY	IMPORTANCE (0-5)	TOOL SCORE (1-10)	WEIGHTED SCORE
Managed abstraction	(3)
Fail & debug fast	(3)
Application deployment	(3)
Shallow learning curve	(3)
Language flexibility	(1)
Ecosystem integration	(3)
Purpose-built benefits and limitations	(3)
TOTAL VELOCITY
Type safety	(0.5)
Preview changes	(1.5)
Corrective changes	(1)
End-to-end infra testing	(3)
TOTAL RELIABILITY
Evolvable codebase	(3)
Modularity (self-managed abstraction)	(1.5)
Little housekeeping	(1.5)
TOTAL MAINTAINABILITY
TOTAL

General recommendations

Choosing multiple tools

It is perfectly acceptable to select multiple tools for different components of your application landscape. The key factor here is choosing purpose-built tools. The simpler a tool is to learn, the more freedom you have in selecting different tools. However, it is advisable to avoid opting for multiple tools with steep learning curves.

Common example scenarios

These would be my general pieces of advice for some common scenarios.

You are migrating a workload to AWS that contains a mix of EC2 and containers, and you have no particular ecosystem preference

My recommendation would be to use Terraform. It offers significant benefits through its robust ecosystem and relatively easy learning curve. While it can be verbose, especially when dealing with IAM policies and Security Groups, this should not pose a major issue if you're not heavily reliant on cloud-native technologies. Since you're migrating, you likely have existing processes in place that cause infrastructure drift, which makes Terraform an excellent choice for this use case.

Pro-tip: When starting out, it's advisable not to go overboard with Terraform wrapper tools. Surprisingly, you can achieve significant scalability without relying heavily on these wrappers. Implement only what you foresee needing in the near future.

It's worth mentioning that Pulumi is also a great option, but it does have a slightly smaller ecosystem in terms of integrations with common CI/CD tools and static code analysis. Moreover, based on my experience, Pulumi has a steeper learning curve.

You are running Kubernetes on AWS and have a limited need for cloud-native tooling

Using Terraform for infrastructure configuration would still be a reliable choice. Pulumi is also an option, but often its programming language benefits are unnecessary for my requirements.

When it comes to Kubernetes configuration, it might be more beneficial to utilize a separate tool. This is because the limited managed abstractions provided by cloud-native tooling can hinder development speed. For infrastructure configuration, Crossplane is a relatively new option. If its unique benefits, such as continuous corrections on infrastructure state, address a significant problem for your organization, feel free to choose it. However, if these benefits are merely 'nice to haves', I would recommend opting for a technology with a longer track record, such as Terraform.

You are developing a cloud-native container solution on AWS

If you're using ECS, the choice of tooling depends on your approach. In the case of mono-repo approaches, CDK is a good option thanks to its bundling capabilities. However, if you have split responsibilities, with one repository dedicated to infrastructure and another for application deployments, I would recommend using different tooling. This is because application deployments can lead to infrastructure drift, which is better managed with tools that handle such drift well.

You are a data engineer developing a cloud-native ETL flow on AWS
If you work with services like Glue, Lambda, S3, Lake Formation, Athena, then CDK is a great tool. Its versatility and abstraction will save you lots of time. Also, infrastructural drift is usually less of an issue in these scenarios.

You are developing a serverless solution on AWS
Go for the simplest tool that (a) supports your use cases, (b) provides a lot of abstraction and (c) fits well within your ecosystem. For example:

A simple, small, stateless API can easily be done with SAM CLI or the Serverless Framework. If you are using Python extensively, Chalice might even be a better fit for your tooling.

A low-volume cloud-native web app is a perfect use case for Serverless Stack (SST). It also works for medium-to-high-volume apps if you work around the RDS Aurora V1 scaling.

Conclusion

In conclusion, utilizing the provided template scorecard and customizing the weights according to your specific requirements, I believe you will be able to make a well-informed decision regarding the choice of infrastructure as code tool. This scorecard serves as a valuable tool for evaluating and comparing the different options based on the factors that are important to you. By assigning scores and adding them up in each category, you can obtain a comprehensive overview of how each tool performs in various aspects.

Remember to consider the benefits and limitations of each tool, along with their ecosystem, ease of learning, maintainability, and compatibility with your application landscape. Additionally, take into account factors such as security, drift detection, restructuring capabilities, and the level of support for your desired features.

By utilizing the scorecard and conducting targeted research within each category, you can gain a clearer understanding of the strengths and weaknesses of different tools and make a more informed decision. Selecting the right infrastructure as code tool will greatly contribute to the efficiency, reliability, and scalability of your application's infrastructure.