Migrating 42 Servers to AWS: Key Lessons Learned | CloudNation

Sebastiaan Brozius Cloud Consultant

Publish date: 23 June 2023

Introduction

Earlier this year, I had the opportunity to work on a project involving the migration of 42 servers from a data center to Amazon Web Services (AWS), as part of an AWS Migration Acceleration Program (MAP) deal. The project carried some pressure, as the contract with the data center was scheduled to end in approximately 2 months, and the customer was adamant about not renewing or extending the existing contract.

Throughout this project, I gained some valuable insights that I believe are worth sharing. By doing so, I hope to help you save time and avoid potential frustrations when faced with a similar undertaking.

Disclaimer: this is not meant to be an exhaustive migration guide. Instead, I aim to provide you with some practical information based on my own experiences. Feel free to take from these Lessons Learned what is applicable to your situation and tailor it to create a migration plan that suits your needs.

The tools used

For this project, the following tools were used:

AWS Migration Hub
AWS Application Migration Service
Terraform (or any IaC tool of your choice)
Scripting-language of your choice (I used PowerShell)
Bash-scripting (the servers were running Linux)

Step One: Taking Inventory

In order to have a comprehensive understanding of the task at hand, it’s important to take inventory of the environment you will be migrating, along with all the components it relies on or interacts with. This provides a clear overview, enabling you to identify potential issues, plan ahead, and establish a realistic timeframe for the migration process.

Network / subnets

Gather information about the existing networks and subnets used by the servers. Determine whether the servers use static or dynamic IP addresses, and whether there are any VPN connections that need to be considered. Take into account any inbound and outbound allow-listings, and identify any public IP addresses currently in use. The customer might even have their own public IP range which they want to (partially) move to AWS.

Firewall rules

Try to get a comprehensive overview of all firewall rules in place. This helps in determining the appropriate accessibility and connectivity requirements for each server, such as identifying which servers should be accessible from specific locations and which servers need to establish connections to specific destinations. Conducting this analysis can help identify any existing issues within the current setup, allowing you to address and mitigate them effectively within the new configuration.

Traffic flows between servers

The firewall rules can help map these flows, but traffic between private subnets is often unrestricted and therefore not visible in them. Understanding the traffic flows between servers will help you set up more restrictive (and safer) security groups within AWS.

DNS domains

Does the customer want to move DNS domains to AWS? Are there DNS-records that point to the servers and need to be changed? If there are changes to be made to DNS records and/or domains, careful planning needs to be done to make sure that during the actual migration, you don’t encounter significant delays, waiting hours or days before a change has been propagated throughout the internet.

Certificates

Do the servers or any applications running on them use any certificates, and if so, how are they managed? Understanding this can help determine the best approach for exposing an application to the internet. For instance, if the certificate is managed through an automated system that runs on the server itself, using an Application Load Balancer may not be feasible. In such cases, you might need to resort to options like a Network Load Balancer, connecting the server directly to the internet (which is generally considered bad practice), or changing how the certificate is managed, which can have a significant impact on the required tasks.

Backup RTO and RPO

In order to set up the new environment, it’s important to understand what RPO and RTO the customer requires and for which service / application / server. Setting up AWS Backup in advance makes it easier to enable it during or just after the migration.

Software license requirements

Some software vendors may bind their licenses to the MAC address of a server. If that’s the case, additional steps must be taken to avoid needing to constantly update the license registration with such vendors.

OS versions being used and patch level

Before you start working on preparing for the migration, it’s wise to ascertain the specific operating systems (OSes) in use, and their respective versions. When encountering older OS versions, there might be more work involved in installing the necessary agents, or, in some cases, it might even be impossible. Additionally, understanding the current patch-level of the OSes and how patching is managed is important to know for the new environment in AWS.

Software used

List the software products and versions running on the servers. Some of them, such as commercial databases and SAP, are eligible for additional discounts in AWS MAP.

Step Two: Make a plan

Once you’ve acquired most of the information, it’s time to start making a plan.

Make an IP-plan for the new environment

You won’t always be able to keep the current IP-addresses, and creating a new IP-plan is important for setting up the new network and subnets, with proper sizing.
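As an illustration of such an IP-plan, the sketch below uses Python's standard ipaddress module to carve a VPC range into equally sized subnets and to check that the range does not overlap with a network that stays on-premises (relevant when a VPN is involved). The VPC range, the on-premises range, the subnet names and the helper plan_subnets are all hypothetical; only the five addresses AWS reserves in every subnet are a given.

```python
import ipaddress

# Hypothetical ranges; replace them with the values from your own inventory.
vpc = ipaddress.ip_network("10.20.0.0/16")
on_prem = ipaddress.ip_network("192.168.10.0/24")

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def plan_subnets(vpc_cidr, new_prefix, names):
    """Assign consecutive subnets of size /new_prefix to the given names."""
    subnets = vpc_cidr.subnets(new_prefix=new_prefix)
    return {name: next(subnets) for name in names}


plan = plan_subnets(vpc, 24, ["public-a", "public-b", "private-a", "private-b"])

for name, net in plan.items():
    usable = net.num_addresses - AWS_RESERVED_PER_SUBNET
    print(f"{name:<10} {net}  usable addresses: {usable}")

# Overlapping ranges make routing over a VPN impossible, so check early.
print("Overlaps with on-premises range:", vpc.overlaps(on_prem))
```

Comparing the usable address count per subnet with the number of servers (plus load balancers, endpoints and room for growth) gives a quick check on sizing.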

Determine which security groups to create

Make sure you have a clear understanding of the necessary security groups to allow traffic between servers, enabling the required inbound and outbound traffic. Identify which security groups need to be created and attached to which servers.
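One way to get from the inventoried traffic flows to a list of security groups is to group the flows by destination. The sketch below does this for a handful of made-up flows; the role names, ports and the sg- naming are assumptions for illustration, and the output is a plain-text rule list rather than anything AWS-specific.

```python
from collections import defaultdict

# Hypothetical observed flows: (source role, destination role, protocol, port)
flows = [
    ("web", "app", "tcp", 8080),
    ("app", "db", "tcp", 5432),
    ("web", "app", "tcp", 8080),  # the same flow observed twice
    ("bastion", "db", "tcp", 22),
]


def ingress_rules(observed_flows):
    """Group flows by destination role into de-duplicated, sorted ingress rules."""
    rules = defaultdict(set)
    for source, destination, protocol, port in observed_flows:
        rules[destination].add((protocol, port, source))
    return {dest: sorted(entries) for dest, entries in rules.items()}


for destination, entries in ingress_rules(flows).items():
    for protocol, port, source in entries:
        print(f"sg-{destination}: allow {protocol}/{port} from sg-{source}")
```

Referencing the source security group instead of an IP range keeps the rules valid when IP addresses change during the migration.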

Make sure you have your MAP tag number (MPE ID)

This is necessary when deploying resources to get the discount. Identify how you will apply the tags and what resources might need alternative tags.

Determine the use of a launch template (highly recommended!)

AWS Application Migration Service utilizes launch templates. Deciding whether you will be utilizing them and determining which specific settings you want to configure within the templates can provide valuable insights and help gather any additional necessary information.

Determine the order of migration

Create an initial migration order, and continuously validate and refine until the actual migration.
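If the dependencies between servers are known, an initial order can be derived from them instead of being maintained by hand. The sketch below uses graphlib from the Python standard library (Python 3.9 or newer) to group servers so that each group depends only on earlier groups. The server names, their dependencies and the function migration_waves are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each server maps to the servers it needs first.
dependencies = {
    "db01": set(),
    "cache01": set(),
    "app01": {"db01", "cache01"},
    "app02": {"db01"},
    "web01": {"app01", "app02"},
}


def migration_waves(deps):
    """Return groups of servers; each group depends only on earlier groups."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, servers in enumerate(migration_waves(dependencies), start=1):
    print(f"Group {number}: {', '.join(servers)}")
```

A circular dependency raises an error here, which is useful in itself: it points at servers that have to be migrated together.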

Step Three: Preparations

Once you have created an initial plan, it’s time to start preparing.

Set up the management account listed in the Migration Plan and activate the Cost Allocation Tag required for MAP 2.0. This information should be made available to you by your AWS representative or AWS partner for the MAP deal.
Set up AWS Migration Hub and either install the Discovery Agent on the servers you’re migrating, or, if you have access to the hypervisor layer, install the appliance. More info on these can be found in the AWS Documentation for AWS Migration Hub.
Proceed by deploying the infrastructure to which you intend to migrate the servers. This involves setting up the necessary components, configuring an initial security group that will be assigned to the launch template, and an Instance Profile with the appropriate permissions. Next, create VPC Endpoints for the services SSM uses; this way you should be able to access the servers even when they cannot connect to the internet (which is likely the case during the test phases).

Make sure you tag everything with the map-migrated tag, including the infrastructure. With Terraform, you can set this up using the AWS provider parameter default_tags. More information on the exact tag value should be available through the AWS MAP channel.
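Because a missing tag means a missed discount, it can be worth verifying the tagging on an exported resource list. The sketch below assumes the inventory is already available as a list of dictionaries; that structure, the resource IDs and the tag value migEXAMPLE are placeholders and not real values. Only the tag key map-migrated comes from the MAP requirements described above.

```python
REQUIRED_TAG = "map-migrated"
EXPECTED_VALUE = "migEXAMPLE"  # placeholder; use the value supplied for your MAP deal

# Hypothetical inventory, for example exported from your IaC state or a report.
resources = [
    {"id": "vpc-0001", "tags": {"map-migrated": "migEXAMPLE"}},
    {"id": "sg-0002"},  # no tags at all
    {"id": "i-0003", "tags": {"map-migrated": "migOTHER"}},  # wrong value
    {"id": "i-0004", "tags": {"map-migrated": "migEXAMPLE", "Name": "app01"}},
]


def untagged(items, expected_value):
    """Return IDs of resources whose MAP tag is missing or has another value."""
    return [
        item["id"]
        for item in items
        if item.get("tags", {}).get(REQUIRED_TAG) != expected_value
    ]


for resource_id in untagged(resources, EXPECTED_VALUE):
    print(f"Missing or wrong {REQUIRED_TAG} tag: {resource_id}")
```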

Once you have set up the basic infrastructure, initialise AWS Application Migration Service. During the initialisation, set up the default launch template. Make sure it includes the MAP tag with the appropriate value, as well as the security group, instance profile, subnet, et cetera. Paying attention to these details gives you a well-prepared launch template for the migration process.
After AWS Application Migration Service has been initialised, the Replication Agent can be installed on the servers to be migrated. The Replication Agent sends its data to the replication servers in AWS over TCP port 1500 and communicates with the AWS Application Migration Service API over TCP port 443, so make sure any firewall allows both ports outbound for the source servers.

NB: The Replication Agent requires the Linux kernel headers to install. For older Linux versions this could mean you have to locate the headers for the specific kernel release, since they might no longer be available through the distribution’s package manager.
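Before installing the agent, it can save time to confirm that the firewall really lets a source server reach AWS on TCP port 1500. The sketch below is a generic TCP reachability check and not part of any AWS tooling; the function name can_connect and the address in the usage comment are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Usage on a source server (the address is a placeholder for a replication server):
#   can_connect("198.51.100.10", 1500)
```

A result of False only says the connection could not be established from that server; it does not say which firewall or route is responsible.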

Install the AWS SSM Agent on the source servers. This enables you to connect to the server through the AWS Console using Session Manager, or even using the Session Manager Plugin for the AWS CLI. During testing and the actual migration, this can prove useful when you’re running into issues with any server.
After initiating the server replication process, we must wait for the replication to complete.

The time the initial replication takes depends on several factors, including the speed of the internet connection, the total volume of data to be replicated, and the rate at which the source system's filesystem changes.
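For a first rough estimate before any replication has started, these factors can be combined into a back-of-the-envelope calculation: the data volume divided by the bandwidth that remains after keeping up with ongoing changes. This is a simplification that ignores compression, throttling and protocol overhead, and the function name and all numbers below are made up.

```python
def initial_sync_hours(data_gb, bandwidth_mbps, change_rate_gb_per_hour=0.0):
    """Rough number of hours for the initial sync of data_gb gigabytes."""
    # Convert Mbit/s to GB/hour (decimal units: 8 bits per byte, 1000 MB per GB).
    throughput_gb_per_hour = bandwidth_mbps * 3600 / 8 / 1000
    net_gb_per_hour = throughput_gb_per_hour - change_rate_gb_per_hour
    if net_gb_per_hour <= 0:
        raise ValueError("change rate exceeds bandwidth; replication never catches up")
    return data_gb / net_gb_per_hour


# Hypothetical example: 900 GB over a 100 Mbit/s link.
print(f"Idle source:   {initial_sync_hours(900, 100):.1f} hours")
print(f"5 GB/h change: {initial_sync_hours(900, 100, 5):.1f} hours")
```

Once replication is running, the estimate in the console is the better source, since it is based on measured throughput.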

The AWS Application Migration Service console gives an estimate of the time required to complete replication, which is constantly updated.

Step Four: Test, test, test!

When all servers (or at least the ones you want to start with) have replicated, you can start testing.

In the AWS Application Migration Service console, select one or more servers to test, and launch test instances.

Make sure that the test-servers are unable to connect to live servers, so they do not contaminate any production environment.
Create a migration-script per server. Perform multiple test-runs to check and improve the migration-scripts.
During testing, you might encounter software that can throw a wrench in the migration, like corosync and pacemaker. When you encounter such software, determine if you still need it and take appropriate action to mitigate any potential issues that might arise by keeping those configurations as they are.
Evaluate if your intended order of migration is valid. During testing you might discover a different order is needed.
Create waves based on the order of migration for a simpler orchestration during the actual migration.

Perform at least one full test-migration. This aids in estimating the overall time required for the complete migration process. This information is crucial for planning the anticipated downtime, which needs to be communicated to the customer and any application users. Additionally, it helps in determining the optimal timing for the actual migration, the number of personnel performing the migration, and when test personnel should begin testing the application after the migration. By considering these factors, you can effectively coordinate the various aspects of the migration process and ensure a smooth transition for all stakeholders involved.
If you are transitioning a server from being directly exposed to the internet to being protected by a load balancer, it is crucial to thoroughly test the load balancer configuration and confirm that it routes traffic to the server as intended.

Once you have finished testing a source server, mark it as ‘Ready for cut-over’ in AWS Application Migration Service.
If there are DNS changes to be made, prepare for them; lower TTL values for records that need to be changed, and prepare any domain that needs to be moved to Route53, or even move them in advance if possible.
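Lowering a TTL only helps if it is done early enough: resolvers that cached the record just before the change keep it for the full old TTL. The sketch below works out the latest moment to lower the TTL for a given cut-over time. The cut-over date, the one-hour safety margin and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta


def latest_ttl_change(cutover, current_ttl_seconds, safety_margin=timedelta(hours=1)):
    """Latest moment to lower a TTL so that old cached answers expire before cut-over."""
    return cutover - timedelta(seconds=current_ttl_seconds) - safety_margin


# Hypothetical cut-over on 10 June 2023 at 22:00, record currently has a 24-hour TTL.
cutover = datetime(2023, 6, 10, 22, 0)
deadline = latest_ttl_change(cutover, current_ttl_seconds=86400)
print("Lower the TTL no later than:", deadline)
```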

Step Five: The Real Deal

This is what you’ve been testing for!

Shut down any running services on the live servers, especially databases, and wait for the last changes to be replicated to AWS.
Start migrating in waves.
Make sure your security groups allow the proper access (the servers should at least be reachable for the group of test-users).
Have your test-group test as early as possible and have a select group of people report on any findings. Triage what needs to be fixed right away, and what can wait. Have product owners participate in this where possible.
Mark servers that have been given the green light as ‘Finalize cut-over’ in AWS Application Migration Service to indicate they’re finished.

Turn on VPC Flow Logs to help troubleshoot any network-issues during the migration.

Step Six: The Aftermath

Once the migration has been finished successfully, there are a few more things that need to be done.

Turn off the old servers, or at the very least make sure that the applications will not be enabled again.
Make sure the servers and services are backed up in AWS.
Archive the migrated servers using ‘Mark as archived’ in AWS Application Migration Service.
Remove any software from the servers that was specifically needed for the data center architecture (e.g. VMware Tools, Azure tools).

Attention Points

During both testing and the actual migration, when launching multiple (bigger) instances at the same time, one or more instances might respond badly/have weird issues. In that case, stop the instance(s) in the AWS Console, wait a minute or two, and start it up again. The reason for this is that the underlying host has issues allocating the proper resources to the instance. Stopping the instance and starting it again relocates the instance to a host that has sufficient resources available for the instance.

If you’re using user_data in your launch template(s), this will only be run when the server has a working network connection. If a server has no working network connection, user_data cannot be retrieved from the instance metadata and consequently cannot be run.

It is crucial to ensure the customer actively tests the application(s) during the migration process and provides their final approval. Ultimately, it is the customer's responsibility to validate whether the application is functioning as intended and if all data has been accurately transferred. Their involvement is essential to confirm the successful outcome of the migration and ensure that the application meets their requirements and expectations.