AWS single host resilience with autoscaling groups
Make even a single AWS EC2 host highly available using autoscaling groups. Here's an example using an ASG, EFS for persistent data, and Terraform for easy automation.
Although this example is about AWS, other cloud providers offer similar concepts for you to use. My goal for this exercise was to create a single host for some remote work, make it resilient against zone outages, and keep its data persistent if the host is recreated.
Key Concepts
- Autoscaling is a way to duplicate VMs on demand.
- EFS is Amazon's managed file storage. Hint: it's NFS.
- Terraform is a way to build your infrastructure with code. Hint: like CloudFormation, but better.
- Terraform modules are used in this example.
Code for EFS
EFS, despite Amazon's awful naming schemes, is simply a managed NFS service. You define a file system, and then a mount target (an export, for NFS veterans) in each subnet where your EC2 instance may reside.
EFS listens on TCP port 2049. Configure your security groups as required. I did this by binding the EFS rules to a security group that will be used by my instance:
resource "aws_security_group" "nfs01" {
vpc_id = "${data.terraform_remote_state.vpc01.vpc}"
tags = "${var.tag}"
ingress {
from_port = 2049
to_port = 2049
protocol = "tcp"
security_groups = [ "${aws_security_group.sg01.id}" ]
}
}
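For reference, sg01 is the instance's own security group, defined elsewhere in my configuration. A minimal sketch of what it might contain (the SSH rule is my assumption for a jump box, not something this article specifies):

resource "aws_security_group" "sg01" {
  vpc_id = "${data.terraform_remote_state.vpc01.vpc}"
  tags   = "${var.tag}"

  # Assumed: allow inbound SSH to the jump box.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [ "0.0.0.0/0" ]
  }

  # Allow all outbound traffic, including NFS to the mount targets.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = [ "0.0.0.0/0" ]
  }
}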
Back to EFS, here is how I define an EFS file system and a mount target for each of my two subnets:
resource "aws_efs_file_system" "neil" {
creation_token = "neil-home"
tags = "${var.tag}"
}
resource "aws_efs_mount_target" "neil01" {
file_system_id = "${aws_efs_file_system.neil.id}"
subnet_id = "${data.terraform_remote_state.vpc01.subnet01_id}"
security_groups = [ "${aws_security_group.nfs01.id}" ]
}
resource "aws_efs_mount_target" "neil02" {
file_system_id = "${aws_efs_file_system.neil.id}"
subnet_id = "${data.terraform_remote_state.vpc01.subnet02_id}"
security_groups = [ "${aws_security_group.nfs01.id}" ]
}
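If you want the mount target DNS names handy elsewhere, you can expose them as outputs. A small sketch (the output names are mine):

output "nfstarget01_dns" {
  value = "${aws_efs_mount_target.neil01.dns_name}"
}

output "nfstarget02_dns" {
  value = "${aws_efs_mount_target.neil02.dns_name}"
}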
Code for ASG
The autoscaling group, or ASG, defines what my instance should look like, including the group's size (the default), what SSH key to use, security groups to assign, VPC subnets, and the user data to provision it. Note that I use a Terraform module; its details are here.
module "asg" {
asg_name = "jumpbox"
min_size = 0
desired_capacity = 1
source = "github.com/neilhwatson/terraform-modules//aws/ec2/asg"
ami_id = "${module.ami01.image_id}"
ssh_key = "luna"
instance_profile = "${aws_iam_instance_profile.route53_upsert.id}"
security_groups = [ "${aws_security_group.sg01.id}" ]
user_data = "${data.template_file.user-data.rendered}"
tag = "${var.tag}"
associate_public_ip = "true"
vpc_zone_ids = [ "${data.terraform_remote_state.vpc01.subnet01_id}"
, "${data.terraform_remote_state.vpc01.subnet02_id}" ]
}
The magic is in min_size and desired_capacity. Together they say there should be between zero and one instances, and that one is the ideal. The ASG keeps one instance running, never more than one, and if that instance dies for any reason, a replacement is created.
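Inside the module, those settings map onto a plain aws_autoscaling_group resource. A rough sketch of the relevant part, assuming the module wires the variables through like this:

resource "aws_autoscaling_group" "asg" {
  name             = "${var.asg_name}"
  min_size         = "${var.min_size}"
  # max_size of 1 is what guarantees "never more than one".
  max_size         = 1
  desired_capacity = "${var.desired_capacity}"

  # Launch configuration (AMI, key, user data) assumed to be defined
  # elsewhere in the module as "lc".
  launch_configuration = "${aws_launch_configuration.lc.id}"
  vpc_zone_identifier  = [ "${var.vpc_zone_ids}" ]
}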
User data
User data is AWS speak for a basic provisioning script. Mine is quite large, but here are the key points:
It's a template
Terraform has a template feature. In this case my user_data.sh is a template, and I instruct Terraform to render it, replacing the given vars: here, the subnet IDs and the EFS mount target DNS names.
data "template_file" "user-data" {
template = "${file("user_data.sh")}"
vars {
subnet01 = "${data.terraform_remote_state.vpc01.subnet01_id}"
subnet02 = "${data.terraform_remote_state.vpc01.subnet02_id}"
nfstarget01 = "${aws_efs_mount_target.neil01.dns_name}"
nfstarget02 = "${aws_efs_mount_target.neil01.dns_name}"
}
}
Earlier, in my ASG setup, I passed the rendered template as the user data parameter:
user_data = "${data.template_file.user-data.rendered}"
EFS
Functions to mount EFS and put the mount in fstab:
function mount_nfs() {
  host=$1
  mkdir -p /home/neil
  mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $${host}:/ /home/neil
}
function add_mount_to_fstab() {
  host=$1
  echo "$${host}:/ /home/neil nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0" >> /etc/fstab
}
Each EFS mount target has a different hostname. Dig into the script and you'll see how I figure out which EFS target to point to.
DNS
If the instance re-spawns with a different IP address, how can I ensure its DNS records are current? An elastic IP is the obvious choice, but I can do it cheaper. I configured my VPC to automatically assign public IPs to hosts. Now I just need to get that IP. The instance metadata service provides it, and once I know it I use cli53 to set or update my A and AAAA records with the new address.
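For cli53 to make those changes, the route53_upsert instance profile attached in the ASG needs Route 53 permissions. A minimal sketch of such a policy; the aws_iam_role resource and the zone ID placeholder are my assumptions:

resource "aws_iam_role_policy" "route53_upsert" {
  name = "route53-upsert"
  role = "${aws_iam_role.route53_upsert.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/YOUR_ZONE_ID"
    },
    {
      "Effect": "Allow",
      "Action": "route53:ListHostedZones",
      "Resource": "*"
    }
  ]
}
EOF
}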
The script
This is a Terraform template of the script. ${} variables will be substituted by Terraform; $${} variables are normal shell variables.
#!/bin/bash -x
set -e
echo "########### Starting user data ##########"
A_RECORD=orion
DOMAIN=watson-wilson.ca.
cli53=https://github.com/barnybug/cli53/releases/download/0.8.12/cli53-linux-amd64
meta_host=http://169.254.169.254/latest/meta-data
# The macs/ listing returns the MAC address with a trailing slash.
mac_addr=$(curl -s $${meta_host}/network/interfaces/macs/)
subnet01=${subnet01}
subnet02=${subnet02}
nfstarget01=${nfstarget01}
nfstarget02=${nfstarget02}
function get_ipv4() {
  ipv4=$(curl -s $${meta_host}/public-ipv4)
  echo $ipv4
}
function get_ipv6() {
  ipv6=$(curl -s $${meta_host}/network/interfaces/macs/$${mac_addr}ipv6s)
  echo $ipv6
}
function get_subnet_id() {
  id=$(curl -s $${meta_host}/network/interfaces/macs/$${mac_addr}subnet-id)
  echo $id
}
function mount_nfs() {
  host=$1
  mkdir -p /home/neil
  mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 $${host}:/ /home/neil
}
function add_mount_to_fstab() {
  host=$1
  echo "$${host}:/ /home/neil nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,_netdev 0 0" >> /etc/fstab
}
# Work around buggy Ubuntu IPv6 configuration
echo "iface eth0 inet6 dhcp" >> /etc/network/interfaces.d/60-default-with-ipv6.cfg
dhclient -6
add-apt-repository -y ppa:ansible/ansible
apt-get -y update
DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold" upgrade
apt-get -y install curl git ansible awscli nfs-common
#
# NFS mount
#
subnetid=$(get_subnet_id)
if [[ $subnetid = $subnet01 ]]
then
  nfshost=$nfstarget01
elif [[ $subnetid = $subnet02 ]]
then
  nfshost=$nfstarget02
else
  echo "ERROR cannot find subnet and nfs target"
  exit 1
fi
mount_nfs $nfshost
add_mount_to_fstab $nfshost
#
# Set this host's IP address to the desired DNS name.
#
curl -sLo /usr/local/bin/cli53 $cli53
chmod 755 /usr/local/bin/cli53
hostname=$(curl -s $${meta_host}/public-hostname)
ipv4=$(get_ipv4)
cli53 rrcreate --replace $DOMAIN "$A_RECORD 60 A $ipv4"
ipv6=$(get_ipv6)
cli53 rrcreate --replace $DOMAIN "$A_RECORD 60 AAAA $ipv6"
#
# Setup access to provisioning repo
#
cat <<'END_PULL' > /etc/cron.hourly/ansible-pull
#!/bin/bash
ansible-pull -f -U https://git-codecommit.ca-central-1.amazonaws.com/v1/repos/instance-provisioner jump-box.yml
END_PULL
chmod 755 /etc/cron.hourly/ansible-pull
export HOME="/root"
cd $HOME
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
aws configure set region ca-central-1
/etc/cron.hourly/ansible-pull
ansible-pull -U https://git-codecommit.ca-central-1.amazonaws.com/v1/repos/instance-provisioner cfbot.yml