Blue-green deployment, sometimes referred to as red-black deployment, is a deployment strategy where a new environment (green) running the most up-to-date code base is created alongside the current environment (blue). Traffic is then cut over from blue to green.

This type of deployment allows for minimal downtime, and if any issues occur, the cutover can be quickly reversed. Once the green environment is running with no issues, the blue environment can be destroyed and the whole process repeated with the next version of the code base.

Phase 1

Create a prod version 2 environment running in parallel with prod version 1.

Phase 2

Cut over to the prod version 2 environment by changing DNS in Route 53 to point to v2's load balancer.


Prerequisites

I'll be using the same project from when I created the elastic load balancer, so if you have destroyed your prod environment, recreate it. This will be the blue environment.

Create a new CloudFormation template for the green environment. Below is what I'll be using.

Template for Version 2

# version 2019.10.04
AWSTemplateFormatVersion: '2010-09-09'

Mappings:
  AwsRegionAmi:
    us-east-1:
      AMI: ami-0b69ea66ff7391e80
    us-east-2:
      AMI: ami-00c03f7f7f2ec15c3

Resources:
  myInstanceProfile: 
    Type: AWS::IAM::InstanceProfile
    Properties: 
      Roles: 
      - EC2S3RO

  myInstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    DependsOn: myALBSecurityGroup
    Properties:
      GroupDescription: Allow SSH from my public IP and HTTP from ALB
      VpcId: vpc-008841edcbac65ca1
      SecurityGroupEgress:
      - IpProtocol: -1
        FromPort: -1
        ToPort: -1
        CidrIp: 0.0.0.0/0
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 80
        ToPort: 80
        SourceSecurityGroupId: !Ref myALBSecurityGroup
      - IpProtocol: tcp
        FromPort: 22
        ToPort: 22
        CidrIp: <YOUR_PUBLIC_IP>
      Tags:
      - Key: Name
        Value: prod-instance-sg-v2

  myALBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all traffic from ALB
      VpcId: vpc-008841edcbac65ca1
      SecurityGroupEgress:
      - IpProtocol: -1
        FromPort: -1
        ToPort: -1
        CidrIp: 0.0.0.0/0
      SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 80
        ToPort: 80
        CidrIp: 0.0.0.0/0
      Tags:
      - Key: Name
        Value: prod-alb-sg-v2

  myLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      LaunchConfigurationName: cf-created-standard-prod-v2
      KeyName: cf
      SecurityGroups: 
      - !Ref myInstanceSecurityGroup
      InstanceType: t2.micro
      ImageId: !FindInMap [AwsRegionAmi, !Ref 'AWS::Region', AMI]
      BlockDeviceMappings: 
      - DeviceName: "/dev/xvda"
        Ebs:
          VolumeSize: 10
          VolumeType: gp2
      IamInstanceProfile: !Ref myInstanceProfile
      UserData:
        Fn::Base64: |
          #!/bin/bash
          # Use Eastern time so ${DATE} below matches the date on today's build artifact
          timedatectl set-timezone "America/New_York"
          HOST=$(hostname)
          DATE=$(date +%Y-%m-%d)
          # Grab the name of today's build artifact from the CodePipeline bucket
          ARTIFACT=$(aws s3 ls s3://codepipeline-us-east-2-710251686107/hello-kitty-ASG/BuildArtif/ | grep ${DATE} | awk '{print $4}')
          yum update -y
          # Enable EPEL via amazon-linux-extras first; stress and lynx live in the EPEL repo
          amazon-linux-extras install epel -y
          yum install vim stress lynx tcpdump tmux -y
          iptables -I INPUT -p tcp -m tcp --dport 80 -j ACCEPT
          yum install python-pip -y
          pip install Flask
          echo "hello lol12345" > /tmp/lol.txt
          mkdir /flask
          # Pull and unpack the artifact, then record this instance's hostname for /whoami
          aws s3 cp s3://codepipeline-us-east-2-710251686107/hello-kitty-ASG/BuildArtif/${ARTIFACT} /flask
          unzip /flask/${ARTIFACT} -d /flask
          echo "I am $HOST" >> /flask/templates/whoami.html
          # Run the Flask app in the foreground on port 80 (user data runs as root)
          python /flask/application.py

  myASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: myTargetGroup
    Properties:
      AutoScalingGroupName: myCfAsg-prod-v2
      LaunchConfigurationName: !Ref myLaunchConfig
      DesiredCapacity: "1"
      MinSize: "1"
      MaxSize: "3"
      VPCZoneIdentifier:
      - subnet-04e6d0dfbf307f88e
      - subnet-06b2f68b2fe2d4524
      TargetGroupARNs: 
      - !Ref myTargetGroup
      Tags:
      - Key: Environment 
        Value: Prod
        PropagateAtLaunch: "true"
      - Key: Name 
        Value: ProdV2Instance
        PropagateAtLaunch: "true"
  
  myTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckEnabled: true
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: "/"
      HealthCheckPort: 80
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 3
      Matcher:
        HttpCode: 200-299
      Name: tg-prod-v2
      Port: 80
      Protocol: HTTP
      VpcId: vpc-008841edcbac65ca1
      Tags:
      - Key: Environment
        Value: Prod
      - Key: Name
        Value: tg-prod-v2

  myS3bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: devs3bucket0021

  myALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      IpAddressType: ipv4
      Name: prod-v2-alb
      Scheme: internet-facing
      SecurityGroups: 
      - !Ref myALBSecurityGroup
      Subnets:
      - subnet-04e6d0dfbf307f88e
      - subnet-06b2f68b2fe2d4524
      Type: application
      Tags:
      - Key: Name
        Value: prod-v2-alb 

  myListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
      - Type: "forward"
        TargetGroupArn: !Ref myTargetGroup
      LoadBalancerArn: !Ref myALB
      Port: 80
      Protocol: HTTP

  ScaleUpPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref myASG
      Cooldown: '100'
      ScalingAdjustment: 1

  ScaleDownPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AdjustmentType: ChangeInCapacity
      AutoScalingGroupName: !Ref myASG
      Cooldown: '100'
      ScalingAdjustment: -1

In the myInstanceSecurityGroup resource, replace <YOUR_PUBLIC_IP> with your public IP in CIDR notation (e.g. x.x.x.x/32) so you can SSH into your instances!
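
To launch the green stack, you can use the console, the AWS CLI, or boto3. Here's a minimal boto3 sketch; the file name prod-v2.yaml and stack name prod-v2 are placeholders of my own, and CAPABILITY_IAM is required because the template creates an IAM instance profile.

import boto3

# Region must match the VPC and subnet IDs hardcoded in the template.
cf = boto3.client("cloudformation", region_name="us-east-2")

with open("prod-v2.yaml") as f:  # hypothetical file name for the template above
    template_body = f.read()

cf.create_stack(
    StackName="prod-v2",                # hypothetical stack name
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_IAM"],    # needed for the IAM instance profile
)

# Block until the ALB, ASG, and the rest finish creating.
cf.get_waiter("stack_create_complete").wait(StackName="prod-v2")
print("green environment is up")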

Pay close attention to the launch configuration (myLaunchConfig). I added several lines to the "UserData" section. I changed the timezone on the instances to Eastern time; I learned the hard way that with the clock on UTC, the instance's date can differ from the artifact's date in the S3 listing, so the grep for ${DATE} comes back empty and the script breaks. I also echoed the hostname to a file called whoami.html. When someone accesses http://my_domain/whoami they should see "I am <hostname-of-instance>" in their browser.

Along with that change, I modified the index.html in the templates directory for prod-v2: I swapped Hello Kitty for Doge by replacing the link on the page.

Let's verify the environments are up by accessing them through the DNS name of each ALB. I added another page to my Flask site called /whoami; it returns the hostname of the EC2 instance that served the request.

Below are the changes I made to the application.py file to accommodate the new route (/whoami). I also changed application.run() to listen on port 80 instead of the default Flask port.

Application.py

from flask import Flask, render_template

application = Flask(__name__)

@application.route('/')
def index():
    return render_template('index.html')

@application.route('/status')
def status():
    return "green"

@application.route('/whoami')
def whoami():
    return render_template('whoami.html')
    
if __name__ == '__main__':
    # Listen on all interfaces on port 80 (Flask's default is 5000)
    application.run(host='0.0.0.0', port=80)
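
Before touching DNS, it's worth hitting each environment directly through its ALB DNS name (shown on the load balancer in the EC2 console). A quick stdlib check; both DNS names below are hypothetical placeholders:

from urllib.request import urlopen

# Hypothetical ALB DNS names; copy the real ones from the EC2 console.
ALBS = {
    "v1": "prod-alb-1234567890.us-east-2.elb.amazonaws.com",
    "v2": "prod-v2-alb-0987654321.us-east-2.elb.amazonaws.com",
}

for version, dns in ALBS.items():
    # Expect HTTP 200 from the index page of each environment.
    resp = urlopen(f"http://{dns}/", timeout=5)
    print(version, "index ->", resp.status)

# v2 should also answer the new route with "I am <hostname-of-instance>".
body = urlopen(f"http://{ALBS['v2']}/whoami", timeout=5).read().decode()
print("v2 whoami ->", body.strip())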

Version 1:

Version 2:

Perfect, both version 1 and version 2 are up and running!


Making changes in Route 53

Now that both environments are up and running, I'm going to make the following edits to Route 53.

First, I'm going to create a CNAME record for "www.arandomproject.net" with the DNS name of the version 1 ALB as the value. Then I'll set the routing policy to "Weighted" and give it a weight of 90.

Then I'll create another CNAME record for "www.arandomproject.net" with the DNS name of the version 2 ALB as the value, again with a weighted routing policy, and set the weight to 10.
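
For reference, here's how those two records look as a boto3 sketch. The hosted zone ID and ALB DNS names are hypothetical placeholders, and UPSERT means the same call can be re-run later with new weights.

import boto3

route53 = boto3.client("route53")

# Hypothetical hosted zone ID and ALB DNS names; substitute your own.
ZONE_ID = "Z0000000000000EXAMPLE"
V1_ALB = "prod-alb-1234567890.us-east-2.elb.amazonaws.com"
V2_ALB = "prod-v2-alb-0987654321.us-east-2.elb.amazonaws.com"

def weighted_cname(value, identifier, weight):
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.arandomproject.net",
            "Type": "CNAME",
            "SetIdentifier": identifier,  # distinguishes records sharing a name
            "Weight": weight,             # relative share of DNS answers
            "TTL": 60,                    # short TTL so weight changes apply quickly
            "ResourceRecords": [{"Value": value}],
        },
    }

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={"Changes": [
        weighted_cname(V1_ALB, "prod-v1", 90),
        weighted_cname(V2_ALB, "prod-v2", 10),
    ]},
)

Shifting traffic later is the same call with different Weight values (say 50/50, then 0/100), which also makes rolling back a one-line change.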

You may be wondering why I set the routing like this, and thinking that I should have routed 100% of the traffic to version 2 instead. The reason is to verify that version 2 is in a working state for a small portion of users before letting everyone access the new version of the site.

Once everything has been verified, I'll gradually increase the weight until 100% of the traffic is routed to version 2. This gradual shift is also known as a canary deployment, because it tests the new version on a small subset of users before full rollout.

In between changing the weights, verify the site using your URL; in my case the test site is http://arandomproject.net.
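
One rough way to check the split from the command line is to sample the page repeatedly and tally the distinct bodies you get back. Keep in mind that resolvers cache answers until the TTL expires, so consecutive requests tend to stick with one ALB; spread the samples out or run from a few different clients.

import time
from collections import Counter
from hashlib import md5
from urllib.request import urlopen

URL = "http://arandomproject.net/"
tally = Counter()

for _ in range(20):
    body = urlopen(URL, timeout=5).read()
    # Distinct bodies correspond to the v1 (Hello Kitty) and v2 (Doge) pages.
    tally[md5(body).hexdigest()[:8]] += 1
    time.sleep(15)  # give the record's TTL a chance to expire between samples

print(tally)  # with a 90/10 weight, expect mostly the v1 page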

You should first see version 1 with Hello Kitty; then, as the weight on version 2 increases, you will be greeted by Doge.

Version 1:

Version 2:


Conclusion

The whole purpose of blue-green deployment is to minimize downtime. As you can see, both the old and new environments were running in parallel, and when it came time to cut the URL over to the new version, there was virtually no downtime. That is the advantage of blue-green deployment.