In the last blog post, I showed how to set up Prometheus to monitor my lab environment. I used node exporters on my virtual machines to expose metrics that Prometheus would scrape!

What if you're using a cloud provider? How would you get metrics from an EC2 instance in AWS? And what happens if your instances are ephemeral, meaning they get blown away and rebuilt by autoscaling policies as needed? Today, I'll show you how to use Prometheus to scrape metrics from your EC2 instances!


Overview

This post references lots of details from the previous blog post, so I strongly urge you to read it before continuing.

Let's do a quick review! In the previous post, the Prometheus server is located in my home network and only monitors resources found in my LAN. However, Prometheus has built-in features that I can leverage to monitor AWS as well.

prometheus


Preparing AWS to be scraped

The first thing we need to do is create a user for Prometheus and assign that user an access key.

1. Creating an IAM user for Prometheus

Log into your AWS console and then go to IAM.

2. Prometheus Group

Under Groups, create a new group. I named mine Prometheus.

3. Attach the ReadOnlyAccess policy to the new Prometheus group

4. Prometheus User

Under Users, create a new user; I named mine prometheus. Give this user "Programmatic access" only. Once the user is created, add it to the Prometheus group.

5. Create access key

Create a new access key for this user. Save the access key ID and secret key; we'll need them later. If you prefer the command line, there's a rough CLI equivalent of these steps below.
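For anyone working from a terminal, here's a hedged sketch of the same IAM steps using the AWS CLI. It assumes the CLI is already configured with credentials that can manage IAM, and it reuses the group and user names from above:

aws iam create-group --group-name Prometheus
aws iam attach-group-policy --group-name Prometheus \
    --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
aws iam create-user --user-name prometheus
aws iam add-user-to-group --group-name Prometheus --user-name prometheus
aws iam create-access-key --user-name prometheus
optional CLI equivalent of the IAM steps above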

After you've created the user, it's time to create a security group that allows Prometheus to reach port 9100 on your EC2 instances.

1. Creating the security group

sg

This security group should only allow access to port 9100 from my public IP. If you leave it open to the public, anyone can read your metrics. If you have access to a VPN, it is highly recommended that you utilize it for this setup. (A CLI sketch for creating this rule follows the list.)

2. Assign the prometheus security group to your EC2 instances.
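Here's a rough CLI sketch of that security group rule. The group name, description, and VPC ID below are placeholders, so swap in your own values and your actual public IP:

SG_ID=$(aws ec2 create-security-group --group-name prometheus \
    --description "node exporter access for Prometheus" \
    --vpc-id vpc-0123456789abcdef0 --query GroupId --output text)
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 9100 --cidr YOUR_PUBLIC_IP/32
optional CLI sketch for the security group (replace the VPC ID and IP)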


Configuring Prometheus

1. SSH to the Prometheus server
2. Open up the Prometheus configuration file
vim /appl/alertmanager-0.19.0.linux-amd64/prometheus.yml

3. Edit the Prometheus configuration file

The original config file for Prometheus shows two jobs: one for Prometheus itself and the other for the virtualization server, borg.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/appl/rules/alert.rules"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'borg'
    static_configs:
      - targets: ['192.168.12.1:9100']     
original prometheus.yml

Now I'm going to make my edits at the bottom of the file and create a new job for Prometheus to scrape. You will need to add the access key and secret key from the user we created earlier.

I'll be using a special configuration block for Prometheus called ec2_sd_configs, which allows Prometheus to discover instances directly from AWS!

Lastly, I'll also be relabeling hosts as they are discovered by Prometheus. Instead of scraping them by private IP, I'll use the public IP. I'll also limit discovery to the us-west-1 region; when my projects get larger, I'll need to edit this entry to accommodate other regions.

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.122.74:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - '/appl/rules/alert.rules'
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'grafana'
    static_configs:
      - targets: ['192.168.122.132:9100']

  - job_name: 'borg'
    static_configs:
      - targets: ['192.168.122.1:9100']

  - job_name: 'awsec2'
    scrape_interval: 1m
    ec2_sd_configs:
      - region: us-west-1
        access_key: ADD_YOUR_ACCESS_KEY
        secret_key: ADD_YOUR_SECRET_KEY
        port: 9100
    relabel_configs:
      - source_labels: [__meta_ec2_public_ip]
        regex: '(.*)'
        target_label: __address__
        replacement: '${1}:9100'
new prometheus.yml
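Before restarting anything, it doesn't hurt to validate the edited file. If you have promtool handy (it ships alongside the prometheus binary in the release tarball), a quick syntax check looks like this:

promtool check config /appl/alertmanager-0.19.0.linux-amd64/prometheus.yml
optional command to validate the config before restarting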

4. Restart the Prometheus service

I'm running Prometheus in containers, so when you restart the service, check to make sure new containers are up and running.

systemctl restart prometheus
command to restart prometheus
docker ps 
command to check if new containers were created

The containers that show up should only be a few seconds old. If new containers haven't been created, you may have to manually delete the old ones with a docker rm -f <container_ids> command.

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
693c02ffd297        prom/alertmanager   "/bin/alertmanager -…"   5 seconds ago       Up 4 seconds        0.0.0.0:9093->9093/tcp   alertmanager
321781fc0413        prom/prometheus     "/bin/prometheus --c…"   5 seconds ago       Up 4 seconds        0.0.0.0:9090->9090/tcp   prometheus
example docker ps output

Verify Prometheus sees the EC2 Instances

Time to check if Prometheus can find the EC2 instances in us-west-1. As of this moment I only have an ECS container instance running in that region, so Prometheus should discover just one EC2 instance.

The list below shows the current targets I'm monitoring. The awsec2 section is where all of the discovered EC2 instances will appear; the rest of the targets are in my LAN. I blurred out the public IPs, since all of the targets under awsec2 have public IPs.

Great, it's all looking as intended.

targets
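If you'd rather verify this from a terminal, the same information is available from the Prometheus HTTP API. The jq filter below is just one way to slim down the output, and it assumes jq is installed wherever you run the command:

curl -s http://localhost:9090/api/v1/targets | \
    jq '.data.activeTargets[] | {job: .labels.job, instance: .labels.instance, health: .health}'
checking the discovered targets from the command line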

Time to check what metrics Prometheus can see. While on the Prometheus UI, click on the "Graph" button. In the expression search bar, enter: node_load1{job='awsec2'}.

data

You may not have as much data as mine, but you should see data starting to populate in Prometheus.
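The same query can also be run against the HTTP API, which is handy if you want to script this check. A quick sketch:

curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=node_load1{job="awsec2"}'
running the same query through the HTTP API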

One minor takeaway before I leave: you will need to adjust your launch configuration or AMI so that the node exporter is installed when a new instance is created by an autoscaling group. With that in place, Prometheus can scrape instances as they are created! A rough user-data sketch is below.
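This is a minimal user-data sketch, assuming a systemd-based Linux AMI and the upstream node_exporter release tarball; the version number is just an example, so adjust it (and the architecture) for your instances:

#!/bin/bash
# Example user data: download node_exporter and run it as a systemd service.
VERSION=0.18.1
cd /tmp
curl -sLO "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${VERSION}.linux-amd64.tar.gz"
cp "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/

cat <<'EOF' > /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter

[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable --now node_exporter
example user data to install the node exporter on boot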