In the last blog post, I showed how to set up Prometheus to monitor my lab environment. I used node exporters on my virtual machines to expose metrics for Prometheus to scrape!
What if you're using a cloud provider? How would you get metrics from an EC2 instance in AWS? And what happens if your instances are ephemeral, meaning they get blown away and rebuilt by autoscaling policies as needed? Today, I'll show you how to use Prometheus to scrape metrics from your EC2 instances!
Overview
This post references lots of details from the previous blog post, so I strongly urge you to read it before continuing.
Let's do a quick review! In the previous post, the Prometheus server is located in my home network and only monitors resources found on my LAN. However, Prometheus has built-in features that I can leverage to monitor AWS.
Preparing AWS to be scraped
The first thing we need to do is create a user for Prometheus and assign that user an access key. The steps below use the AWS console; a CLI sketch follows the list.
1. Creating an IAM user for Prometheus: log into your AWS console and then go to IAM.
2. Prometheus group: under Groups, create a new group. I named mine Prometheus.
3. Attach the ReadOnlyAccess policy to the new Prometheus group.
4. Prometheus user: under Users, create a new user. I named mine prometheus. Give this user "Programmatic access" only. Once the user is created, assign it to the Prometheus group.
5. Create access key: create a new access key for this user. Save the details of the access key; we'll need them later.
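If you prefer working from a terminal, here's a rough equivalent of the console steps above using the AWS CLI. This is only a sketch and assumes you already have the CLI configured with an administrative profile; the group, user, and policy names match the ones used above.
# Create the group and attach the ReadOnlyAccess managed policy
aws iam create-group --group-name Prometheus
aws iam attach-group-policy --group-name Prometheus \
    --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess
# Create the user, add it to the group, and generate its access key
aws iam create-user --user-name prometheus
aws iam add-user-to-group --group-name Prometheus --user-name prometheus
aws iam create-access-key --user-name prometheus
The last command prints the access key ID and secret; that's the pair we'll paste into the Prometheus config later.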
After you've created the user, it's time to create the security group that allows access to port 9100 for Prometheus.
1. Create the security group. It should only allow access to port 9100 from my public IP. If you leave this open to the public, anyone can read your metrics. If you have access to a VPN, I highly recommend using it for this setup. (There's a CLI sketch after this list.)
2. Assign the prometheus security group to your EC2 instances.
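For reference, here's a sketch of the same thing with the AWS CLI. The VPC ID, security group ID, and the x.x.x.x public IP are placeholders you'd substitute with your own values.
# Create a security group for node exporter traffic (placeholder VPC ID)
aws ec2 create-security-group --group-name prometheus \
    --description "Allow Prometheus to scrape node exporter" --vpc-id vpc-xxxxxxxx
# Allow TCP 9100 only from your own public IP (placeholder group ID and IP)
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
    --protocol tcp --port 9100 --cidr x.x.x.x/32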
Configuring Prometheus
1. SSH to the Prometheus server.
2. Open up the Prometheus configuration file:
vim /appl/alertmanager-0.19.0.linux-amd64/prometheus.yml
3. Edit the Prometheus configuration file.
The original config file for Prometheus shows two jobs: one for Prometheus itself and the other for the virtualization server, borg.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/appl/rules/alert.rules"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'borg'
    static_configs:
    - targets: ['192.168.12.1:9100']
Now I'm going to make my edits at the bottom and create a new job for Prometheus to scrape. You will need to add the access and secret keys for Prometheus to use.
I'll be using a Prometheus configuration block called ec2_sd_configs, which lets Prometheus discover targets through the AWS EC2 API!
Lastly, I'll be relabeling hosts as they are discovered by Prometheus: instead of scraping them by private IP, I'll use the public IP. I'll also limit discovery to the us-west-1 region; when my projects get larger I'll need to edit this entry to accommodate other regions.
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 192.168.122.74:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - '/appl/rules/alert.rules'
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'grafana'
    static_configs:
    - targets: ['192.168.122.132:9100']
  - job_name: 'borg'
    static_configs:
    - targets: ['192.168.122.1:9100']
  - job_name: 'awsec2'
    scrape_interval: 1m
    ec2_sd_configs:
    - region: us-west-1
      access_key: ADD_YOUR_ACCESS_KEY
      secret_key: ADD_YOUR_SECRET_KEY
      port: 9100
    relabel_configs:
    - source_labels: [__meta_ec2_public_ip]
      regex: '(.*)'
      target_label: __address__
      replacement: '${1}:9100'
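Before restarting anything, it doesn't hurt to validate the edited file. This step is optional and assumes the promtool binary that ships with Prometheus is available on the server; adjust the path to wherever your prometheus.yml actually lives.
# Validate the configuration before restarting Prometheus
promtool check config /appl/alertmanager-0.19.0.linux-amd64/prometheus.yml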
4. Restart the Prometheus service.
I'm running Prometheus in containers, so when you restart the service, check to make sure new containers are up and running.
systemctl restart prometheus
docker ps
The containers that show up should only be a few seconds old. If new containers haven't been created, you may have to manually delete the old ones with a docker rm -f <container_ids> command.
CONTAINER ID   IMAGE               COMMAND                  CREATED         STATUS         PORTS                    NAMES
693c02ffd297   prom/alertmanager   "/bin/alertmanager -…"   4 seconds ago   Up 4 seconds   0.0.0.0:9093->9093/tcp   alertmanager
321781fc0413   prom/prometheus     "/bin/prometheus --c…"   4 seconds ago   Up 4 seconds   0.0.0.0:9090->9090/tcp   prometheus
Verify Prometheus sees the EC2 Instances
Time to check if Prometheus can find the EC2 instances in us-west-1. As of this moment I only have one EC2 instance running in that region, so that's the only target I expect to see.
The list below shows the current targets I'm monitoring. The awsec2 section is where all of the discovered EC2 instances will appear; the rest of the targets are on my LAN. I blurred out the public IPs, since all of the targets under awsec2 are scraped by public IP.
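If you'd rather check from the command line than the targets page, the Prometheus HTTP API exposes the same list. This is just a sketch; it assumes jq is installed and that the server is listening on localhost:9090.
# List the discovered instances for the awsec2 job
curl -s http://localhost:9090/api/v1/targets | \
    jq '.data.activeTargets[] | select(.labels.job=="awsec2") | .labels.instance'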
Great, it's all looking as intended.
Time to check what metrics Prometheus can see. While on the Prometheus UI, click on the "Graph" button. In the expression search bar enter: node_load1{job='awsec2'}.
You may not have as much data as I do yet, but you should see data starting to populate in Prometheus.
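You can also run the same query over the HTTP API, which is handy for a quick sanity check from a shell. Again, this assumes the Prometheus server is reachable on localhost:9090.
# Query the 1-minute load average for the EC2 targets via the HTTP API
curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=node_load1{job='awsec2'}"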
One minor takeaway before I leave: you will need to adjust the launch configuration or AMI so that the node exporter is installed when a new instance is created by the autoscaling group. With that in place, Prometheus can scrape instances as they are created!
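Here's a minimal sketch of what that could look like as EC2 user data in a launch configuration. The node exporter version and paths are assumptions on my part; pin whatever release you actually use, or bake the exporter into the AMI (ideally behind a proper systemd unit) instead of launching it like this.
#!/bin/bash
# Example user data: download and run the node exporter on first boot (version is an assumption)
VERSION=0.18.1
curl -sL -o /tmp/node_exporter.tar.gz \
    https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz
tar -xzf /tmp/node_exporter.tar.gz -C /opt
nohup /opt/node_exporter-${VERSION}.linux-amd64/node_exporter &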