By default, when managing a large number of EC2 instances, you don't get a lot of visibility into how your instances are behaving. Any monitoring beyond what you get in the console needs to be configured and set up by the AWS user using services like CloudWatch or CloudTrail.

EC2 instances can be started or stopped for various reasons by anyone with access to the EC2 console, and your team might want to know about it.

In Zuar's case, we manage hundreds of EC2 instances across three regions running our products Mitto, Rapid Portal, and Custom Portal. We wanted to get notifications in our #devops Slack channel any time an instance state changed.

If you're familiar with Terraform, feel free to skip everything below! Simply edit and use this code.

Overview

AWS CloudWatch allows you to create rules that publish to an SNS topic any time a specific EC2 instance, or any instance in a list, has a state change. You can subscribe email addresses to this SNS topic and receive an email any time there's a new SNS publication. In this case, we'll subscribe a custom Lambda function that formats the Slack message and then makes a POST request to our Slack webhook.

Flow

  1. A user on your AWS account stops an EC2 instance.
  2. Your CloudWatch Event Rule is triggered and publishes to your SNS topic.
  3. SNS invokes your Lambda function with the message CloudWatch published.
  4. Your Lambda function formats the message for Slack and sends a POST request to your webhook.
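
To make the flow concrete, here is a rough sketch of the payloads at steps 2 and 3. The sample values (instance ID, account number, timestamp) are made up, the helper names are invented for this example, and the SNS envelope is trimmed down to the one field the Lambda reads; real events carry more fields.

```python
import json

# Roughly what CloudWatch emits when an instance stops (values are made up).
sample_ec2_event = {
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2020-01-01T12:00:00Z",
    "region": "us-east-1",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}


def wrap_in_sns_envelope(message):
    """Build a trimmed-down version of the event SNS hands to Lambda."""
    return {"Records": [{"Sns": {"Message": message}}]}


def extract_message(event):
    """Pull the published message out of the SNS envelope (step 3)."""
    return event["Records"][0]["Sns"]["Message"]


if __name__ == "__main__":
    event = wrap_in_sns_envelope(json.dumps(sample_ec2_event))
    print(extract_message(event))
```

Later in this post the rule is configured to reshape the event into a short key=value string before publishing, so the Lambda never sees the full JSON shown here.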

Results

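The messages in your Slack channel should look something like this, with the placeholders filled in:

*Instance State Change:* - MM/DD/YYYY HH:MM:SS - on REGION Instance INSTANCE_ID was changed to *STATE*.
Instance Tag Name: *NAME*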

Slack Webhook

First and foremost, you need a Slack channel with an incoming webhook. Slack should give you a link similar to this, which we will use later in our Lambda function: https://hooks.slack.com/services/UNIQUE_ID/UNIQUE_ID/UNIQUE_ID. You can test that the link is working using curl like this:

curl -X POST -H 'Content-type: application/json' --data '{"text":"Hello, World!"}' https://hooks.slack.com/services/UNIQUE_ID/UNIQUE_ID/UNIQUE_ID
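
If you would rather run the same check from Python, a minimal sketch using only the standard library might look like the following. The helper names `build_slack_request` and `send` are made up for this example, and the URL is the same placeholder as above.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request carrying a Slack text payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    url = "https://hooks.slack.com/services/UNIQUE_ID/UNIQUE_ID/UNIQUE_ID"
    print(send(build_slack_request(url, "Hello, World!")))
```

Keeping the request construction separate from the network call makes it easy to inspect the payload without posting anything to the channel.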

Lambda Function

I won't get into the specifics of Lambda deployment here. The function needs an execution role, a policy, and a policy attachment, some of which the console will create for you. Later you will add a trigger for SNS. Our Lambda uses Python 3.6, and there are four environment variables you will need to set: ACCESS_ID, ACCESS_KEY, REGION, and SLACK_HOOK (the code reads them into the constants ACCESSKEY, SECRETKEY, REGION, and SLACK_HOOK). Below is our Lambda function:

"""Get instance state-change info from SNS and post it to Slack."""

import json
import os
from datetime import timedelta

import boto3
import requests
from dateutil import parser

ACCESSKEY = os.environ['ACCESS_ID']
SECRETKEY = os.environ['ACCESS_KEY']
REGION = os.environ['REGION']
SLACK_HOOK = os.environ['SLACK_HOOK']

def get_instance_name(fid):
    """When given an instance ID as str e.g. 'i-1234567', return the instance 'Name' from the name tag."""
    ec2 = boto3.resource('ec2', aws_access_key_id=ACCESSKEY, aws_secret_access_key=SECRETKEY, region_name=REGION)
    ec2instance = ec2.Instance(fid)
    instancename = ''
    # tags is None when the instance has no tags at all, so fall back to an empty list.
    for tag in ec2instance.tags or []:
        if tag["Key"] == 'Name':
            instancename = tag["Value"]
    return instancename

def message_to_dict(message):
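    """Convert the 'key=value, key=value' text from the input transformer into a dict."""
    # Strip the double quotes that surround the transformed message.
    message = message.replace("\"", "")
    mdict = {}
    for word in message.split(", "):
        # Split on the first "=" only, so a value containing "=" does not break unpacking.
        key, value = word.split("=", 1)
        mdict[key] = value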
    return mdict

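def handler(event, context):
    """Lambda entry point: read the SNS message and post a summary to Slack."""
    try:
        message = event["Records"][0]["Sns"]["Message"]
    except Exception as e:
        # Not an SNS event: log it and return an error instead of calling exit(),
        # which would terminate the Lambda runtime abruptly.
        print(e)
        return {
            'statusCode': 400,
            'body': json.dumps({'response': 'event did not contain an SNS message'}),
        }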
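    # Skip the transitional "pending" and "stopping" states; report all others.
    if "pending" not in message and "stopping" not in message:
        url = SLACK_HOOK
        mdict = message_to_dict(message)
        d = parser.parse(mdict["time"])
        # Event times are UTC. This shifts them by a fixed 5 hours for local display;
        # adjust the offset for your time zone (it does not track daylight saving time).
        d = d - timedelta(hours=5)
        human_time = d.strftime('%m/%d/%Y %H:%M:%S')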
        instance_name = get_instance_name(mdict["instance_id"])
        message = f'*Instance State Change:* - {human_time} - on {mdict["region"]} Instance {mdict["instance_id"]} was changed to *{mdict["state"]}*.\nInstance Tag Name: *{instance_name}*'
        myobj = {'text': message}
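        try:
            x = requests.post(url, json=myobj)
        except Exception as e:
            # Log the failure and return an error instead of calling exit(),
            # which would terminate the Lambda runtime abruptly.
            print(e)
            return {
                'statusCode': 502,
                'body': json.dumps({'response': str(e)}),
            }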
        return {
            'statusCode': 200,
            'body': json.dumps({'response':x.text}),
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
        }
    else:
        return {
            'statusCode': 200,
            'body': json.dumps({'response':'pass'}),
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
        }

NOTE: When packaging this for deployment, be sure to include requests as a dependency.
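
Before deploying, it can help to exercise the parsing and formatting logic locally without touching AWS or Slack. The sketch below re-implements those two steps in isolation; `parse_message` and `format_slack_text` are names invented for this example, the sample message uses made-up values, and the time shift is left out to keep things short.

```python
def parse_message(message):
    """Turn 'key=value, key=value' text into a dict, as message_to_dict does."""
    fields = {}
    for pair in message.replace('"', "").split(", "):
        key, value = pair.split("=", 1)  # split once so values may contain '='
        fields[key] = value
    return fields


def format_slack_text(fields, instance_name):
    """Build Slack text in the same layout as the handler's f-string."""
    return (
        f'*Instance State Change:* - {fields["time"]} - on {fields["region"]} '
        f'Instance {fields["instance_id"]} was changed to *{fields["state"]}*.'
        f'\nInstance Tag Name: *{instance_name}*'
    )


if __name__ == "__main__":
    sample = ('"instance_id=i-0123456789abcdef0, time=2020-01-01T12:00:00Z, '
              'region=us-east-1, state=stopped"')
    print(format_slack_text(parse_message(sample), "example-instance"))
```

Running this prints the text that would be sent to Slack, which is a quick way to catch a mismatch between the input transformer template and the parser.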

Simple Notification Service (SNS)

Still with me? Next you'll need to create an SNS topic and then create a subscription for your Lambda function. In the SNS console, click "Create Topic". All you need here is a name. Once that's done, go to "Subscriptions" and create a subscription: enter the ARN for the topic you just created, select "AWS Lambda" as the protocol, and then select the ARN for the Lambda we created.
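
If you prefer to script these console steps, a sketch with boto3 might look like this. It assumes the Lambda function already exists; the function name `subscribe_lambda_to_topic`, the statement ID, the topic name, and the ARN are all hypothetical. The clients are passed in as arguments so the helper can be tried out with stand-ins.

```python
def subscribe_lambda_to_topic(sns, lambda_client, topic_name, function_arn):
    """Create the topic, allow SNS to invoke the function, then subscribe it."""
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # Without this permission SNS is not allowed to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowInvokeFromSNS",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    sns.subscribe(TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn)
    return topic_arn


if __name__ == "__main__":
    import boto3  # imported here so the helper itself has no hard dependency

    arn = subscribe_lambda_to_topic(
        boto3.client("sns"),
        boto3.client("lambda"),
        "ec2-state-changes",  # hypothetical topic name
        "arn:aws:lambda:us-east-1:123456789012:function:ec2-slack-notifier",  # made-up ARN
    )
    print(arn)
```

Note that adding a permission with a statement ID that already exists on the function fails, so a rerun would need a different ID or a cleanup step first.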

CloudWatch Rule

In the CloudWatch console, click on "Rules" under "Events" in the menu on the left, then click on "Create Rule". On the left, select "Event Pattern", service "EC2", and event type "EC2 Instance State-change Notification". Select "Any state" and "Any instance" to publish to SNS on changes to any instance in this region.

On the right, select "SNS Topic" at the top, then select the SNS topic you created. Below that, select "Input Transformer". In the first textarea use:

{"instance-id":"$.detail.instance-id","state":"$.detail.state","time":"$.time","region":"$.region","account":"$.account"}

and in the second textarea use:

"instance_id=<instance-id>, time=<time>, region=<region>, state=<state>"

Our Lambda function will expect the format above.
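
The same rule can be described in code. The sketch below builds the event pattern and the SNS target with the input transformer settings shown above, then applies them through a boto3 `events` client passed in by the caller. The helper names, the default target ID, and any rule name you pass in are hypothetical.

```python
import json

# Matches every EC2 instance state change in the region.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
}

# The two input transformer textareas from the console.
INPUT_PATHS = {
    "instance-id": "$.detail.instance-id",
    "state": "$.detail.state",
    "time": "$.time",
    "region": "$.region",
    "account": "$.account",
}
INPUT_TEMPLATE = '"instance_id=<instance-id>, time=<time>, region=<region>, state=<state>"'


def build_target(topic_arn, target_id="sns-slack"):
    """Describe the SNS target, including the input transformer settings."""
    return {
        "Id": target_id,
        "Arn": topic_arn,
        "InputTransformer": {
            "InputPathsMap": INPUT_PATHS,
            "InputTemplate": INPUT_TEMPLATE,
        },
    }


def create_rule(events, rule_name, topic_arn):
    """Create the rule and attach the SNS target using a boto3 'events' client."""
    events.put_rule(Name=rule_name, EventPattern=json.dumps(EVENT_PATTERN), State="ENABLED")
    events.put_targets(Rule=rule_name, Targets=[build_target(topic_arn)])
```

When you script the rule instead of using the console, the topic's access policy also has to allow events.amazonaws.com to publish to it, which this sketch does not handle.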

Save your changes, and it's time to do some testing.

Testing and Troubleshooting

To test everything we just created, simply go to the EC2 console and start and stop an instance you're not using in the region where you created your resources. If you don't get a message in your Slack channel, check the CloudWatch logs for your Lambda function and ensure that the function was invoked. If the function was not invoked, you might need an SNS trigger on your Lambda function. Also double-check that you have all the permissions, roles, and policies you need for Lambda to receive events and be invoked by SNS.
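
One way to check the "invoked by SNS" permission is to read the function's resource policy and look for a statement that allows SNS to invoke it. The sketch below assumes the statement has the shape that adding an SNS trigger usually produces (an `ArnLike` condition on `AWS:SourceArn`); the helper name `sns_can_invoke`, the function name, and the topic ARN are made up.

```python
import json


def sns_can_invoke(policy_json, topic_arn):
    """Return True if a policy statement lets SNS invoke the function for this topic."""
    for statement in json.loads(policy_json).get("Statement", []):
        principal = statement.get("Principal", {})
        service = principal.get("Service") if isinstance(principal, dict) else None
        source = statement.get("Condition", {}).get("ArnLike", {}).get("AWS:SourceArn")
        if (
            statement.get("Effect") == "Allow"
            and service == "sns.amazonaws.com"
            and statement.get("Action") == "lambda:InvokeFunction"
            and source == topic_arn
        ):
            return True
    return False


if __name__ == "__main__":
    import boto3  # imported here so the check itself has no hard dependency

    # get_policy raises an error if the function has no resource policy at all.
    policy = boto3.client("lambda").get_policy(FunctionName="ec2-slack-notifier")["Policy"]
    print(sns_can_invoke(policy, "arn:aws:sns:us-east-1:123456789012:ec2-state-changes"))
```

A result of False does not prove the setup is broken, since a policy can grant the same access in a different shape, but it is a reasonable first thing to look at.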