Good day everyone,
I’ve been trying to complete this lab but for some reason, even though I’m following the official documentation from AWS, I’m unable to do so.
Lab description
The Nautilus DevOps team has been tasked with setting up an EC2 instance for their application. To ensure the application performs optimally, they also need to create a CloudWatch alarm to monitor the instance’s CPU utilization. The alarm should trigger if the CPU utilization exceeds 90% for one consecutive 5-minute period. To send notifications, use the SNS topic named datacenter-sns-topic which is already created.
- Launch EC2 Instance: Create an EC2 instance named datacenter-ec2 using any appropriate Ubuntu AMI.
- Create CloudWatch Alarm: Create a CloudWatch alarm named datacenter-alarm with the following specifications:
- Statistic: Average
- Metric: CPU Utilization
- Threshold: >= 90% for 1 consecutive 5-minute period.
- Alarm Actions: Send a notification to datacenter-sns-topic.
Initial Steps
I created an EC2 instance with an Ubuntu AMI.
- Added a new key pair (datacenter-ec2-kp)
- Added the default security group
Generated CLI for review
aws ec2 run-instances --image-id "ami-04b4f1a9cf54c11d0" --instance-type "t2.micro" --key-name "datacenter-ec2-kp" --block-device-mappings '{"DeviceName":"/dev/sda1","Ebs":{"Encrypted":false,"DeleteOnTermination":true,"Iops":3000,"SnapshotId":"snap-00cdccb3239896f89","VolumeSize":8,"VolumeType":"gp3","Throughput":125}}' --network-interfaces '{"AssociatePublicIpAddress":true,"DeviceIndex":0,"Groups":["sg-0df8afedb8dfeb7e0"]}' --credit-specification '{"CpuCredits":"standard"}' --tag-specifications '{"ResourceType":"instance","Tags":[{"Key":"Name","Value":"datacenter-ec2"}]}' --metadata-options '{"HttpEndpoint":"enabled","HttpPutResponseHopLimit":2,"HttpTokens":"required"}' --private-dns-name-options '{"HostnameType":"ip-name","EnableResourceNameDnsARecord":true,"EnableResourceNameDnsAAAARecord":false}' --count "1"
First attempt to set up the Alarm
Documentation Followed: “Create a CloudWatch alarm for an instance - Amazon Elastic Compute Cloud”
Current Output: Failed to Fetch
Desired Output: Alarm successfully created
Steps done:
- Select the instance > Actions > Monitor and troubleshoot > Manage CloudWatch alarms.
- Alarm Notification > datacenter-sns-topic
- Alarm Thresholds:
- Group Samples by: Average
- Metric: CPU Utilization
- Percent: 0.9
- Period: 5 minutes
- Consecutive Period: 1
- Alarm name: datacenter-alarm
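For reference, the console settings above map roughly onto a single AWS CLI call. This is only a sketch under assumptions: the instance ID and the SNS topic ARN (region and account included) are placeholders, and the threshold is written as 90, taken from the lab description, rather than the 0.9 entered in the console. The command is only printed unless DRY_RUN=0 is set.

```shell
# Placeholders (assumptions): replace with the real instance ID and topic ARN.
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
SNS_TOPIC_ARN="${SNS_TOPIC_ARN:-arn:aws:sns:us-east-1:111122223333:datacenter-sns-topic}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Alarm on average CPUUtilization >= 90 over one 5-minute (300 s) period.
create_alarm() {
  run aws cloudwatch put-metric-alarm \
    --alarm-name datacenter-alarm \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${INSTANCE_ID}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 90 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions "${SNS_TOPIC_ARN}"
}

create_alarm
```

CPUUtilization is one of the metrics EC2 publishes in the AWS/EC2 namespace, so this call targets the built-in metric rather than anything collected by an agent. The topic ARN could be looked up with aws sns list-topics instead of being typed by hand.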
Second Attempt
At this point I suspected that something was off with the lab setup, so I looked into the prerequisites for creating a CloudWatch alarm for the instance.
Is the CloudWatch agent installed in the instance?
- No. On Ubuntu, the CloudWatch agent has to be downloaded manually as a .deb package via wget (see “Install and run the CloudWatch agent on your servers - Amazon CloudWatch”).
Prerequisites:
ssh into the instance:
$ touch datacenter-ec2-kp.pem
- I copied the private key that I downloaded when creating the instance’s key pair into datacenter-ec2-kp.pem
$ chmod 400 "datacenter-ec2-kp.pem"
$ ssh -i "datacenter-ec2-kp.pem" [email protected],com
Is the SSM Agent installed in the instance?
$ sudo systemctl status snap.amazon-ssm-agent.amazon-ssm-agent.service
- Running with errors
$ sudo cat /var/log/amazon/ssm/errors.log
2025-01-16 09:46:12.9001 ERROR [RemoteRetrieve @ ec2_role_provider.go.144] EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 590183746102
status code: 400, request id: ad5d066b-8632-4bcd-a513-6c7bf1c36ef0
2025-01-16 09:46:12.9001 ERROR [minLog @ credentialrefresher.go.280] [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity. Default Host Management Err: error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 590183746102
status code: 400, request id: ad5d066b-8632-4bcd-a513-6c7bf1c36ef0
2025-01-16 10:15:55.0275 ERROR [RemoteRetrieve @ ec2_role_provider.go.144] EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 590183746102
status code: 400, request id: 045d3096-6fb8-4c5a-86b7-3df376e3b9f8
2025-01-16 10:15:55.0275 ERROR [minLog @ credentialrefresher.go.280] [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity. Default Host Management Err: error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 590183746102
status code: 400, request id: 045d3096-6fb8-4c5a-86b7-3df376e3b9f8
This seems to point to an IAM role issue.
Does the instance have the role with the policies CloudWatchAgentServerPolicy and AmazonSSMManagedInstanceCore attached?
- No, the instance has no IAM role attached.
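The missing role can also be confirmed from the CLI. A minimal sketch, assuming a placeholder instance ID; an empty result would mean no instance profile is associated with the instance. The command is only printed unless DRY_RUN=0 is set.

```shell
# Placeholder (assumption): replace with the real instance ID.
INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the ARN of any instance profile currently associated with the instance.
check_instance_profile() {
  run aws ec2 describe-iam-instance-profile-associations \
    --filters "Name=instance-id,Values=${INSTANCE_ID}" \
    --query 'IamInstanceProfileAssociations[].IamInstanceProfile.Arn' \
    --output text
}

check_instance_profile
```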
Eventually, I tried to create an IAM role manually, but it failed.
On top of that, even though the role appears in the IAM role list, it does not show up when selecting a role in the instance options.
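One possible explanation worth checking (an assumption, not something confirmed for this lab): the role selector on the instance lists instance profiles rather than roles, so a role that has no instance profile would not appear there. The sketch below uses a hypothetical role name, datacenter-ec2-role, and would itself need IAM permissions that the lab account may not grant. The commands are only printed unless DRY_RUN=0 is set.

```shell
# Hypothetical role name (assumption): replace with the role that was created.
ROLE_NAME="${ROLE_NAME:-datacenter-ec2-role}"

# Print the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: check whether the role already belongs to an instance profile.
show_profiles() {
  run aws iam list-instance-profiles-for-role --role-name "${ROLE_NAME}"
}

# Step 2: create a profile with the same name and add the role to it.
# Only needed when step 1 returns an empty list.
create_profile() {
  run aws iam create-instance-profile --instance-profile-name "${ROLE_NAME}"
  run aws iam add-role-to-instance-profile \
    --instance-profile-name "${ROLE_NAME}" \
    --role-name "${ROLE_NAME}"
}

show_profiles
# In a real run, call create_profile by hand after inspecting step 1.
if [ "${DRY_RUN:-1}" = "1" ]; then
  create_profile
fi
```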
All in all,
Current Output: The role is not created correctly due to insufficient permissions, and it does not show up in the dropdown of the instance’s “Modify IAM Role” page.
Desired Output: IAM Role should be correctly created and attached to the instance.
Can you help shed some light on how to solve this issue?
Kind Regards.
Additional documentation links I’ve used:
- Checking SSM Agent status and starting the agent - AWS Systems Manager
- Create IAM roles and users for use with the CloudWatch agent - Amazon CloudWatch
- Install and run the CloudWatch agent on your servers - Amazon CloudWatch