AWS

Chaos Engineering

This course is designed to equip you with the knowledge and skills needed to ensure your AWS systems withstand and recover from failures.
Nasia Ullas
Resilience and Disaster Recovery Product Lead
Fill this form to get a notification when course is released.
10 Lessons
60 Topics

Description

In today's fast-paced digital landscape, system resilience is vital for businesses of all sizes. "Chaos Engineering" is a comprehensive, hands-on course designed to equip you with the knowledge and skills needed to ensure your systems withstand and recover from failures. It moves from foundational concepts to advanced application on AWS services including EC2, Aurora, Fargate, and EKS, and covers strategies for maintaining availability across multiple Availability Zones.

What You’ll Learn:

Chaos Engineering Fundamentals:

  • Understand core principles and the philosophy behind Chaos Engineering.
  • Learn why identifying and addressing system weaknesses through controlled chaos experiments is vital.
  • Explore essential tools and methodologies for implementing Chaos Engineering.

Building a Basic AWS Fault Injection Service (FIS) Experiment:

  • Gain a step-by-step understanding of constructing and executing your first AWS Fault Injection Service (FIS) experiment.
  • Understand how to design experiments targeting different failure modes in a controlled setting.
  • Learn to interpret experiment results and refine your simulations for better accuracy.
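
As a rough illustration of what a first experiment can look like, the sketch below assembles an FIS experiment template as a plain Python dictionary. It is not taken from the course material: the role ARN, the tag key and value, and the target and action names are placeholders chosen for this example. The structure follows the general shape of the FIS CreateExperimentTemplate request (targets, actions, stop conditions).

```python
import json


def build_experiment_template(role_arn, tag_key, tag_value, alarm_arn=None):
    """Return a dict shaped like an FIS CreateExperimentTemplate request."""
    # Without a CloudWatch alarm, an explicit "none" stop condition is used.
    if alarm_arn:
        stop_conditions = [{"source": "aws:cloudwatch:alarm", "value": alarm_arn}]
    else:
        stop_conditions = [{"source": "none"}]

    return {
        "description": "Stop one tagged EC2 instance and observe recovery",
        "roleArn": role_arn,
        "stopConditions": stop_conditions,
        "targets": {
            # "oneInstance" is an arbitrary name chosen for this sketch.
            "oneInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            # "stopInstance" is likewise an arbitrary action name.
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneInstance"},
            }
        },
    }


template = build_experiment_template(
    role_arn="arn:aws:iam::123456789012:role/example-fis-role",  # placeholder ARN
    tag_key="ChaosReady",  # hypothetical tag
    tag_value="true",
)
print(json.dumps(template, indent=2))
```

Building the template as data first makes it easy to review the blast radius (here, a single tagged instance) before anything is submitted to AWS. In a real account, a dictionary like this could be passed to the boto3 `fis` client's `create_experiment_template` call together with a client token.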

Introduction to Real-Life Application:

  • Discover how to apply Chaos Engineering experiments to real-world applications.
  • Learn best practices for monitoring, capturing metrics, and analyzing results to continually improve system resilience.
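
One simple way to think about analyzing results is a steady-state check: compare a metric captured before the experiment with the same metric captured during it. The sketch below is only an illustration under assumed numbers; the function name, the 10% tolerance, and the latency samples are invented for this example and do not come from the course.

```python
from statistics import mean


def steady_state_holds(baseline, during, tolerance=0.10):
    """Return True if the mean during the experiment stays within
    `tolerance` (as a fraction) of the baseline mean."""
    base = mean(baseline)
    observed = mean(during)
    if base == 0:
        # Avoid dividing by zero: only an unchanged zero counts as steady.
        return observed == 0
    return abs(observed - base) / base <= tolerance


# Made-up latency samples in milliseconds.
baseline_latency_ms = [120, 118, 125, 122]
experiment_latency_ms = [130, 128, 135, 131]

print(steady_state_holds(baseline_latency_ms, experiment_latency_ms))
print(steady_state_holds(baseline_latency_ms, experiment_latency_ms, tolerance=0.05))
```

Stating the tolerance up front turns the hypothesis into something that can pass or fail, rather than a judgment made after looking at the graphs.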

Chaos Engineering on Compute - EC2:

  • Conduct chaos experiments on EC2 instances to evaluate and improve system robustness.
  • Simulate failures, such as instance termination or network latency, and observe impacts.

Chaos Engineering on Database - Aurora:

  • Learn to apply Chaos Engineering principles to Amazon Aurora databases.
  • Simulate failures like cluster instability or node outages and develop strategies for seamless recovery.

Chaos Engineering on Serverless - Fargate:

  • Conduct chaos experiments on AWS Fargate to test the resilience of your serverless applications.
  • Simulate events like task failures or service downtime to ensure robust serverless architectures.

Chaos Engineering on Kubernetes - EKS:

  • Implement Chaos Engineering on Amazon EKS to stress-test Kubernetes clusters.
  • Simulate pod failures, node crashes, and other disruptions to validate recovery mechanisms.

Chaos Engineering on Availability Zone:

  • Conduct chaos experiments across different AWS Availability Zones.
  • Test the impact of zone failures and ensure your systems are prepared for disasters affecting one or more Availability Zones.
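
Before running a zone-failure experiment, it can help to reason about whether the remaining zones could carry the load at all. The sketch below is a back-of-the-envelope check under assumed numbers; the fleet layout, zone names, and required instance count are hypothetical.

```python
def survives_az_loss(instances_per_az, required_instances):
    """For each AZ, report whether the instances left in the other AZs
    still meet the required capacity if that AZ is lost."""
    total = sum(instances_per_az.values())
    return {
        az: (total - count) >= required_instances
        for az, count in instances_per_az.items()
    }


# Hypothetical fleet spread across three zones.
fleet = {"us-east-1a": 3, "us-east-1b": 3, "us-east-1c": 2}

print(survives_az_loss(fleet, required_instances=5))
print(survives_az_loss(fleet, required_instances=6))
```

A check like this only covers static capacity. An actual experiment would also show how quickly traffic shifts and whether scaling in the surviving zones keeps up.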

Target Audience:

  • Developers interested in enhancing their systems’ resilience.
  • Site Reliability Engineers (SREs) focused on improving system reliability.
  • Cloud Engineers managing AWS environments.
  • Technical Support Engineers specializing in fault-tolerant systems.
  • Technical Leads overseeing cloud-native application projects.

This course, with its combination of theory, demonstrations, and real-world scenarios, will enable you to build resilient systems capable of withstanding and recovering from unexpected failures efficiently. Join us to master Chaos Engineering and innovate with confidence.

About the instructor

Nasia is an Engineering Development and Integration Subject Matter Expert (SME) in Disaster Recovery (DR), Hybrid Cloud, Resilience and Business Continuity. Her expertise spans Cloud Computing, Disaster Recovery as a Service, Infrastructure as Code, Data Replication and Archiving.

She excels in testing and integrating new technologies into existing infrastructures and develops robust technical solutions for Migration, Disaster Recovery, and Archiving tailored to meet business requirements. Her deep knowledge of multiple proprietary DR technologies is evident in her comprehensive Disaster Recovery testing, planning, and implementation efforts. Ensuring application resilience and business continuity is a primary focus for Nasia, driving her daily efforts and strategic initiatives.

Introduction

2 Topics

Course Introduction
Important Course Resources

Chaos Engineering Fundamentals

5 Topics

Why Chaos Engineering?
What is Chaos Engineering?
What is AWS FIS?
FIS Experiments in this Course
Quiz - Chaos Engineering Fundamentals

Building a Basic FIS Experiment

12 Topics

FIS Permissions
Demo - Create FIS Permissions
Experiment 1 - Chaos Engineering on ASG
Build ASG-based Architecture
Demo: ASG-based Architecture
Create FIS Experiment
Demo - Create FIS Experiment
Demo - Run FIS Experiment
Demo - Learning and Improvements
Demo - FIS Experiment - CloudWatch Dashboard
Demo - Create FIS Experiment using CF
Quiz - Building a Basic FIS Experiment

Introduction to Real-Life Application

9 Topics

Introduction to Our Real Life Application
Pre-requisite to Deploy Application & Cloud9 Deprecation
Demo: Pre-requisite to Deploy Application
Demo - Setup Architecture and Deploy Application
How to Plan Your Experiment? Part 1
How to Plan Your Experiment? Part 2
Establishing Steady State Metrics Using CloudWatch RUM/X-Ray
Demo: CloudFormation Deployment
Quiz - Introduction to Real Life Application

Chaos Engineering on Compute - EC2

4 Topics

Disk Fill Scenario on EC2
Demo: FIS Experiment - Disk Fill Scenario on EC2 and Before Metrics in X-Ray
Demo: FIS Experiment - After Metrics in X-Ray and EC2 Instances
Quiz - Chaos Engineering on Compute

Chaos Engineering on Database - Aurora

4 Topics

Reboot Reader Node Scenario on Aurora
Demo: Pre-requisite for FIS experiment, Create IAM role, and Current State
Demo: Create and Run FIS experiment and After Metrics and DB state
Quiz - Chaos Engineering on Database - Aurora

Chaos Engineering on Serverless - Fargate

6 Topics

ECS Fargate Experiment Idea and Hypothesis
Demo: Fargate Steady State
Demo: Fargate IAM role creation
Demo: Run experiment Task I/O stress
Demo: Fargate After State and Learning and Improvements
Quiz - Chaos Engineering on Serverless - Fargate

Chaos Engineering on Kubernetes - EKS

10 Topics

EKS Explanation
Demo: Memory Stress on EKS - Part 1
Demo: Memory Stress on EKS - Part 2
Demo: Memory Stress on EKS - Part 3
Demo: Memory Stress on EKS - Part 4
Pod Delete on EKS
Demo: Steady State Pod Delete on EKS
Demo: Run Experiment Pod Delete on EKS
Demo: Recheck After Pod Delete on EKS
Quiz - Chaos Engineering on Kubernetes - EKS

Chaos Engineering on Availability Zone

6 Topics

What is an Availability Zone (AZ)?
Experiment Overview
Demo: General Experiment Setup - AZ
Demo: Prepare Experiment - AZ
Running the Experiment
Quiz - Chaos Engineering on Availability Zone

Conclusion

2 Topics

Cleanup Process
Conclusion
This course comes with hands-on cloud labs
10 Modules
60 Lessons
Course Certificate
02.59 Hours of Video
Story Format
Videos
Case Studies
Demo
Labs
Cloud Labs
Mock exams
Quizzes
Slack channel support
Community support
English
Closed Captions