DevOps Guru EKS Test Harness

This project lets you deploy an EKS cluster in your AWS account and trigger various failure modes through a test client, in order to demonstrate how DevOps Guru works with a Kubernetes cluster.

Requirements

In order to operate this test harness you will need the following:

A computer with a Unix-like operating system (GNU/Linux or macOS) and a shell (bash, dash, zsh)
An AWS account onboarded to AWS DevOps Guru in one of the supported regions
Gradle
Python 3.6+ with 'pip' utility
Docker
kubectl
eksctl
AWS CLI V2 - only v2 is supported
Helm
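
Before installing, it can help to confirm that the listed tools are actually on your PATH. The following is a minimal sketch, not part of the repository: the function name missing_tools is hypothetical, and the command names (for example python3 and pip) are assumptions that may differ on your system.

```shell
#!/usr/bin/env bash
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not a script shipped with the harness.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Command names below are assumed from the requirements list above.
missing="$(missing_tools gradle python3 pip docker kubectl eksctl aws helm)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```

This only checks that each command exists. It does not check versions, so you may still want to run aws --version to confirm that AWS CLI v2 is installed.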

Installing the harness

In order to provision the cluster and install all the necessary elements:

Authenticate to your AWS account using credentials that have permission to create and modify resources.

aws configure

Run the bootstrap script in the root folder of the repository.

./bootstrap.sh

Inspecting the cluster

If you would like to inspect the contents of the deployed EKS cluster, start the kubectl proxy using the script in the root of the repository:

./start_proxy.sh

This will allow you to view:

To stop the proxy process, run:

./stop_proxy.sh

To get an access token for the Kubernetes dashboard, run:

./get_dashboard_token.sh

Running tests

Before running tests, make sure that your cluster has been running for at least 60 minutes, to give DevOps Guru a chance to ingest and index all the metrics.

To run test cases, make sure you have a Python 3.6+ interpreter installed and run:

./run_test.sh <test_name>

Currently supported test scenarios:

alb_4xx - triggers a series of 4XX errors in the test API, producing ApplicationELB HTTPCode_Target_4XX_Count Anomalous insights in DevOps Guru. Keep in mind that this can take 15 to 20 minutes to trigger.
alb_5xx - triggers a series of 5XX errors in the test API, producing ApplicationELB HTTPCode_Target_5XX_Count Anomalous insights in DevOps Guru. Keep in mind that this can take 15 to 20 minutes to trigger.
stop_instance - stops one of the underlying EC2 instances in the EKS node group, producing a ContainerInsights cluster_failed_node_count Anomalous In Stack eksctl-DevOpsGuruTestCluster-cluster insight in DevOps Guru.
restart_instance - restarts all the underlying EC2 instances in the EKS node group, ending the anomaly caused by stop_instance.
enable_cpu_stress_test - enables CPU stress test mode, which brings overall cluster CPU utilization above 90%. After 30 minutes, this produces an anomaly that does not generate a separate insight but is shown as part of the alb_5xx, alb_4xx and stop_instance insights. Before enabling this mode, make sure that the cluster has been running for at least 60 minutes to establish a utilization baseline.
disable_cpu_stress_test - disables the CPU stress test mode enabled by enable_cpu_stress_test.
trigger_pod_crash - installs a misconfigured deployment that induces a rolling pod crash through a failing probe, to demonstrate pod_number_of_container_restarts insights.
disable_pod_crash - restores the normal deployment configuration after trigger_pod_crash.

Anomalous metric values can be confirmed in the CloudWatch console, and the anomalies produced by DevOps Guru can be seen in the DevOps Guru console.
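
Because some scenarios take many minutes to show up, a mistyped scenario name can waste a lot of waiting time. The sketch below checks a name against the scenarios listed above before calling run_test.sh. The helpers is_supported_test and run_scenario are hypothetical and are not part of the repository. How run_test.sh itself handles unknown names is not documented here.

```shell
#!/usr/bin/env bash
# Scenario names copied from the list of supported test scenarios.
SUPPORTED_TESTS="alb_4xx alb_5xx stop_instance restart_instance enable_cpu_stress_test disable_cpu_stress_test trigger_pod_crash disable_pod_crash"

# Return 0 if the given name is a supported scenario, 1 otherwise.
is_supported_test() {
  for name in $SUPPORTED_TESTS; do
    [ "$name" = "$1" ] && return 0
  done
  return 1
}

# Hypothetical wrapper: run a scenario only if its name is recognized.
# Assumes it is called from the root of the repository, next to run_test.sh.
run_scenario() {
  if is_supported_test "$1"; then
    ./run_test.sh "$1"
  else
    echo "Unknown test scenario: $1" >&2
    return 1
  fi
}
```

For example, after sourcing this file you could call run_scenario stop_instance and later run_scenario restart_instance to end the anomaly.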

Cleaning up test resources

In order to clean up test harness resources from your account you can run:

./cleanup.sh

If the cleanup script fails, you can attempt manual deletion of the CloudFormation stack named eksctl-DevOpsGuruTestCluster-cluster.
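
One way to do the manual deletion from the command line is sketched below, assuming the AWS CLI is configured for the same account and region in which the harness was deployed. The function name delete_harness_stack is hypothetical. The stack name is the one given above.

```shell
#!/usr/bin/env bash
# Stack name as given in this README.
STACK_NAME="eksctl-DevOpsGuruTestCluster-cluster"

# Request deletion of the stack, then block until CloudFormation
# reports that the deletion has completed (or the wait fails).
# (delete_harness_stack is a hypothetical helper, not part of the repository.)
delete_harness_stack() {
  aws cloudformation delete-stack --stack-name "$STACK_NAME" &&
    aws cloudformation wait stack-delete-complete --stack-name "$STACK_NAME"
}

# Usage: delete_harness_stack
```

The function is only defined here and is not called, so sourcing the file does not delete anything until you invoke it yourself.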
