yashashreesuresh/kraken

Kraken

Chaos and resiliency testing tool for Kubernetes and OpenShift

Kraken injects deliberate failures into Kubernetes/OpenShift clusters to check whether they are resilient to those failures.

Workflow

Kraken workflow

Install the dependencies

$ pip3 install -r requirements.txt

Usage

Config

Set the scenarios to inject, along with tunings such as the duration to wait between scenarios, in the config file located at config/config.yaml. Kraken uses the powerfulseal tool for pod-based scenarios. A sample config looks like:

kraken:
    kubeconfig_path: /root/.kube/config                    # Path to kubeconfig
    scenarios:                                             # List of policies/chaos scenarios to load
        -    scenarios/etcd.yml
        -    scenarios/openshift-kube-apiserver.yml                           
        -    scenarios/openshift-apiserver.yml
    node_scenarios:                                        # List of chaos node scenarios to load
        -    scenarios/node_scenarios_example.yml

tunings:
    wait_duration: 60                                      # Duration to wait between each chaos scenario 
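As a rough illustration of how the sample above is structured, the sketch below checks an already-parsed config (for example, the dict returned by `yaml.safe_load`) for the keys shown in the sample. The helper name `check_config` and its messages are hypothetical and not part of Kraken; only keys that appear in the sample are checked.

```python
def check_config(config):
    """Return a list of problems found in a parsed Kraken-style config dict.

    Hypothetical helper: it only knows about the keys shown in the sample config.
    """
    problems = []
    kraken = config.get("kraken") or {}
    if not kraken.get("kubeconfig_path"):
        problems.append("kraken.kubeconfig_path is missing")
    # Both scenario options are expected to be lists of file paths.
    for key in ("scenarios", "node_scenarios"):
        value = kraken.get(key, [])
        if not isinstance(value, list):
            problems.append(f"kraken.{key} must be a list of file paths")
    # wait_duration is the pause between chaos scenarios.
    wait = (config.get("tunings") or {}).get("wait_duration")
    if not isinstance(wait, int) or isinstance(wait, bool) or wait < 0:
        problems.append("tunings.wait_duration must be a non-negative integer")
    return problems


sample = {
    "kraken": {
        "kubeconfig_path": "/root/.kube/config",
        "scenarios": ["scenarios/etcd.yml"],
        "node_scenarios": ["scenarios/node_scenarios_example.yml"],
    },
    "tunings": {"wait_duration": 60},
}
print(check_config(sample))  # prints []
```

Running a check like this before starting a chaos run can catch a mistyped key early, before any failure is injected.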

Run

$ python3 run_kraken.py --config <config_file_location>

Run containerized version

Assuming that Docker (17.05 or greater, with multi-stage build support) is installed on the host, run:

$ docker pull quay.io/openshift-scale/kraken:latest
$ docker run --name=kraken --net=host -v <path_to_kubeconfig>:/root/.kube/config -v <path_to_kraken_config>:/root/kraken/config/config.yaml -d quay.io/openshift-scale/kraken:latest
$ docker logs -f kraken

Similarly, podman can be used to achieve the same:

$ podman pull quay.io/openshift-scale/kraken
$ podman run --name=kraken --net=host -v <path_to_kubeconfig>:/root/.kube/config:Z -v <path_to_kraken_config>:/root/kraken/config/config.yaml:Z -d quay.io/openshift-scale/kraken:latest
$ podman logs -f kraken

If you want to build your own Kraken image, see here.

Report

The report is generated in the run directory and contains information about each chaos scenario injection, along with timestamps.

Cerberus to help with cluster health checks

Cerberus can be used to monitor the cluster under test, and Kraken can consume the aggregated go/no-go signal it generates to determine pass/fail. This ensures that the Kubernetes/OpenShift environment is healthy at the cluster level rather than only at the level of the targeted components. It is highly recommended to turn on the Cerberus health check feature available in Kraken after installing and setting up Cerberus. To do that, set cerberus_enabled to True and cerberus_url to the URL where Cerberus publishes the go/no-go signal in the config file.
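A minimal sketch of consuming such a go/no-go signal is shown below. It assumes that the endpoint answers with the literal text `True` for "go" and anything else for "no-go"; that wire format, and the helper names `parse_signal` and `cluster_is_healthy`, are assumptions for illustration rather than Kraken's actual implementation.

```python
import urllib.request


def parse_signal(body):
    """Interpret a response body as a go/no-go signal.

    Assumption: the literal text "True" means go; anything else is no-go.
    """
    return body.strip() == b"True"


def cluster_is_healthy(cerberus_url, timeout=10):
    """Fetch the signal from cerberus_url; return True for go, False for no-go."""
    with urllib.request.urlopen(cerberus_url, timeout=timeout) as response:
        return parse_signal(response.read())
```

A run could call `cluster_is_healthy(cerberus_url)` after each scenario and mark the run as failed on the first no-go. Treating every unexpected body as no-go keeps the check conservative.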

Kubernetes/OpenShift chaos scenarios supported

Kraken currently supports only pod-based and node-based scenarios. More will be added soon.

Node chaos scenarios

The following node chaos scenarios are supported:

  1. node_start_scenario: scenario to start the node instance.
  2. node_stop_scenario: scenario to stop the node instance.
  3. node_stop_start_scenario: scenario to stop and then start the node instance.
  4. node_termination_scenario: scenario to terminate the node instance.
  5. node_reboot_scenario: scenario to reboot the node instance.
  6. stop_kubelet_scenario: scenario to stop the kubelet of the node instance.
  7. stop_start_kubelet_scenario: scenario to stop and start the kubelet of the node instance.
  8. node_crash_scenario: scenario to crash the node instance.

NOTE: If the node doesn't recover from the node_crash_scenario injection, reboot the node to get it back to Ready state.

NOTE: node_start_scenario, node_stop_scenario, node_stop_start_scenario, node_termination_scenario, node_reboot_scenario and stop_start_kubelet_scenario are supported only on AWS as of now.

NOTE: With AWS as the cloud type, make sure AWS CLI is installed.
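The cloud restriction in the notes above can be expressed as a small pre-flight check. This is only a sketch: the function name `unsupported_actions` is hypothetical, and `gcp` in the usage line is an illustrative cloud type value that does not come from the Kraken documentation.

```python
# Scenarios that the notes above list as supported only on AWS.
AWS_ONLY = frozenset({
    "node_start_scenario",
    "node_stop_scenario",
    "node_stop_start_scenario",
    "node_termination_scenario",
    "node_reboot_scenario",
    "stop_start_kubelet_scenario",
})


def unsupported_actions(actions, cloud_type):
    """Return the actions that cannot run on the given cloud type."""
    if cloud_type == "aws":
        return []
    return [action for action in actions if action in AWS_ONLY]


# "gcp" is a hypothetical value used only to show the non-AWS case.
print(unsupported_actions(["node_reboot_scenario", "node_crash_scenario"], "gcp"))
```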

Node scenarios can be injected by listing the node scenario config files under the node_scenarios option in the Kraken config. Refer to the node_scenarios_example config file:

node_scenarios:
  - actions:                                                        # node chaos scenarios to be injected
    - node_stop_start_scenario
    - stop_start_kubelet_scenario
    - node_crash_scenario
    node_name:                                                      # node on which scenario has to be injected
    label_selector: node-role.kubernetes.io/worker                  # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
    instance_kill_count: 1                                          # number of times to inject each scenario under actions
    timeout: 120                                                    # duration to wait for completion of node scenario injection
    cloud_type: aws                                                 # cloud type on which Kubernetes/OpenShift runs
  - actions:
    - node_reboot_scenario
    node_name:
    label_selector: node-role.kubernetes.io/infra
    instance_kill_count: 1
    timeout: 120
    cloud_type: aws
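To make the fields above concrete, the sketch below expands a parsed node_scenarios list into the individual injections it describes. The function name `plan_injections` is hypothetical, and the ordering (each action repeated instance_kill_count times before moving to the next action) is an assumption, not a description of Kraken's scheduler.

```python
def plan_injections(node_scenarios):
    """Expand node scenario entries into (action, target, run) tuples."""
    plan = []
    for entry in node_scenarios:
        # node_name takes precedence; label_selector is used when it is empty.
        if entry.get("node_name"):
            target = "node:" + entry["node_name"]
        else:
            target = "label:" + entry["label_selector"]
        count = entry.get("instance_kill_count", 1)
        for action in entry["actions"]:
            for run in range(1, count + 1):
                plan.append((action, target, run))
    return plan


# The same two entries as in the example config, as a parsed structure.
example = [
    {
        "actions": [
            "node_stop_start_scenario",
            "stop_start_kubelet_scenario",
            "node_crash_scenario",
        ],
        "node_name": None,
        "label_selector": "node-role.kubernetes.io/worker",
        "instance_kill_count": 1,
        "timeout": 120,
        "cloud_type": "aws",
    },
    {
        "actions": ["node_reboot_scenario"],
        "node_name": None,
        "label_selector": "node-role.kubernetes.io/infra",
        "instance_kill_count": 1,
        "timeout": 120,
        "cloud_type": "aws",
    },
]

for step in plan_injections(example):
    print(step)
```

With the example config this yields four injections: three against a worker node and one against an infra node.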

Pod chaos scenarios

The following Kubernetes/OpenShift components have a basic chaos scenario config today. Adding a new pod-based scenario is as simple as adding a new config under the scenarios directory and listing it in the config.

| Component | Description | Working |
| --- | --- | --- |
| Etcd | Kills a single/multiple etcd replicas for the specified number of times in a loop | ✔️ |
| Kube ApiServer | Kills a single/multiple kube-apiserver replicas for the specified number of times in a loop | ✔️ |
| ApiServer | Kills a single/multiple apiserver replicas for the specified number of times in a loop | ✔️ |
| Prometheus | Kills a single/multiple prometheus replicas for the specified number of times in a loop | ✔️ |
