Kraken

Chaos and resiliency testing tool for Kubernetes and OpenShift

Kraken injects deliberate failures into Kubernetes/OpenShift clusters to check whether they are resilient to those failures.

Workflow

Install the dependencies

$ pip3 install -r requirements.txt

Usage

Config

Set the scenarios to inject and the tunings, such as the duration to wait between scenarios, in the config file located at config/config.yaml. Kraken uses the PowerfulSeal tool for pod based scenarios. A sample config looks like:

kraken:
    kubeconfig_path: /root/.kube/config                    # Path to kubeconfig
    scenarios:                                             # List of policies/chaos scenarios to load
        -    scenarios/etcd.yml
        -    scenarios/openshift-kube-apiserver.yml                           
        -    scenarios/openshift-apiserver.yml
    node_scenarios:                                        # List of chaos node scenarios to load
        -    scenarios/node_scenarios_example.yml

tunings:
    wait_duration: 60                                      # Duration to wait between each chaos scenario
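
As a rough illustration of how the sample config is laid out, the sketch below mirrors it as a Python dictionary and runs a few sanity checks on it. The helper name validate_config and the rules it applies are assumptions made for this example. They are not part of Kraken, and Kraken's own config loading may differ.

```python
# Hypothetical helper: not Kraken's actual config loader.
# The dictionary mirrors the sample config/config.yaml shown above.
sample_config = {
    "kraken": {
        "kubeconfig_path": "/root/.kube/config",
        "scenarios": [
            "scenarios/etcd.yml",
            "scenarios/openshift-kube-apiserver.yml",
            "scenarios/openshift-apiserver.yml",
        ],
        "node_scenarios": ["scenarios/node_scenarios_example.yml"],
    },
    "tunings": {"wait_duration": 60},
}


def validate_config(config):
    """Return a list of human-readable problems found in the config."""
    problems = []
    kraken = config.get("kraken", {})
    if not kraken.get("kubeconfig_path"):
        problems.append("kraken.kubeconfig_path is missing")
    # Both scenario options are expected to be lists of file paths.
    for key in ("scenarios", "node_scenarios"):
        if not isinstance(kraken.get(key, []), list):
            problems.append(f"kraken.{key} must be a list of file paths")
    wait = config.get("tunings", {}).get("wait_duration")
    if not isinstance(wait, int) or wait < 0:
        problems.append("tunings.wait_duration must be a non-negative integer")
    return problems


if __name__ == "__main__":
    print(validate_config(sample_config) or "config looks OK")
```

Running a check like this before starting a chaos run can catch a missing kubeconfig path or a mistyped wait_duration early.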

Run

$ python3 run_kraken.py --config <config_file_location>

Run containerized version

Assuming that a recent version of Docker (17.05 or greater, with multi-stage build support) is installed on the host, run:

$ docker pull quay.io/openshift-scale/kraken:latest
$ docker run --name=kraken --net=host -v <path_to_kubeconfig>:/root/.kube/config -v <path_to_kraken_config>:/root/kraken/config/config.yaml -d quay.io/openshift-scale/kraken:latest
$ docker logs -f kraken

Similarly, podman can be used to achieve the same:

$ podman pull quay.io/openshift-scale/kraken
$ podman run --name=kraken --net=host -v <path_to_kubeconfig>:/root/.kube/config:Z -v <path_to_kraken_config>:/root/kraken/config/config.yaml:Z -d quay.io/openshift-scale/kraken:latest
$ podman logs -f kraken

If you want to build your own kraken image see here

Report

The report is generated in the run directory and contains information about each chaos scenario injection, along with timestamps.

Cerberus to help with cluster health checks

Cerberus can be used to monitor the cluster under test, and the aggregated go/no-go signal it generates can be consumed by Kraken to determine pass/fail. This ensures that the Kubernetes/OpenShift environment is healthy at the cluster level, not just at the level of the targeted components. It is highly recommended to turn on the Cerberus health check feature available in Kraken after installing and setting up Cerberus. To do that, set cerberus_enabled to True and cerberus_url to the URL where Cerberus publishes the go/no-go signal in the config file.
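
To make the go/no-go idea concrete, here is a minimal sketch of a client that polls the URL set in cerberus_url and turns the response into a pass/fail decision. It assumes the endpoint returns the literal text True when the cluster is healthy. That response format, the function names, and the timeout value are assumptions for illustration, not a description of Kraken's internals.

```python
import urllib.request


def parse_cerberus_signal(body):
    """Interpret a response body as go (True) or no-go (False).

    Assumption: the endpoint returns the literal text "True" when healthy.
    """
    return body.strip() == b"True"


def cluster_is_healthy(cerberus_url, timeout=10):
    """Fetch the go/no-go signal; treat any connection error as no-go."""
    try:
        with urllib.request.urlopen(cerberus_url, timeout=timeout) as resp:
            return parse_cerberus_signal(resp.read())
    except OSError:
        # URLError and socket timeouts are OSError subclasses.
        return False
```

Treating an unreachable Cerberus as no-go is a conservative choice made in this sketch: a chaos run would then fail instead of passing without a health signal.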

Kubernetes/OpenShift chaos scenarios supported

Kraken currently supports only pod and node based scenarios; we will be adding more soon.

Node chaos scenarios

The following node chaos scenarios are supported:

node_start_scenario: scenario to start the node instance.
node_stop_scenario: scenario to stop the node instance.
node_stop_start_scenario: scenario to stop and then start the node instance.
node_termination_scenario: scenario to terminate the node instance.
node_reboot_scenario: scenario to reboot the node instance.
stop_kubelet_scenario: scenario to stop the kubelet of the node instance.
stop_start_kubelet_scenario: scenario to stop and start the kubelet of the node instance.
node_crash_scenario: scenario to crash the node instance.

NOTE: If the node doesn't recover from the node_crash_scenario injection, reboot the node to get it back to Ready state.

NOTE: node_start_scenario, node_stop_scenario, node_stop_start_scenario, node_termination_scenario, node_reboot_scenario and stop_start_kubelet_scenario are supported only on AWS as of now.

NOTE: With AWS as the cloud type, make sure AWS CLI is installed.
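
The notes above can be turned into a small pre-flight check. The sketch below flags actions that are listed as AWS-only when another cloud type is configured, and checks whether an aws executable is on PATH. The function names are hypothetical and the check is not part of Kraken.

```python
import shutil

# Scenarios the note above lists as supported only on AWS.
AWS_ONLY_SCENARIOS = {
    "node_start_scenario",
    "node_stop_scenario",
    "node_stop_start_scenario",
    "node_termination_scenario",
    "node_reboot_scenario",
    "stop_start_kubelet_scenario",
}


def unsupported_actions(actions, cloud_type):
    """Return the actions that cannot run on the given cloud type."""
    if cloud_type == "aws":
        return []
    return [action for action in actions if action in AWS_ONLY_SCENARIOS]


def aws_cli_available():
    """Return True if an `aws` executable is found on PATH."""
    return shutil.which("aws") is not None
```

For example, unsupported_actions(["node_reboot_scenario", "node_crash_scenario"], "gcp") reports only node_reboot_scenario, since node_crash_scenario is not in the AWS-only list.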

Node scenarios can be injected by placing the node scenario config files under the node_scenarios option in the kraken config. Refer to the node_scenarios_example config file.

node_scenarios:
  - actions:                                                        # node chaos scenarios to be injected
    - node_stop_start_scenario
    - stop_start_kubelet_scenario
    - node_crash_scenario
    node_name:                                                      # node on which scenario has to be injected
    label_selector: node-role.kubernetes.io/worker                  # when node_name is not specified, a node with matching label_selector is selected for node chaos scenario injection
    instance_kill_count: 1                                          # number of times to inject each scenario under actions
    timeout: 120                                                    # duration to wait for completion of node scenario injection
    cloud_type: aws                                                 # cloud type on which Kubernetes/OpenShift runs
  - actions:
    - node_reboot_scenario
    node_name:
    label_selector: node-role.kubernetes.io/infra
    instance_kill_count: 1
    timeout: 120
    cloud_type: aws
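
One way to read the config above is as a plan of individual injections: each action is injected instance_kill_count times against either node_name or, when that is empty, a node matching label_selector. The sketch below expands the example into such a plan. The function name and the iteration order are assumptions for illustration, and Kraken may schedule the injections differently.

```python
# The example node scenarios config above, as parsed data.
node_scenarios = [
    {
        "actions": [
            "node_stop_start_scenario",
            "stop_start_kubelet_scenario",
            "node_crash_scenario",
        ],
        "node_name": None,
        "label_selector": "node-role.kubernetes.io/worker",
        "instance_kill_count": 1,
        "timeout": 120,
        "cloud_type": "aws",
    },
    {
        "actions": ["node_reboot_scenario"],
        "node_name": None,
        "label_selector": "node-role.kubernetes.io/infra",
        "instance_kill_count": 1,
        "timeout": 120,
        "cloud_type": "aws",
    },
]


def expand_node_scenarios(entries):
    """Yield (action, target, timeout, cloud_type) once per planned injection."""
    for entry in entries:
        # node_name wins when set; otherwise fall back to the label selector.
        target = entry.get("node_name") or entry.get("label_selector")
        for action in entry["actions"]:
            for _ in range(entry.get("instance_kill_count", 1)):
                yield action, target, entry.get("timeout"), entry.get("cloud_type")


if __name__ == "__main__":
    for step in expand_node_scenarios(node_scenarios):
        print(step)
```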

Pod chaos scenarios

The following are the components of Kubernetes/OpenShift for which a basic chaos scenario config exists today. Adding a new pod based scenario is as simple as adding a new config under the scenarios directory and defining it in the config.

Component	Description	Working
Etcd	Kills a single/multiple etcd replicas for the specified number of times in a loop	✔️
Kube ApiServer	Kills a single/multiple kube-apiserver replicas for the specified number of times in a loop	✔️
ApiServer	Kills a single/multiple apiserver replicas for the specified number of times in a loop	✔️
Prometheus	Kills a single/multiple prometheus replicas for the specified number of times in a loop	✔️
