Welcome to the Kubernetes Troubleshooting Scenarios Simulator! This repository contains 35 real-world Kubernetes issues that you can simulate, analyze, and resolve. Whether you're a beginner or an experienced Kubernetes professional, this project will help you gain hands-on troubleshooting experience in Kubernetes environments.
- 35 real-world scenarios: Each scenario is designed to reflect actual Kubernetes issues you may encounter in production environments.
- Step-by-step troubleshooting: Follow clear instructions for simulating issues, investigating problems, and applying solutions.
- Hands-on practice: Learn by doing! Build your troubleshooting skills in a real Kubernetes environment.
- Docker integration: Learn how to build and run Docker containers for each scenario as needed.
- 35 Kubernetes troubleshooting scenarios covering networking, storage, security, performance, and more.
- Dockerfiles are provided for building containerized environments to simulate each scenario (only when needed).
- Automation script to build and push Docker images for all scenarios to DockerHub.
- Practical solutions and explanations to help you understand the root causes of issues.
- Comprehensive troubleshooting tips: Includes best practices and additional troubleshooting strategies for real-world Kubernetes environments.
- Kubernetes Cluster:
  - Ensure you have access to a Kubernetes cluster. You can use Minikube, Docker Desktop, or a cloud-managed service like AWS EKS, GKE, or AKS.
- Install kubectl:
  - kubectl is the Kubernetes command-line tool that you'll use to interact with your cluster.
  - Download and install it from the official Kubernetes site.
  - To verify installation:
    kubectl version --client
- Install Docker:
  - Install Docker to build images locally or in your preferred CI/CD pipeline.
  - Download Docker from the official Docker site.
- Clone the Repository:
  - Clone the repository to your local machine:
    git clone https://github.com/vellankikoti/troubleshoot-kubernetes-like-a-pro.git
    cd troubleshoot-kubernetes-like-a-pro
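Before moving on, it can help to confirm that the tools from the list above are actually on your PATH. The following is a minimal sketch, not part of the repository; the function name check_tools is hypothetical.

```shell
# Report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the three tools this README relies on.
check_tools kubectl docker git || echo "Install the missing tools before continuing."
```

This only checks that the binaries exist; it does not verify that kubectl can reach a cluster or that the Docker daemon is running.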
Docker images are used to simulate the scenarios, and the Dockerfiles are located in the dockerfiles/ folder, separate from the scenario folders. You don’t need to manually create an image for every scenario. Instead, you can use the provided automation script to build and push all images at once.
Before building and pushing images, make sure to configure your DockerHub username in the automation script. The default username is set to vellankikoti, but you can easily replace it with your own DockerHub username.
- Open the automation script file:
scripts/build_and_push.sh
- Replace the username in the script:
DOCKER_USERNAME="vellankikoti" # Change this to your DockerHub username
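If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch that assumes the script contains a line starting with DOCKER_USERNAME= as shown above; the helper name set_docker_username is hypothetical.

```shell
# Replace the DOCKER_USERNAME assignment in a script with a new username.
set_docker_username() {
    file=$1
    user=$2
    tmp=$(mktemp) || return 1
    # Rewrite only the assignment line; every other line passes through unchanged.
    sed "s/^DOCKER_USERNAME=.*/DOCKER_USERNAME=\"$user\"/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # overwrite in place so the file mode is kept
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage (from the repository root):
#   set_docker_username scripts/build_and_push.sh <your-dockerhub-username>
```

Note that the whole assignment line is replaced, so the trailing comment on that line is dropped.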
Once the username is configured, you can run the automation script to build all Docker images and push them to DockerHub:
- Run the script:
./scripts/build_and_push.sh
- The script will automatically:
  - Build the Docker images for the scenarios that require them (based on the Dockerfiles in dockerfiles/).
  - Tag the images with your DockerHub username.
  - Push the images to your DockerHub account.
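The build, tag, and push steps can be pictured roughly as follows. This is an illustrative sketch, not the contents of build_and_push.sh; it assumes one subdirectory per scenario under dockerfiles/ and that the image name matches the directory name. The function names are hypothetical.

```shell
# Derive the image reference for one scenario directory, e.g.
# "dockerfiles/affinity-rules-violation/" -> "<user>/affinity-rules-violation:latest".
image_ref() {
    user=$1
    dir=${2%/}                      # strip a trailing slash, if any
    echo "$user/$(basename "$dir"):latest"
}

# Build and push one image per subdirectory of dockerfiles/.
build_and_push_all() {
    user=$1
    for dir in dockerfiles/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        ref=$(image_ref "$user" "$dir")
        docker build -t "$ref" "$dir" && docker push "$ref"
    done
}

# Usage (from the repository root, after "docker login"):
#   build_and_push_all <your-dockerhub-username>
```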
Once the images are pushed to DockerHub, you can run the containers to simulate the troubleshooting scenarios locally. To do this:
- Pull the image from DockerHub:
docker pull <your-dockerhub-username>/<scenario-name>:latest
- Run the container:
docker run -it --rm <your-dockerhub-username>/<scenario-name>:latest
For example, to run the "Affinity Rules Violation" scenario:
docker run -it --rm vellankikoti/affinity-rules-violation:latest
After running the container, you can remove the image (optional) to free up space:
docker rmi <your-dockerhub-username>/<scenario-name>:latest
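The pull, run, and cleanup commands above can be wrapped in one helper. This is a sketch under the assumption that every image is tagged latest; the function name try_scenario and the DOCKER override variable are hypothetical conveniences, not part of the repository.

```shell
# Pull, run, and then remove the image for one scenario.
# Set DOCKER="echo docker" to preview the commands without running them.
try_scenario() {
    image="$1/$2:latest"
    ${DOCKER:-docker} pull "$image" &&
        ${DOCKER:-docker} run -it --rm "$image"
    # Free the disk space afterwards (the README treats this step as optional).
    ${DOCKER:-docker} rmi "$image"
}

# Usage:
#   try_scenario vellankikoti affinity-rules-violation
```

Previewing first with the DOCKER override is a cheap way to confirm the image name before anything is downloaded.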
If you have access to a Kubernetes cluster, you can run these scenarios directly within your cluster.
In the root directory of the repository, you will find a script to help manage the scenarios. Make it executable first:
chmod +x manage-scenarios.sh
Start the scenario management script:
./manage-scenarios.sh
The script will display a list of 35 scenarios. Each scenario corresponds to a real-world issue in Kubernetes. Enter the scenario number to simulate and resolve the issue.
For example, enter 1 to simulate the "Affinity Rules Violation" scenario.
The script will apply the issue.yaml file to simulate the problem in your Kubernetes cluster. You can inspect the issue with:
kubectl describe pod <pod-name>
kubectl logs <pod-name>
After analyzing the issue, the script will guide you to apply the corresponding fix.yaml file to resolve the issue:
kubectl apply -f fix.yaml
Verify that the issue is resolved by checking the pod's status:
kubectl get pods
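The same simulate, inspect, and fix loop can also be driven by hand. The sketch below is not what manage-scenarios.sh does internally; it assumes the manifests live at scenarios/<scenario-name>/issue.yaml and scenarios/<scenario-name>/fix.yaml, and the function names are hypothetical.

```shell
# Path of a scenario manifest, e.g. scenarios/dns-resolution-failure/issue.yaml.
manifest() {
    echo "scenarios/$1/$2.yaml"
}

# Apply the broken manifest, then show pod status and recent events.
simulate() {
    kubectl apply -f "$(manifest "$1" issue)" &&
        kubectl get pods &&
        kubectl get events --sort-by=.metadata.creationTimestamp
}

# Apply the fix and list the pods again to confirm they recover.
resolve() {
    kubectl apply -f "$(manifest "$1" fix)" &&
        kubectl get pods
}

# Usage (from the repository root):
#   simulate dns-resolution-failure
#   resolve dns-resolution-failure
```

Sorting events by creation time puts the most recent ones last, which is usually where the failure you just triggered shows up.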
The repository is organized as follows:
troubleshoot-kubernetes-like-a-pro/
├── dockerfiles/
│ ├── affinity-rules-violation/
│ ├── dns-resolution-failure/
│ ├── resource-issues/
│ └── ...
├── scenarios/
│ ├── affinity-rules-violation/
│ ├── dns-resolution-failure/
│ ├── resource-issues/
│ └── ...
├── scripts/
│ └── build_and_push.sh # Automation script to build and push Docker images
├── manage-scenarios.sh # Script to simulate and manage scenarios in Kubernetes
└── README.md
- dockerfiles/: Contains Dockerfiles to build images for scenarios.
- scenarios/: Contains individual scenario folders, each with YAML files (issue.yaml, fix.yaml) and a description.md.
- scripts/: Contains the automation script for building and pushing Docker images (build_and_push.sh).
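When adding or editing scenarios, a quick check that each folder follows this layout can catch omissions early. This is a minimal sketch based on the three file names listed above; the function name missing_files is hypothetical.

```shell
# Print the expected files that are missing from one scenario folder.
missing_files() {
    for f in issue.yaml fix.yaml description.md; do
        [ -f "$1/$f" ] || echo "$1/$f"
    done
}

# Check every folder under scenarios/ (run from the repository root).
for dir in scenarios/*/; do
    [ -d "$dir" ] || continue   # the glob matched nothing
    missing_files "${dir%/}"
done
```

No output means every scenario folder contains all three files.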
The repository includes the following 35 Kubernetes troubleshooting scenarios:
- Affinity Rules Violation
- DNS Resolution Failure
- Insufficient Resources
- Outdated Kubernetes Version
- Security Context Issues
- CGroup Issues
- Failed Resource Limits
- Liveness Probe Failure
- Persistent Volume Claim Issues
- SELinux/AppArmor Policy Violation
- Cluster Autoscaler Issues
- File Permissions on Mounted Volumes
- Liveness & Readiness Failure
- PID Namespace Collision
- Service Account Permissions Issue
- Container Runtime (CRI) Errors
- Firewall Restriction
- LoadBalancer Service Misconfiguration
- Pod Disruption Budget Violations
- Service Port Mismatch
- Crash Due to Insufficient Disk Space
- Image Pull Backoff
- Network Connectivity Issues
- Port Binding Issues
- Taints and Tolerations Mismatch
- CrashLoopBackOff
- Image Pull Error
- Node Affinity Issue
- Readiness Probe Failure
- Volume Mount Issue
- Disk IO Errors
- Ingress Configuration Issue
- OOM Killed
- Resource Requests & Limits Mismatch
- Wrong Container Command
- Always check the logs and describe the pod to identify the issue:
kubectl logs <pod-name>
kubectl describe pod <pod-name>
- If a fix doesn’t resolve the issue, verify cluster configurations and try reapplying the scenario.
- In case the Docker container is not sufficient for troubleshooting, switch to Kubernetes to inspect and apply fixes directly.
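Which of the two commands to reach for first often depends on the value in the STATUS column of kubectl get pods. The mapping below is a rough rule of thumb rather than an exhaustive guide, and the function name next_step is hypothetical.

```shell
# Suggest a first diagnostic command for a pod, given its STATUS column value.
next_step() {
    status=$1
    pod=$2
    case "$status" in
        CrashLoopBackOff|Error)
            # The restarted container may have no logs yet; read the previous run.
            echo "kubectl logs $pod --previous" ;;
        ImagePullBackOff|ErrImagePull|Pending|ContainerCreating|OOMKilled)
            # Pull, scheduling, and kill reasons are reported in the pod's
            # events and container state.
            echo "kubectl describe pod $pod" ;;
        *)
            echo "kubectl logs $pod" ;;
    esac
}

# Usage:
#   next_step CrashLoopBackOff <pod-name>
```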
Contributions are welcome! Feel free to:
- Add new scenarios: If you have an interesting or challenging Kubernetes issue, contribute by adding it to this repo.
- Improve existing scenarios: Fix bugs, improve documentation, or suggest enhancements.
To contribute, please submit a pull request with your changes.
This project is licensed under the MIT License.
This project is maintained by Koti. Thank you for exploring the Kubernetes Troubleshooting Scenarios Simulator!