🔧 Cloud Track - Advanced
Troubleshooting Runbook
Document a professional runbook for debugging common Kubernetes pod failures and production issues.
⏱️ 6-8 hours
🎯 Advanced
📋 Overview
Most Kubernetes pod failures fall into a handful of recognizable patterns. In this project you build a systematic troubleshooting guide that any team member can follow during an incident.
🔨 Runbook Structure
Scenario 1: CrashLoopBackOff
Symptom: Pod restarts repeatedly (STATUS shows CrashLoopBackOff and the RESTARTS count keeps rising)
Investigation Commands:
kubectl describe pod <name>
kubectl logs <name> --previous
Common Causes:
- Application crash on startup
- Missing environment variables
- Container exceeds its memory limit (OOMKilled)
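To narrow down which cause applies, you can read the container's last exit code with kubectl get pod <name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}' (the neighbouring reason field often says OOMKilled directly). The helper below is a hypothetical sketch, not part of kubectl or any other tool, that maps common exit codes to a first guess. Codes above 128 follow the shell convention of 128 plus the signal number.

```sh
# Hypothetical helper: map a container's last exit code to a likely cause.
# Codes above 128 follow the shell convention 128 + signal number.
explain_exit_code() {
  case $1 in
    0)   echo "exited cleanly: check restartPolicy and whether the process should stay running" ;;
    1)   echo "application error: read kubectl logs --previous" ;;
    126) echo "command not executable: check image entrypoint permissions" ;;
    127) echo "command not found: check the container command/args" ;;
    137) echo "SIGKILL (128+9): often OOMKilled, compare usage with the memory limit" ;;
    143) echo "SIGTERM (128+15): stopped externally, e.g. a failing liveness probe" ;;
    *)   echo "unrecognized exit code $1: start with kubectl logs --previous" ;;
  esac
}
```

For example, explain_exit_code 137 points you at the memory limit before you start reading logs.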
Scenario 2: ImagePullBackOff
Symptom: Kubelet cannot pull the container image (STATUS shows ErrImagePull, then ImagePullBackOff)
Investigation:
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get pods <name> -o jsonpath='{.status.containerStatuses[0].state.waiting.message}'
Solutions:
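- Verify the image name and tag exist in the registry
- Check that imagePullSecrets is set on the pod or its service account
- Validate that the registry credentials in the referenced secret are correct and not expired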
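When checking the image name and credentials, it helps to know which registry host a short reference actually resolves to, because the pull secret must hold credentials for that exact host. The sketch below uses hypothetical helper names and a simplified form of the Docker-style rule: the first path component counts as a registry only if it contains a dot or a colon or is localhost; otherwise Docker Hub (docker.io) is assumed, and a reference without a tag falls back to latest.

```sh
# Hypothetical helpers: split an image reference the way Docker-style
# tooling resolves short names (a simplified sketch, not a full parser).
image_registry() {
  ref=${1%%@*}                  # drop any @sha256:... digest
  case $ref in
    */*) first=${ref%%/*} ;;    # first path component
    *)   echo "docker.io"; return ;;
  esac
  case $first in
    *.*|*:*|localhost) echo "$first" ;;   # looks like a registry host
    *) echo "docker.io" ;;                # e.g. "myorg/app" on Docker Hub
  esac
}

image_tag() {
  case $1 in
    *@*) echo "${1#*@}"; return ;;  # a digest takes precedence over any tag
  esac
  last=${1##*/}                     # final path component, e.g. "app:1.2"
  case $last in
    *:*) echo "${last##*:}" ;;
    *)   echo "latest" ;;           # no tag given: the default tag applies
  esac
}
```

Feed it the image from the pod spec, for example image_registry "$(kubectl get pod <name> -o jsonpath='{.spec.containers[0].image}')", and compare the result with the registry your pull secret was created for.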
Scenario 3: Pending Pods (Resource Constraints)
Symptom: Pod stuck in Pending state
Investigation:
kubectl describe pod <name> | grep -A 5 Events
kubectl top nodes   # live usage (needs metrics-server); the scheduler compares requests, not usage
Common Causes:
- Insufficient allocatable CPU/memory on nodes to satisfy the pod's requests
- Node selector mismatch
- PVC not bound (storage issue)
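The scheduler places a pod by comparing its resource requests with each node's allocatable capacity minus what other pods already request, so a node can look idle in kubectl top nodes and still reject the pod. You can read allocatable CPU with kubectl get node <node> -o jsonpath='{.status.allocatable.cpu}' and the already-requested total from the "Allocated resources" section of kubectl describe node <node>. The hypothetical helpers below sketch that arithmetic for CPU only, and handle just the m suffix and plain or decimal core counts; memory suffixes such as Mi and Gi are not covered.

```sh
# Hypothetical helpers: convert Kubernetes CPU quantities to millicores
# and check whether a pod's CPU request fits in a node's remaining
# allocatable capacity. Only "m" and plain/decimal cores are handled.
cpu_to_millicores() {
  case $1 in
    *m) echo "${1%m}" ;;   # "500m" -> 500
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;  # "1.5" -> 1500
  esac
}

# usage: cpu_fits <pod-request> <node-allocatable> <already-requested>
# Returns 0 (success) if the request fits in the remaining capacity.
cpu_fits() {
  req=$(cpu_to_millicores "$1")
  alloc=$(cpu_to_millicores "$2")
  used=$(cpu_to_millicores "$3")
  [ $((alloc - used)) -ge "$req" ]
}
```

For example, a node with 2 allocatable cores and 1600m already requested has 400m left, so cpu_fits 500m 2 1600m fails even if the node's live usage is low.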
Scenario 4: Service Not Reachable
Symptom: Cannot access service from other pods
Debugging Steps:
# Verify endpoints
kubectl get endpoints <service-name>
# Test from debug pod
kubectl run debug --image=busybox --rm -it --restart=Never -- wget -qO- http://<service-name>:<port>
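If kubectl get endpoints shows no addresses, the usual cause is that no Ready pod carries every label in the Service's selector. You can check this against the cluster directly with kubectl get pods -l <selector>, or compare the Selector line of kubectl describe service <service-name> with the LABELS column of kubectl get pods --show-labels. The hypothetical helper below sketches that comparison for the comma-separated key=value format both commands print (Service selectors are equality-based).

```sh
# Hypothetical helper: succeed only if every key=value pair in a Service
# selector (arg 1) appears in a pod's label list (arg 2).
# Both arguments use the "key=value,key=value" format kubectl prints.
selector_matches() {
  # Label keys and values cannot contain spaces, so splitting on them is safe.
  for pair in $(printf '%s' "$1" | tr ',' ' '); do
    case ",$2," in
      *",$pair,"*) ;;        # this key=value pair is present on the pod
      *) return 1 ;;         # key missing or value differs
    esac
  done
  return 0
}
```

Matching whole comma-delimited pairs matters: a selector of app=web must not match a pod labelled app=webapp, which is exactly the kind of near-miss that leaves a Service with no endpoints.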
📦 Deliverables
- ✓ Runbook covering 5+ common failure scenarios
- ✓ Step-by-step investigation commands for each
- ✓ Decision tree or flowchart for triage
- ✓ Formatted as PDF or Markdown (team-ready)
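The first branch of a triage decision tree can be sketched as a lookup from the STATUS column of kubectl get pods to the matching runbook scenario. The function name and the exact set of statuses below are assumptions for illustration; extend the branches as you add scenarios.

```sh
# Hypothetical triage helper: route a pod's STATUS (as printed by
# kubectl get pods) to the matching runbook scenario.
triage() {
  case $1 in
    CrashLoopBackOff|Error|OOMKilled)
      echo "Scenario 1: CrashLoopBackOff" ;;
    ImagePullBackOff|ErrImagePull|InvalidImageName)
      echo "Scenario 2: ImagePullBackOff" ;;
    Pending)
      echo "Scenario 3: Pending Pods" ;;
    Running)
      echo "Scenario 4 if clients cannot reach it through its Service" ;;
    *)
      echo "Not covered yet: run kubectl describe pod and add a scenario" ;;
  esac
}
```

To triage a whole namespace, you could pipe kubectl get pods --no-headers into a loop such as while read -r name ready status rest; do echo "$name: $(triage "$status")"; done, since STATUS is the third column.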