Highlights
- Focus: Master verification & speed - 16-17 tasks in 120 mins = ~7 mins each.
- Setup: Use aliases (k, $do, $now) only if you've practiced them.
- YAML Tip: Generate with --dry-run=client -o yaml; never write from scratch.
- Docs Shortcut: Use kubectl explain, not web docs.
- Verify Everything: Always get, describe, logs, and check Events.
- Storage: Default StorageClass + PV/PVC binding must work.
- Troubleshooting: Start from nodes → system pods → workloads.
- RBAC & Security: Test permissions with kubectl auth can-i.
- Upgrade Flow: Control plane first, then kubelet restart.
- Networking: Endpoints tell truth; check DNS/CoreDNS if service fails.
- Time Rule: Max 8 mins per question; verify before moving on.
- Golden Trick: Imperative → YAML → Apply → Verify = full marks.
- Practice Goal: Score 90%+ in mock exams, finish in <100 mins.
- Mindset: Speed + verification = success in CKA.
Welcome! You're about to dive into a comprehensive guide to the commands you can lean on to verify your work during the CKA exam.
Here's the reality: you have 120 minutes for around 16-17 questions, which means about 6-7 minutes per task. Sounds tight? It is! But master these verification techniques, and you'll be confident in finishing with time to spare.
Think of this guide as your exam companion. Each section is designed to be practical and focused on what actually matters during those 2 hours.
Exam Setup - The Speed Booster (Optional, But Highly Recommended!)
Should you set this up?
Honestly, it's up to you! Some folks love aliases and can't imagine working without them. Others prefer typing full commands every time. Here's my take: if you practice with these shortcuts for 2-3 weeks before the exam, they become second nature and can save you 10-20 minutes total. That's huge! But if you try them for the first time on exam day, they'll just confuse you and slow you down.
Good news about autocomplete:
The exam environment already has kubectl autocomplete enabled! So, when you type kubectl get po and hit TAB, it auto-completes to kubectl get pods. This works out of the box - no setup needed. Pretty sweet, right?
The time-saving aliases:
Let me be real with you - typing kubectl dozens of times during the exam gets old fast. That's why I use k as an alias. Same with $do for --dry-run=client -o yaml - typing that several times is mind-numbing. But here's the deal: practice with these for at least 2 weeks before the exam, or don't use them at all. Half-learned shortcuts will trip you up under pressure.
cat >> ~/.bashrc << 'EOF'
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kd='kubectl describe'
export do='--dry-run=client -o yaml'
export now='--force --grace-period=0'
EOF
source ~/.bashrc
Set up aliases
Why these specific aliases?
- k - Because typing "kubectl" 200 times hurts
- kgp and kgs - Your most-used commands deserve shortcuts
- $do - This magical variable generates YAML templates instantly
- $now - Deletes pods immediately without 30-second grace period
Configure vim for YAML editing:
YAML is whitespace-sensitive (2 spaces for indentation, never tabs). Vim's defaults will drive you crazy if you don't fix them:
cat >> ~/.vimrc << 'EOF'
set number
set tabstop=2
set shiftwidth=2
set expandtab
EOF
YAML editing configuration
Now here's something CRITICAL: when commands end up in files - scripts, or a question that asks you to record the command you used - you MUST use the full kubectl command, not the k alias. Aliases only exist in your interactive shell; a saved k command won't expand when the file is run or checked later. So:
✅ Correct: kubectl run nginx --image=nginx $do > pod.yaml
❌ Risky: k run nginx --image=nginx $do > pod.yaml (works when typed at the prompt, but the k habit will bite you the moment the command lands in a script or a graded answer file)
The $do variable works in both cases because it's an environment variable; k is a shell alias that only exists in your interactive session.
Author's recommendation:
Spend your first 2-3 minutes of the exam setting these up if you've practiced with them. If you haven't practiced, skip them entirely and just use full commands with autocomplete. A familiar workflow beats a faster unfamiliar one every single time.
1. Storage (10% of the exam)
This is where things get interesting. Storage questions separate the people who've actually worked with Kubernetes from those who just read about it. You'll be verifying that StorageClasses work, PVs and PVCs bind properly, and most importantly - that dynamic provisioning actually creates storage when you need it.
Here's what trips people up: they create a PVC, see it stuck in Pending, panic, and waste 10 minutes troubleshooting. By the end of this section, you'll diagnose that in 30 seconds flat.
StorageClass Verification
What you're really checking: Think of StorageClasses as templates for creating storage. The big questions are: Which one is default? What happens to data when I delete a PVC (Retain or Delete)? And when does the volume get created (Immediate or WaitForFirstConsumer)?
These aren't just academic questions - they determine whether your application's data survives or gets wiped out!
# List storage classes
kubectl get sc
# Check default storage class
kubectl get sc | grep "(default)"
# Describe storage class details
kubectl describe sc <sc-name>
# Get specific fields
kubectl get sc <sc-name> -o jsonpath='{.reclaimPolicy}'
kubectl get sc <sc-name> -o jsonpath='{.volumeBindingMode}'
Critical insight: If there's no default StorageClass and you create a PVC without specifying one, congratulations - you've just created a PVC that will sit in Pending status forever. Ask me how I know! (Hint: I lost a question on a practice exam learning this the hard way)
Dynamic Provisioning Test
The real test: Dynamic provisioning is Kubernetes automatically creating storage volumes for you. When you create a PVC, Kubernetes should automatically provision a PV to satisfy it. This is the #1 storage failure point in the exam - you need to verify the entire chain works.
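The commands below watch a claim named test-pvc. If the question doesn't hand you one, a minimal throwaway claim like this sketch is enough to exercise the provisioner (the name, size, and commented storageClassName are placeholders - adjust to the task):
# Create a throwaway PVC to exercise dynamic provisioning
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName: <sc-name>   # omit to fall back to the default StorageClass
EOF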
# Watch PVC bind
kubectl get pvc test-pvc --watch
# Expected: Pending → Bound (5-10 seconds)
# Verify PV created
kubectl get pv
# Check provisioning events
kubectl describe pvc test-pvc | grep -A 10 Events
# Troubleshoot stuck PVC
kubectl get sc <sc-name> # Verify SC exists
kubectl get pods -n kube-system | grep provisioner # Check provisioner running
kubectl logs -n kube-system <provisioner-pod> # Check logs
Critical insight: If a PVC stays Pending for more than 10 seconds, something is wrong. The --watch flag is essential here - it shows you the status changing in real-time so you immediately know when it succeeds or fails. The Events section tells you exactly what went wrong.
PV/PVC Binding
What's happening: PVs (PersistentVolumes) are cluster resources, PVCs (PersistentVolumeClaims) are requests for storage. They must "bind" to each other based on matching criteria (size, access mode, StorageClass). Think of PV as the actual hard drive, and PVC as the application saying "I need 10GB of storage."
# View together
kubectl get pv,pvc
# Check binding details
kubectl describe pvc <pvc-name>
# Look for: Status=Bound, Volume=<pv-name>, Events=SuccessfulBinding
# Bidirectional verification
kubectl get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'
kubectl get pv <pv-name> -o jsonpath='{.spec.claimRef.name}'
# Make Released PV available again
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
Common exam scenario: You delete a PVC, and the PV goes into "Released" state. The exam might ask you to make that PV available again for binding. The patch command clears the claim reference, making it Available again. This is a common 2-3 point question.
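If you have to build the pair yourself, here's a minimal sketch of a static PV and a PVC that bind on matching size, access mode, and storageClassName (the hostPath, names, and the "manual" class are placeholders for illustration):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
# Verify both sides report Bound
kubectl get pv,pvc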
Access Modes & Reclaim Policies
Why this matters: Access modes determine how many pods can use the volume simultaneously. RWO (ReadWriteOnce) means only one pod can write - if you try to schedule two pods with RWO volumes on different nodes, one will fail. RWX (ReadWriteMany) allows multiple pods to write simultaneously. Mismatched access modes are a frequent exam trap.
# View access modes
kubectl get pv -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes
# View reclaim policies
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy
# Change reclaim policy
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Exam trap: The exam may give you a PV with "Delete" reclaim policy and ask you to preserve data after PVC deletion. You MUST change it to "Retain" before deleting the PVC, or all data is lost. The patch command is the fastest way to change this.
2. Troubleshooting (30% of exam)
Troubleshooting is 30% of the exam, which means it's worth almost as much as ALL other domains combined. And here's the thing - you can't just memorize answers. Each troubleshooting question is a unique puzzle. A pod won't start, a service doesn't work, a node goes rogue - and you need to figure out why.
But here's the secret: 90% of troubleshooting follows the same pattern. Check the big stuff first (nodes, control plane), then drill down (pods, containers). It's like debugging anything else - you don't start by checking individual lines of code when the whole server might be down, right?
The mindset shift: Stop thinking "what's the answer?" and start thinking "what would I check first?" This systematic approach will serve you way better than trying to memorize every possible failure mode.
Cluster Health
Start here, always: Before you troubleshoot anything specific, answer this: "Is the cluster even healthy?" Because if the nodes are down or the API server is crashed, nothing else matters. You can't fix a pod on a broken cluster.
Think of it like troubleshooting your car - before you check why the radio isn't working, make sure the engine is running!
# Check nodes
kubectl get nodes
kubectl get nodes -o wide
# Check cluster components
kubectl get componentstatuses
kubectl get --raw='/readyz?verbose'
kubectl version
# View cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events -A --sort-by=.metadata.creationTimestamp
What you're looking for: All nodes showing "Ready" status. If you see "NotReady," stop everything - that's your problem right there. The kubectl version might seem too simple, but if the server version doesn't show up? Your API server is down, and that's why literally nothing is working.
Pro move: Events sorted by creation time show you what just broke. The most recent event is usually your culprit. I once spent 15 minutes troubleshooting a pod before checking events and seeing "ImagePullBackOff: image not found." Face, meet palm.
Node Diagnostics
When nodes fail: A "NotReady" node is a common exam scenario. The node could be out of resources (CPU, memory, disk), the kubelet could be down, or there could be network issues. The describe command gives you everything you need to diagnose this.
# Detailed node info
kubectl describe node <node-name>
# Check: Conditions (Ready, MemoryPressure, DiskPressure)
# Resource allocation
kubectl describe nodes | grep -A 10 "Allocated resources"
# Debug node with shell
kubectl debug node/<node-name> -it --image=ubuntu
# Node filesystem at /host
The Conditions section is gold: It tells you exactly what's wrong - MemoryPressure=True means the node is out of memory, DiskPressure=True means disk is full. The "Allocated resources" section shows if pods are requesting more resources than the node has available. The debug command gives you shell access to the node even if SSH is broken - this is a game-changer for node troubleshooting.
Control Plane Components
Why this matters: The control plane (API server, scheduler, controller manager, etcd) runs the entire cluster. If any component is down, specific functions break: API server down = can't run kubectl commands, scheduler down = pods stuck Pending, controller manager down = no deployments/replicasets work, etcd down = no persistent state.
# Check control plane pods
kubectl get pods -n kube-system
kubectl get pods -n kube-system -l component=kube-apiserver
kubectl get pods -n kube-system -l component=kube-scheduler
# Check logs
kubectl logs -n kube-system <component-pod>
journalctl -u kubelet -n 100
# Verify kubelet (on node)
systemctl status kubelet
journalctl -u kubelet -f
Exam scenario: If all pods are stuck Pending, check the scheduler. If deployments aren't creating pods, check the controller manager. If nothing works, check the API server. The logs tell you the exact error - certificate expired, can't connect to etcd, port already in use, etc.
etcd Health
Critical understanding: etcd is the database for everything in Kubernetes. If etcd is unhealthy, the entire cluster is unstable. In HA clusters, you need at least (n/2)+1 members healthy (quorum). With 3 members, you can lose 1. With 5 members, you can lose 2.
# Health check
kubectl exec -n kube-system etcd-<node> -- etcdctl \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
endpoint health
# List members
kubectl exec -n kube-system etcd-<node> -- etcdctl \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
member list
# Via API server
kubectl get --raw=/livez/etcd
# Backup and verify
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
--cert=<cert> --key=<key> --cacert=<cacert>
etcdutl --write-out=table snapshot status snapshot.db
High-value exam task: etcd backup/restore questions are worth more points. You MUST get the certificate paths exactly right or the command fails. The certificate paths are usually in /etc/kubernetes/pki/etcd/. The snapshot status command verifies the backup is valid before you try to restore it.
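The backup is only half the task. The restore side usually looks something like this sketch - the data directory and manifest path below are the kubeadm defaults, so double-check them against what the question gives you:
# Restore the snapshot into a fresh data directory (path is an example)
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir /var/lib/etcd-from-backup
# (newer etcd also ships etcdutl snapshot restore with the same flags)
# Point the etcd static pod at the restored data:
# edit /etc/kubernetes/manifests/etcd.yaml and change the etcd-data hostPath
# from /var/lib/etcd to /var/lib/etcd-from-backup, then wait for etcd to restart
kubectl get pods -n kube-system | grep etcd   # confirm etcd comes back healthy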
Resource Monitoring
What you're diagnosing: Pods can fail because nodes run out of CPU or memory. The metrics server provides this real-time data. Without metrics server, kubectl top won't work, and HPA (Horizontal Pod Autoscaler) won't function.
# Verify metrics server
kubectl get deployment metrics-server -n kube-system
# Node/pod resource usage
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=cpu
kubectl top pods --all-namespaces --sort-by=memory --containers
# Resource quotas
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota <name>
Practical use: If pods are stuck Pending with "Insufficient CPU" or "Insufficient memory" errors, kubectl top nodes shows you exactly which nodes are maxed out. Sorting by CPU or memory shows which pods are the resource hogs. This guides your troubleshooting - maybe you need to scale down other pods or add more nodes.
Container Logs
The truth is in the logs: When a pod crashes, the logs tell you why - missing environment variable, can't connect to database, out of memory, etc. Logs are your primary debugging tool for application issues.
# View logs
kubectl logs <pod>
kubectl logs <pod> -c <container>
kubectl logs <pod> -f --tail=50 --since=5m
kubectl logs <pod> --previous # For CrashLoopBackOff
# Multi-container pods
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'
kubectl logs <pod> --all-containers=true
# Filter logs
kubectl logs <pod> | grep -i "error\|warn"
Critical flag: --previous: When a pod is in CrashLoopBackOff, the current container hasn't started yet, so there are no logs. The --previous flag shows logs from the crashed container, which contains the actual error message. This is essential for debugging startup failures.
Service Troubleshooting
Common issue: "I can curl the pod directly, but the service doesn't work." This means the problem is with the service, not the pod. Usually it's a selector mismatch - the service selector doesn't match the pod labels.
# Check service and endpoints
kubectl get svc <service>
kubectl get endpoints <service>
# Verify selector
kubectl describe svc <service> | grep Selector
kubectl get pods --selector=<label-key>=<label-value>
# Debug pod with network tools
kubectl run test --rm -it --image=nicolaka/netshoot -- bash
# Then: nslookup <service>, curl <service>:<port>
# DNS check
kubectl exec -it <pod> -- cat /etc/resolv.conf
kubectl get pods -n kube-system -l k8s-app=kube-dns
The endpoints tell the truth: If kubectl get endpoints <service> returns empty, no pods match the service selector. Fix the labels. If endpoints exist but the service still doesn't work, it's usually DNS or network policy blocking traffic. The netshoot image has every network debugging tool you need - nslookup for DNS, curl for HTTP, ping for connectivity.
3. Workloads & Scheduling (15% of exam)
This section tests deployments, rolling updates, configuration management (ConfigMaps/Secrets), autoscaling, and pod scheduling. You need to verify deployments roll out correctly, configurations are injected properly, and pods are scheduled according to constraints like node affinity and taints.
Deployments
What deployments do: Deployments manage ReplicaSets, which manage Pods. When you update a deployment (change image, add env var, etc.), it creates a new ReplicaSet and gradually shifts traffic from old to new. This is a rolling update - zero downtime.
# Check status
kubectl get deployment <name>
kubectl rollout status deployment/<name>
# Watch ReplicaSets
kubectl get rs --watch
# Revision history
kubectl rollout history deployment/<name>
kubectl rollout history deployment/<name> --revision=2
What READY means: "3/3" means 3 pods are ready out of 3 desired. "2/3" means one pod is still starting or failing. The rollout status command streams updates in real-time - "Waiting for rollout to finish: 2 out of 3 new replicas have been updated" - so you know exactly what's happening. The revision history is crucial for rollbacks.
Rolling Updates & Rollbacks
Exam scenario: "Update the deployment to use nginx:1.16, verify the rollout succeeds, then roll back to the previous version." This is a 3-4 point question testing your understanding of deployment lifecycles.
# Update image
kubectl set image deployment/<name> <container>=<new-image>
# Check strategy
kubectl get deployment <name> -o yaml | grep -A 5 strategy
# Rollback
kubectl rollout undo deployment/<name>
kubectl rollout undo deployment/<name> --to-revision=2
Rolling update strategy: maxUnavailable and maxSurge control how the update happens. maxUnavailable=1 means at most 1 pod can be down during the update. maxSurge=1 means you can temporarily have 1 extra pod above the desired count. Understanding this helps you troubleshoot stuck rollouts.
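If a task asks you to tune these values, you don't need to rewrite the whole spec - a patch like this sketch (values are examples) does it:
# Set maxSurge/maxUnavailable on an existing deployment
kubectl patch deployment <name> -p \
  '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
# Confirm the change took effect
kubectl get deployment <name> -o yaml | grep -A 5 strategy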
ConfigMaps & Secrets
Why they exist: Hard-coding config into container images is bad practice. ConfigMaps hold non-sensitive config (database URL, feature flags), Secrets hold sensitive data (passwords, API keys). You inject them as environment variables or files.
# View content
kubectl get configmap <name> -o yaml
kubectl get secret <name> -o yaml
# Decode secret
kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
# Verify in pod
kubectl exec -it <pod> -- env | grep <VAR>
kubectl exec -it <pod> -- ls /etc/config
kubectl exec -it <pod> -- cat /etc/config/<key>
Exam gotcha: Secrets are base64-encoded, NOT encrypted. You must decode them to read the actual value. When troubleshooting "pod can't connect to database," check if the password secret is correct by decoding it. Verifying the config inside the pod confirms it was injected properly.
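For reference, the two injection styles look roughly like this in a pod spec - a sketch that assumes a ConfigMap named app-config already exists (field names can be double-checked with kubectl explain pod.spec.containers.envFrom and kubectl explain pod.spec.volumes):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: nginx
    envFrom:                    # every key becomes an environment variable
    - configMapRef:
        name: app-config
    volumeMounts:               # every key becomes a file under /etc/config
    - name: config-vol
      mountPath: /etc/config
  volumes:
  - name: config-vol
    configMap:
      name: app-config
EOF
kubectl exec -it config-demo -- ls /etc/config   # verify the files were injected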
HPA (Horizontal Pod Autoscaler)
How it works: HPA watches CPU/memory metrics and scales pods up or down automatically. If CPU usage > 80% target, it scales up. If CPU < 80%, it scales down. Requires metrics-server to be running.
# Check HPA
kubectl get hpa
kubectl describe hpa <name>
kubectl get hpa --watch
# Verify metrics server
kubectl top pods
# Generate load
kubectl run load-gen --image=busybox --restart=Never -- \
/bin/sh -c "while true; do wget -q -O- http://<svc>; done"
Exam scenario: Create an HPA that scales between 2-10 replicas based on 70% CPU utilization. After creating it, you MUST verify it works by generating load and watching the replica count increase. The --watch flag shows this in real-time. If it doesn't scale, check if metrics-server is running and if resource requests are defined on the pods.
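That scenario maps to a single imperative command - a sketch assuming a deployment named web:
# Create the HPA from the scenario (deployment name is an example)
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70
# Verify it registered and starts reporting metrics
kubectl get hpa web --watch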
Probes
Critical difference: Liveness probes determine if a container is alive - if it fails, Kubernetes restarts the container. Readiness probes determine if a container is ready to receive traffic - if it fails, Kubernetes removes the pod from service endpoints but doesn't restart it.
# Check configuration
kubectl describe pod <pod> | grep -A 10 Liveness
kubectl describe pod <pod> | grep -A 10 Readiness
# Check failures
kubectl describe pod <pod> | grep -A 20 Events
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].restartCount}'
# Verify endpoints (readiness)
kubectl get endpoints <service>
Why this matters: If a pod is restarting frequently (high restart count), check the liveness probe - maybe it's too aggressive (checks every 1 second, timeout 1 second). If a service has no traffic going to some pods, check readiness probe failures - those pods are removed from the service endpoints.
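If you need to add the probes yourself, the container-level fields look roughly like this sketch (path, port, and timings are examples - confirm field names with kubectl explain pod.spec.containers.livenessProbe):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:              # failure -> container restart
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:             # failure -> pod removed from endpoints
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
EOF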
Resource Limits
Exam understanding: Requests are what Kubernetes uses for scheduling - "this pod needs at least 100m CPU." Limits are the maximum the pod can use - "this pod can use up to 500m CPU." QoS (Quality of Service) determines which pods get evicted first when the node runs out of resources.
# Check limits
kubectl describe pod <pod> | grep -A 10 Limits
# Check QoS
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'
# Node allocation
kubectl describe nodes | grep -A 10 "Allocated resources"
QoS classes: Guaranteed (requests=limits) are evicted last. BestEffort (no requests/limits) are evicted first. Burstable (requests < limits) are in between. If your critical pods keep getting evicted, they need higher QoS - set requests=limits to make them Guaranteed.
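As a quick reference, a container whose requests equal its limits lands in the Guaranteed class - a sketch with example values:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:                   # requests == limits -> Guaranteed
        cpu: "200m"
        memory: "256Mi"
EOF
kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'   # expect: Guaranteed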
Node Affinity & Taints
The scheduling story: Not all pods can run on all nodes. Node affinity is the pod saying "I prefer/require nodes with label X." Taints are the node saying "Don't schedule pods here unless they tolerate my taint." Together they control where pods land.
# Check node labels
kubectl get nodes --show-labels
# Check taints
kubectl describe node <node> | grep Taints
# Check tolerations
kubectl get pod <pod> -o yaml | grep -A 10 tolerations
# Verify scheduling
kubectl get pod <pod> -o wide
Exam scenario: "Schedule this pod only on nodes with label disk=ssd." You add nodeSelector or nodeAffinity to the pod spec. To verify, kubectl get pod <pod> -o wide shows which node it's on, then kubectl get node <node> --show-labels confirms that node has disk=ssd. For taints, if a node is tainted with key=value:NoSchedule, only pods with a matching toleration can schedule there.
4. Cluster Architecture (25% of exam)
Overview: This is the deepest technical section - RBAC security, cluster lifecycle management with kubeadm, HA configurations, package management with Helm, and extension points (CNI, CSI, CRI, CRDs). These are advanced admin topics that separate CKA from easier certifications.
RBAC
Security model: RBAC controls who can do what in the cluster. Users/ServiceAccounts (who) get permissions through Roles/ClusterRoles (what actions) via RoleBindings/ClusterRoleBindings (the assignment). Without proper RBAC, users either can't do their job or have too much access.
# Check permissions
kubectl auth can-i create pods
kubectl auth can-i list secrets --as user1 -n kube-system
kubectl auth can-i --list
# View roles and bindings
kubectl get role,rolebinding -n <namespace>
kubectl get clusterrole,clusterrolebinding
kubectl describe rolebinding <name>
# Test service account
kubectl auth can-i get pods --as system:serviceaccount:<ns>:<sa>
Exam task: "Create a Role that allows reading pods and services in namespace 'app', bind it to user 'dev'." After creating it, you MUST verify with kubectl auth can-i get pods --as dev -n app returning "yes". The --as flag lets you impersonate users to test permissions without switching credentials.
Infrastructure Preparation
Before kubeadm: You can't just run kubeadm init on a fresh server and expect it to work. The underlying infrastructure must meet specific requirements. The exam may ask you to verify prerequisites or troubleshoot a failed cluster initialization due to missing requirements.
System requirements to verify:
# Check Linux kernel version (must be 3.10+)
uname -r
# Verify required ports are available
# Control plane: 6443, 2379-2380, 10250, 10257, 10259
# Worker nodes: 10250, 30000-32767
netstat -tuln | grep -E '6443|2379|2380|10250|10257|10259'
# Check if swap is disabled (REQUIRED)
free -h
# or
swapon --show
# Disable swap if enabled
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# Verify container runtime installed
systemctl status containerd
# or
systemctl status docker
# Check if br_netfilter module is loaded
lsmod | grep br_netfilter
sudo modprobe br_netfilter
# Enable IP forwarding
cat /proc/sys/net/ipv4/ip_forward # Should return 1
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
# Make persistent
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Why these checks matter: Swap disabled is mandatory - kubeadm will refuse to initialize if swap is on. The br_netfilter module is required for iptables to see bridged traffic (pod networking won't work without it). IP forwarding must be enabled for packets to route between pods on different nodes.
Package verification:
# Verify kubeadm, kubelet, kubectl installed
kubeadm version
kubelet --version
kubectl version --client
# Check kubelet is enabled (but not yet running before init)
systemctl is-enabled kubelet
systemctl status kubelet
Network prerequisites:
# Verify unique hostname per node
hostname
hostnamectl
# Check unique MAC addresses
ip link show | grep ether
# Verify DNS resolution works
nslookup google.com
ping -c 2 8.8.8.8
# Check firewall status (may need to allow Kubernetes ports)
sudo systemctl status firewalld
# or
sudo ufw status
Exam scenario: "The cluster initialization failed. Verify the prerequisites and fix any issues." You'd check: (1) swap is off, (2) required ports are available, (3) container runtime is running, (4) br_netfilter module is loaded, (5) IP forwarding is enabled. Fix what's broken, then retry kubeadm init.
Pre-flight checks:
# kubeadm has built-in checks
sudo kubeadm init --dry-run
# This shows what would happen without actually initializing
# Lists all prerequisite checks and their status
Common initialization blockers:
- Swap is on → swapoff -a
- Port 6443 already in use → Check what's using it: lsof -i :6443
- Container runtime not running → systemctl start containerd
- br_netfilter not loaded → modprobe br_netfilter
- Firewall blocking ports → Configure firewall rules
kubeadm Cluster
Foundation knowledge: kubeadm is the standard tool for bootstrapping Kubernetes clusters. The exam assumes you understand the cluster initialization process, certificate management, and adding nodes.
# Verify initialization
kubectl cluster-info
kubectl get pods -n kube-system
# Check certificates
kubeadm certs check-expiration
# Generate join token
kubeadm token create --print-join-command
kubeadm token list
Certificate crisis: Kubernetes certificates expire after 1 year by default. If certificates expire, the cluster becomes inoperable. The kubeadm certs check-expiration command is your early warning system - if any cert expires in < 90 days, you need to renew it. Join tokens expire in 24 hours, so if you need to add a node later, you generate a new token.
Cluster Upgrade
High-stakes task: Upgrading a cluster is a common 7-10 point exam question. You must upgrade control plane nodes first, then worker nodes, and never skip minor versions (can't go from 1.27 to 1.29 directly).
# Check version
kubectl version
# Plan upgrade
kubeadm upgrade plan
# After kubeadm upgrade apply
kubectl get nodes # Still shows old version
apt-get install kubelet=1.29.0-00 kubectl=1.29.0-00
systemctl daemon-reload && systemctl restart kubelet
kubectl get nodes # Now shows new version
Critical sequence: kubeadm upgrade apply upgrades control plane components, but NOT kubelet. That's why kubectl get nodes still shows the old version - it reports kubelet version, not control plane version. You must manually upgrade kubelet packages and restart the service on each node. Missing this step is the #1 upgrade mistake.
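Putting the control plane sequence together - a sketch that assumes a v1.29.0 target and Debian-style packages (the exact package version suffix depends on the configured repo; many questions also expect the drain/uncordon steps):
# On the control plane node
kubectl drain <control-plane-node> --ignore-daemonsets
apt-get update && apt-get install -y kubeadm=1.29.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.29.0
apt-get install -y kubelet=1.29.0-00 kubectl=1.29.0-00
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <control-plane-node>
kubectl get nodes   # the node now reports the new kubelet version
# Worker nodes: drain, upgrade kubeadm, "kubeadm upgrade node", upgrade kubelet, uncordon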
High Availability
Production requirement: HA means multiple control plane nodes so if one fails, the cluster keeps running. You need odd numbers (3, 5, 7) of etcd members for quorum. The load balancer distributes API requests across multiple API servers.
# Check control plane nodes
kubectl get nodes --selector=node-role.kubernetes.io/control-plane
# Component distribution
kubectl get pods -n kube-system -o wide | grep -E 'etcd|apiserver'
# Test load balancer
curl -k https://<lb-endpoint>:6443/healthz
Verification matters: In a 3-node HA cluster, you should see 3 control plane nodes, 3 etcd pods, 3 API server pods. If the load balancer is working, curling it should return "ok." If one control plane node goes down, the cluster should still function - test this during verification.
Helm & Kustomize
Package management: Helm is like apt/yum for Kubernetes - it packages multiple resources into a "chart" for easy installation. Kustomize customizes YAML without templating. Both are standard tools for managing complex applications.
# Helm
helm list -A
helm status <release>
helm history <release>
helm get manifest <release>
helm install <release> <chart> --dry-run --debug
# Kustomize
kubectl kustomize ./
kubectl apply -k ./
Exam usage: "Install nginx-ingress using Helm, verify it's running." After helm install, use helm status to check deployment status, helm get manifest to see what resources were created, and kubectl get pods -n <namespace> to verify pods are running. The --dry-run --debug flags let you preview what will be installed without actually installing it.
Extension Interfaces
Advanced concept: Kubernetes is extensible through plugin interfaces. CNI (Container Network Interface) provides networking, CSI (Container Storage Interface) provides storage, CRI (Container Runtime Interface) provides container runtime. CRDs (Custom Resource Definitions) let you extend Kubernetes with custom resource types.
# CNI
kubectl get pods -n kube-system | grep -E 'calico|flannel|weave'
# CSI
kubectl get csidrivers
kubectl get storageclass
# CRI
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
# CRD
kubectl get crd
kubectl describe crd <name>
kubectl explain <custom-resource-type>
Why verify these: If no CNI is running, pods can't communicate. If CSI drivers are missing, dynamic provisioning won't work. Checking the container runtime confirms whether you're using Docker, containerd, or CRI-O. CRDs let you use kubectl with custom resources like Prometheus, Istio, or Operators.
5. Services & Networking (20% of exam)
Overview: Networking is how pods communicate with each other and the outside world. You need to understand Services (load balancing), NetworkPolicies (firewalls), Ingress (HTTP routing), and DNS. Networking failures are subtle - services may be created but not working due to selector mismatches or DNS issues.
Pod Connectivity
Foundation: Before troubleshooting services, verify basic pod-to-pod connectivity works. If pods can't ping each other, the CNI plugin is broken and nothing will work.
# Get pod IPs
kubectl get pods -o wide
# Test connectivity
kubectl exec -it <pod> -- ping <target-ip>
kubectl exec -it <pod> -- curl <target-ip>:<port>
# Debug pod
kubectl run netshoot --rm -it --image=nicolaka/netshoot -- bash
# Test service DNS
kubectl exec -it <pod> -- curl <service>:<port>
kubectl exec -it <pod> -- curl <service>.<ns>.svc.cluster.local:<port>
Debugging technique: The netshoot image is your network Swiss Army knife - it has curl, wget, nslookup, dig, ping, traceroute, netstat, and more. When troubleshooting network issues, always start with a netshoot pod. If curl <service> works but your application can't connect, the problem is with your application, not Kubernetes networking.
Service Validation
How services work: Services provide a stable endpoint (ClusterIP) that load balances to a set of pods. The service finds pods using a selector (label query). If no pods match the selector, the service has no endpoints and doesn't work.
# Check service
kubectl get svc
kubectl describe svc <service>
# Check endpoints
kubectl get endpoints <service>
# Verify selector
kubectl describe svc <service> | grep Selector
kubectl get pods --selector=<label>=<value>
The endpoints are the truth: If kubectl get endpoints <service> shows IPs, those are the pod IPs that will receive traffic. If it's empty, the selector doesn't match any pods - check labels. This is the #1 service problem in the exam. Common mistake: service selector is "app=web" but pod labels are "app=webapp."
Service Types
Three types you must know: ClusterIP (default) is only accessible within the cluster. NodePort opens a port on every node (30000-32767) for external access. LoadBalancer provisions a cloud load balancer (only works on cloud providers like AWS/GCP/Azure).
# ClusterIP (default)
kubectl get svc <service> # TYPE=ClusterIP
kubectl run test --rm -it --image=busybox:1.28 -- wget -O- <cluster-ip>:<port>
# NodePort
kubectl get svc <service> # PORT(S)=80:30007/TCP
curl <node-ip>:30007
# LoadBalancer
kubectl get svc <service> # EXTERNAL-IP populated
curl <external-ip>:<port>
Testing access: For ClusterIP, you must test from inside the cluster (hence the busybox pod). For NodePort, you can test from outside using any node's IP. For LoadBalancer, you use the external IP. A common exam task is "expose this deployment externally" - choose NodePort if no cloud provider, LoadBalancer if cloud provider is available.
Network Policies
Kubernetes firewall: By default, all pods can talk to all pods. NetworkPolicies restrict this - "only pods with label X can connect to pods with label Y on port 80." This is pod-level firewall rules.
# List policies
kubectl get networkpolicy -A
# Describe policy
kubectl describe netpol <name>
# Test enforcement
kubectl exec <pod> -- curl --max-time 5 <target> # Should timeout if blocked
kubectl exec <pod> -- curl <target> # Should work if allowed
# Verify CNI support
kubectl get pods -n kube-system | grep -E 'calico|cilium|weave'
Critical requirement: NetworkPolicies only work if your CNI plugin supports them. Calico, Cilium, and Weave do. Flannel does NOT. If you're using Flannel and create NetworkPolicies, they'll be created successfully but have no effect. Always verify CNI support first. Testing is essential - create a policy, then verify it actually blocks traffic as expected.
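For reference, a minimal policy of that shape looks roughly like this sketch (labels and port are examples):
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
spec:
  podSelector:                  # the pods this policy protects
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80
EOF
# Verify: curl from an app=frontend pod should succeed; from any other pod it should time out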
Ingress
HTTP routing: Ingress routes HTTP/HTTPS traffic to services based on hostname or path. "example.com/api → api-service, example.com/web → web-service." Requires an Ingress controller (nginx, traefik, etc.) to be installed.
# Check controller
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
# Check Ingress
kubectl get ingress
kubectl describe ingress <name>
# Get IP
kubectl get ingress <name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Test routing
curl -H "Host: example.com" http://<ingress-ip>/path
Exam scenario: "Create an Ingress that routes example.com/app to service app-svc port 80." After creating it, verify the ADDRESS is populated (may take 30 seconds). Then test with curl - if you get 404, the service name is wrong. If you get connection refused, the service isn't running. If you get 502 Bad Gateway, the service endpoints are empty.
Gateway API
The future of Ingress: Gateway API is the successor to Ingress, offering more expressive routing, better multi-tenancy, and role-oriented design. While traditional Ingress is still common, Gateway API is gaining adoption and may appear in the exam.
Key differences from Ingress:
- GatewayClass (like StorageClass) - defines the type of Gateway
- Gateway (like LoadBalancer Service) - the infrastructure layer
- HTTPRoute (like Ingress) - the routing rules
Check Gateway API installation:
# Verify Gateway API CRDs installed
kubectl get crd | grep gateway
# List GatewayClasses
kubectl get gatewayclass
# List Gateways
kubectl get gateway -A
# List HTTPRoutes
kubectl get httproute -A
Gateway verification:
# Describe Gateway for detailed info
kubectl describe gateway <gateway-name> -n <namespace>
# Check Gateway status
kubectl get gateway <gateway-name> -n <namespace> -o yaml | grep -A 5 status
# Look for:
# - conditions: status=True, type=Programmed
# - addresses: external IP or hostname assigned
Expected output: Gateway should show Programmed=True condition and have an address assigned. If address is missing, the Gateway controller may not be running or cloud provider integration is broken.
HTTPRoute verification:
# List HTTPRoutes
kubectl get httproute -A
# Describe HTTPRoute
kubectl describe httproute <route-name> -n <namespace>
# Check if route attached to Gateway
kubectl get httproute <route-name> -n <namespace> -o yaml | grep -A 3 parentRefs
What to verify:
- parentRefs: Which Gateway this route attaches to
- hostnames: What domain names this route handles
- rules: Path matching and backend service references
- status: Whether the route was accepted by the Gateway
Test Gateway routing:
# Get Gateway address
GATEWAY_IP=$(kubectl get gateway <gateway-name> -n <namespace> \
-o jsonpath='{.status.addresses[0].value}')
# Test routing with curl
curl -H "Host: example.com" http://$GATEWAY_IP/api
# Test different paths
curl -H "Host: example.com" http://$GATEWAY_IP/web
# Verbose output for debugging
curl -v -H "Host: example.com" http://$GATEWAY_IP/path
Common Gateway API issues:
# Gateway stuck in Pending
kubectl describe gateway <gateway-name>
# Check Events for: "No addresses available", "Controller not found"
# HTTPRoute not working
kubectl describe httproute <route-name>
# Check for: "RouteReasonNoMatchingParent", "RouteReasonBackendNotFound"
# Verify Gateway controller is running
kubectl get pods -n gateway-system
# or wherever your Gateway controller is deployed
# Check Gateway controller logs
kubectl logs -n gateway-system <gateway-controller-pod>
Gateway API vs Traditional Ingress:
| Aspect | Traditional Ingress | Gateway API |
|---|---|---|
| Resource | Ingress | HTTPRoute |
| Infrastructure | IngressClass | GatewayClass + Gateway |
| Routing | Path/Host rules | More expressive matching |
| Multi-tenancy | Limited | Better separation |
| Protocol support | HTTP/HTTPS mainly | HTTP, HTTPS, TCP, gRPC |
Exam scenario: "Create an HTTPRoute that routes traffic from Gateway 'main-gateway' to service 'api-svc' on path /api."
After creation:
- Verify HTTPRoute exists: kubectl get httproute
- Check it's attached to Gateway: kubectl describe httproute <route-name>
- Verify Gateway has address: kubectl get gateway main-gateway
- Test routing: curl -H "Host: api.example.com" http://<gateway-ip>/api
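A minimal HTTPRoute for that scenario might look like this sketch (the hostname is an example, and the apiVersion should match the CRDs installed on the cluster - check with kubectl get crd | grep httproute):
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-svc
          port: 80
EOF
kubectl describe httproute api-route   # status should show the route was accepted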
Quick debugging workflow:
- Check GatewayClass exists and has a controller
- Check Gateway references valid GatewayClass
- Check Gateway has Programmed=True condition
- Check Gateway has address assigned
- Check HTTPRoute parentRefs matches Gateway name
- Check HTTPRoute backend service exists and has endpoints
- Test with curl using correct Host header
Note: Gateway API is relatively new. If the exam cluster doesn't have Gateway API CRDs installed, stick with traditional Ingress. Check with kubectl get crd | grep gateway first.
DNS
Why DNS matters: DNS is how pods find services by name instead of IP. Without working DNS, you'd need to hardcode ClusterIPs everywhere. CoreDNS provides this critical service - if it's down, nothing can communicate using service names.
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get svc -n kube-system kube-dns
# Test DNS
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -it dnsutils -- nslookup kubernetes.default
# Check pod DNS config
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Test all forms
kubectl exec -it dnsutils -- nslookup <service>
kubectl exec -it dnsutils -- nslookup <service>.<namespace>
kubectl exec -it dnsutils -- nslookup <service>.<ns>.svc.cluster.local
The DNS hierarchy: Services can be accessed by short name (same namespace only), namespace-qualified name (cross-namespace), or FQDN (fully qualified domain name). If nslookup <service> fails but nslookup <service>.<namespace> works, you're testing from a different namespace. The /etc/resolv.conf file shows the nameserver IP (should be CoreDNS ClusterIP, typically 10.96.0.10) and search domains that make short names work.
Imperative Commands & Time-Savers
The exam game-changer: Writing YAML from scratch takes 5-8 minutes per resource. Generating it with imperative commands takes 30 seconds. Over 15-20 questions, this saves you 60-90 minutes. Mastering these commands is the difference between finishing the exam with time to spare versus running out of time.
Generate YAML (Most Important!)
This is the game-changer: I'm going to be blunt - if you're writing YAML from scratch during the exam, you're doing it wrong. It takes 5-8 minutes to write a deployment YAML from memory, and you'll probably make syntax errors. The dry-run technique does it in 30 seconds with perfect syntax every time.
Here's how it works: Tell kubectl to show you the YAML it would create, but don't actually create anything. Pipe that to a file, edit the 2-3 lines you need to change, then apply. Boom - you just saved 4-5 minutes.
IMPORTANT: When generating YAML files, always use full kubectl commands, not the k alias:
# Pod - use full kubectl command when saving to file
kubectl run nginx --image=nginx $do > pod.yaml
# Deployment
kubectl create deployment nginx --image=nginx --replicas=3 $do > deploy.yaml
# Service
kubectl expose deployment nginx --port=80 --target-port=8080 $do > svc.yaml
# ConfigMap
kubectl create configmap app-config --from-literal=key=val $do > cm.yaml
# Secret
kubectl create secret generic db-secret --from-literal=pass=123 $do > secret.yaml
Why full kubectl in files? The k alias only exists in your shell session. If you save k run nginx... in a script or try to reference it later, it won't work. The $do variable works fine because it's an environment variable that expands before hitting the file. But k is a shell alias - it's like a nickname that only works face-to-face!
Think about it this way: You can type k get pods at the command line all day long (fast and easy!), but when you're generating files, use kubectl (reliable and portable).
Real example from the exam: Question says "Create a pod named web with image nginx:1.19, label tier=frontend, CPU request 100m."
Instead of this nightmare:
# Writing this from scratch = 5 minutes + probable typos
apiVersion: v1
kind: Pod
metadata:
name: web
labels:
tier: frontend
spec:
containers:
- name: nginx
image: nginx:1.19
resources:
requests:
cpu: "100m"
Do this:
kubectl run web --image=nginx:1.19 --labels=tier=frontend $do > pod.yaml
# Edit pod.yaml, add these 3 lines under containers[0]:
# resources:
# requests:
# cpu: "100m"
kubectl apply -f pod.yaml
kubectl explain (2x faster than docs!)
Stop searching documentation: The exam provides access to kubernetes.io docs, but searching for the right page and finding the right YAML syntax requires time. kubectl explain gives you the answer in 30 seconds, right in your terminal.
# Find field syntax
kubectl explain pod.spec.containers.livenessProbe
kubectl explain pod.spec.volumes
kubectl explain deployment.spec.strategy
# Recursive view
kubectl explain pod --recursive
# Navigate hierarchically
kubectl explain deployment
kubectl explain deployment.spec
kubectl explain deployment.spec.template
Practical example: You need to add a readiness probe with HTTP GET. Instead of searching docs:
kubectl explain pod.spec.containers.readinessProbe.httpGet
Output shows you:
path: <string>
port: <string>
httpHeaders: <[]Object>
Now you know exactly what fields exist and their types. Write your YAML with confidence.
Quick Creation Patterns
No YAML needed: For simple resources, imperative commands create them instantly. Only use YAML when you need complex configurations like multiple volumes, init containers, or advanced scheduling constraints.
# Pod with labels
kubectl run nginx --image=nginx --labels="app=web,env=prod"
# Pod with env vars
kubectl run nginx --image=nginx --env="KEY=value"
# Deployment with replicas
kubectl create deployment web --image=nginx --replicas=3
# Expose deployment
kubectl expose deployment web --port=80 --type=NodePort
# Scale
kubectl scale deployment web --replicas=5
# Update image
kubectl set image deployment/web nginx=nginx:1.16
# ConfigMap/Secret
kubectl create configmap app --from-literal=key=value
kubectl create secret generic db --from-literal=pass=secret
Chaining commands: You can create and expose in one line:
kubectl create deployment web --image=nginx --replicas=3 && \
kubectl expose deployment web --port=80 --type=NodePort && \
kubectl get svc,pods
This creates deployment, exposes it, and shows you the results - 15 seconds total.
Exam Workflow
Strategic approach: The exam is as much about time management as technical skill. You can know everything but fail if you spend 20 minutes on a 5-point question. Follow this workflow religiously.
Step-by-step process:
- Switch context (copy from question exactly)
kubectl config use-context <context>
kubectl config current-context # Verify
Why this matters: The exam uses different contexts (clusters). If you forget to switch, you're working on the wrong cluster and get zero points even if your solution is perfect. Always verify with current-context.
- Read question completely - note all requirements
Don't skim! Questions often have multiple requirements: "Create deployment X with 3 replicas, expose on NodePort 30080, ensure pods run on nodes with label disk=ssd." Miss one requirement = lose points.
- Choose approach:
- Imperative command? → Use it
- Complex config? → Generate YAML, edit, apply
Simple tasks (create pod, scale deployment, expose service) = imperative. Complex tasks (pod with multiple volumes, init containers, complex scheduling) = YAML.
- Execute solution
Work fast but deliberately. Copy-paste resource names and image names from the question to avoid typos.
- Verify immediately:
kubectl get <resource>
kubectl describe <resource>
kubectl logs <pod> # If applicable
Critical habit: Don't move to the next question until you verify this one works. A non-working solution gets zero points. 30 seconds of verification saves you from getting nothing.
- Confirm all requirements met
Go back to the question and check each requirement one by one. Label correct? Replicas correct? Port correct? Image correct?
- Move to next question
If you're stuck after 7 to 8 minutes, flag the question and move on. Come back to it if you have time. Don't let one hard question prevent you from answering five easy ones.
Time Management
- First pass (90 min): Easy & medium questions
- Skip anything that takes > 8 minutes
- Get the "easy points" first
- Most candidates can score 60-70% just from easy/medium questions
- Second pass (25 min): Flagged difficult questions
- Now tackle the hard ones
- You have buffer time so you can think deeper
- Partial credit is better than nothing
- Final review (5 min): Verify critical tasks
- Check etcd backups were created and verified
- Check cluster upgrades show correct version
- Verify you switched contexts correctly for each question
- Quick spot-check of a few answers
- Golden rule: Never spend >8 minutes on one question
- 16-17 questions in 120 minutes ≈ 7 minutes each on average
- If you're stuck at 8 minutes, you're losing time on other questions
- Flag it and move on
Real-World Troubleshooting Scenarios
Practice these scenarios to build troubleshooting muscle memory:
Scenario 1: Pod Stuck in Pending
Problem: Pod created but stuck in Pending state for 5+ minutes.
Investigation workflow:
# Step 1: Check pod status and events
kubectl describe pod <pod-name>
# Look in Events section for: FailedScheduling, Insufficient CPU/memory
# Step 2: Check node resources
kubectl top nodes
kubectl describe nodes | grep -A 10 "Allocated resources"
# Step 3: Check if PVC is bound (if pod uses volumes)
kubectl get pvc
# Step 4: Check node taints and pod tolerations
kubectl describe nodes | grep Taints
kubectl get pod <pod-name> -o yaml | grep -A 5 tolerations
Common causes & fixes:
- Insufficient resources: Nodes don't have enough CPU/memory
- Fix: Scale down other pods or add more nodes
- Verify: kubectl top nodes shows available resources
- PVC not bound: Pod waiting for storage
- Fix: Check PVC status, verify StorageClass exists
- Verify: kubectl get pvc shows Bound status
- Node affinity mismatch: No nodes match affinity rules
- Fix: Check node labels match pod's nodeSelector/affinity
- Verify: kubectl get nodes --show-labels
- Taints without tolerations: Nodes are tainted, pod doesn't tolerate
- Fix: Add toleration to pod or remove taint from node
- Verify: Pod schedules successfully
Scenario 2: CrashLoopBackOff
Problem: Pod status shows CrashLoopBackOff, container keeps restarting.
Let me tell you a story: This is probably the most frustrating status to see. Your pod is like "I'll try to start... nope, crashed. Let me try again... nope, crashed again. One more time... still crashing." And it keeps doing this forever, waiting longer between each attempt (that's the "backoff" part).
The container is running, something crashes it, Kubernetes restarts it, it crashes again. Rinse and repeat. And here's the kicker - if you just run kubectl logs <pod>, you might see nothing! Why? Because the container that crashed isn't running anymore. You need the --previous flag to see the logs from the crashed container. This one flag has saved me countless times.
Investigation workflow:
# Step 1: Check restart count
kubectl get pod <pod-name>
# RESTARTS column shows how many times - if it's 5+, you've got a real problem
# Step 2: Check previous container logs (THIS IS THE GOLDEN TICKET!)
kubectl logs <pod-name> --previous
# Step 3: Check current container logs (might be empty)
kubectl logs <pod-name>
# Step 4: Describe pod for detailed events
kubectl describe pod <pod-name>
# Step 5: Check liveness/readiness probes
kubectl describe pod <pod-name> | grep -A 10 "Liveness\|Readiness"
Common causes & fixes:
1. Application crash on startup - Missing environment variables, can't connect to database
This is the #1 cause. Your app needs DATABASE_URL but it's not set. Or the database service doesn't exist yet. The logs tell you everything:
kubectl logs web-pod --previous
# Error: DATABASE_URL environment variable not set
# Error: Can't connect to mysql:3306
# Check if database service exists
kubectl get svc mysql
# Error from server (NotFound): services "mysql" not found
# Aha! Create the missing service
kubectl expose deployment mysql --port=3306
# Delete pod to restart with valid config
kubectl delete pod web-pod
kubectl get pods --watch
# Now it should start successfully
2. Liveness probe too aggressive - Probe kills container before app finishes starting
Your app takes 30 seconds to fully start, but the liveness probe checks every 5 seconds and kills it if it doesn't respond in 1 second. Boom - CrashLoopBackOff city!
# Check the probe settings
kubectl describe pod <pod> | grep -A 10 Liveness
# Liveness: http-get http://:8080/ delay=0s timeout=1s period=5s
# See the problem? delay=0s means it starts checking IMMEDIATELY
# Your app hasn't even started yet!
# Fix: Edit deployment to increase initialDelaySeconds
kubectl edit deployment <deployment-name>
# Change: initialDelaySeconds: 30
# This gives your app time to start before probes begin
3. Out of memory - Container killed by OOM (Out Of Memory)
Your container is using more memory than its limit, so Kubernetes kills it. Then it restarts, uses too much memory again, gets killed again. Repeat forever.
kubectl describe pod <pod-name>
# Look for: Last State: Terminated, Reason: OOMKilled
# Fix: Increase memory limits or optimize your application
kubectl edit deployment <deployment-name>
# resources:
# limits:
# memory: "512Mi" # Increase this
The pattern you should see: Every CrashLoopBackOff has a reason in the logs or events. You just need to look in the right place. Logs from the previous container (--previous) are your best friend here. Don't skip this step!
Scenario 3: Service Not Accessible
Problem: Service created but can't access it (curl fails, connection timeout).
Investigation workflow:
# Step 1: Verify service exists
kubectl get svc <service-name>
# Step 2: Check endpoints (MOST IMPORTANT!)
kubectl get endpoints <service-name>
# Step 3: If endpoints are empty, check selector
kubectl describe svc <service-name> | grep Selector
kubectl get pods --selector=<label-key>=<label-value>
# Step 4: If pods exist but not in endpoints, check readiness
kubectl describe pod <pod-name> | grep -A 5 "Ready:"
# Step 5: Test connectivity from debug pod
kubectl run test --rm -it --image=nicolaka/netshoot -- bash
# Inside pod: curl <service-name>:<port>
# Step 6: Check network policies
kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name>
Common causes & fixes:
- Empty endpoints - Selector mismatch: Service selector doesn't match pod labels
- Fix: Update service selector or pod labels to match
- Example:
# Service has selector: app=web
kubectl describe svc web-svc | grep Selector
# Selector: app=web
# But pods have label: app=webapp
kubectl get pods --show-labels
# Fix: Update service selector
kubectl patch svc web-svc -p '{"spec":{"selector":{"app":"webapp"}}}'
# Verify endpoints populated
kubectl get endpoints web-svc
- Pods not ready: Readiness probe failing
- Fix: Check pod logs, fix readiness probe or application
- Verify: kubectl get pods shows READY 1/1
- NetworkPolicy blocking: Policy prevents traffic
- Fix: Update NetworkPolicy to allow traffic or remove it
- Verify: Test connectivity succeeds
- Wrong port: Service port doesn't match pod port
- Fix: Update service targetPort to match container port
- Verify: kubectl describe svc shows correct port mapping
Scenario 4: Node NotReady
Problem: One or more nodes showing NotReady status.
Investigation workflow:
# Step 1: Identify NotReady node
kubectl get nodes
# Step 2: Describe node for conditions
kubectl describe node <node-name>
# Check Conditions: MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
# Step 3: Check kubelet status (SSH to node)
systemctl status kubelet
journalctl -u kubelet -n 50
# Step 4: Check node resources
kubectl top node <node-name>
# Step 5: Debug node with shell access
kubectl debug node/<node-name> -it --image=ubuntu
# Inside debug pod, check:
df -h # Disk usage at /host
free -h # Memory
Common causes & fixes:
- Kubelet stopped: Service crashed or not running
- Fix: Restart kubelet
systemctl restart kubelet
systemctl status kubelet
# Verify: Node returns to Ready
kubectl get nodes
- Disk full: Node out of disk space (DiskPressure=True)
- Fix: Clean up logs, old images, evicted pods
# SSH to node
docker system prune -a
# or for containerd
crictl rmi --prune
# Remove old logs
journalctl --vacuum-time=3d
- Network plugin issue: CNI pod crashed
- Fix: Check CNI pod status, restart if needed
kubectl get pods -n kube-system | grep calico
kubectl delete pod <cni-pod> -n kube-system
- Certificate expired: Kubelet can't authenticate to API server
- Fix: Renew certificates
kubeadm certs renew all
systemctl restart kubelet
Scenario 5: Deployment Rollout Stuck
Problem: Deployment update started but stuck, some pods old version, some new.
Investigation workflow:
# Step 1: Check rollout status
kubectl rollout status deployment/<deployment-name>
# Step 2: Check ReplicaSets
kubectl get rs
# Should see old RS scaling down, new RS scaling up
# Step 3: Check new pods
kubectl get pods -l app=<app-label>
kubectl describe pod <new-pod-name>
# Step 4: Check deployment events
kubectl describe deployment <deployment-name>
# Step 5: Check rollout history
kubectl rollout history deployment/<deployment-name>
Common causes & fixes:
- New pods won't start: Image pull error, config error
- Fix: Check pod logs and events, fix the issue
kubectl describe pod <new-pod>
# Events show: ImagePullBackOff or CrashLoopBackOff
# Fix image name if wrong
kubectl set image deployment/<name> <container>=<correct-image>
- Insufficient resources: Can't schedule new pods
- Fix: Scale down or add resources
# Check available resources
kubectl top nodes
# Either scale down old pods or add nodes
- Readiness probe failing: New pods never become ready
- Fix: Check readiness probe configuration
kubectl describe pod <new-pod> | grep -A 10 Readiness
kubectl logs <new-pod>
# Fix application or probe configuration
- Need to rollback: New version has bugs
- Fix: Rollback to previous version
kubectl rollout undo deployment/<deployment-name>
kubectl rollout status deployment/<deployment-name>
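If the previous revision isn't the one you want, you can roll back to a specific revision instead; the deployment name web and revision number 2 below are placeholders:
# See what a given revision contained
kubectl rollout history deployment/web --revision=2
# Roll back to exactly that revision
kubectl rollout undo deployment/web --to-revision=2
# Confirm it finishes
kubectl rollout status deployment/web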
Scenario 6: PVC Stuck in Pending
Problem: PersistentVolumeClaim created but stuck in Pending, won't bind to PV.
Investigation workflow:
# Step 1: Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>
# Step 2: Check available PVs
kubectl get pv
# Step 3: Check StorageClass
kubectl get sc
kubectl describe sc <sc-name>
# Step 4: Check provisioner pods
kubectl get pods -n kube-system | grep provisioner
kubectl logs -n kube-system <provisioner-pod>
# Step 5: Check events
kubectl describe pvc <pvc-name> | grep -A 10 Events
Common causes & fixes:
- No StorageClass: PVC doesn't specify SC and no default exists
- Fix: Set default StorageClass or specify in PVC
# Set default SC
kubectl patch sc <sc-name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# Or specify in PVC
kubectl edit pvc <pvc-name>
# Add: storageClassName: <sc-name>
- Dynamic provisioner not running: No provisioner to create PV
- Fix: Deploy storage provisioner
kubectl get pods -n kube-system | grep provisioner
# If missing, install provisioner for your storage backend
- Access mode mismatch: PVC wants RWX but PV only has RWO
- Fix: Change PVC access mode or provide compatible PV
kubectl get pv -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes
# Ensure PV has compatible access mode
- Insufficient storage: PVC requests 100Gi but largest PV is 50Gi
- Fix: Create larger PV or reduce PVC request
kubectl get pv -o custom-columns=NAME:.metadata.name,SIZE:.spec.capacity.storage
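If you end up providing a PV by hand (no dynamic provisioning), the PV and PVC must agree on storageClassName and accessModes, and the PV capacity must be at least the requested size. A minimal sketch - the names, hostPath, and sizes here are placeholders:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi
Apply both, and kubectl get pvc should show pvc-data as Bound within a few seconds.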
Scenario 7: DNS Not Working
Problem: Pods can't resolve service names (nslookup fails, curl by name fails but by IP works).
Investigation workflow:
# Step 1: Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Step 2: Check CoreDNS service
kubectl get svc -n kube-system kube-dns
# Step 3: Test DNS from a pod
kubectl run test --rm -it --image=busybox:1.28 -- nslookup kubernetes.default
# Step 4: Check pod's resolv.conf
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Step 5: Check CoreDNS logs
kubectl logs -n kube-system <coredns-pod>
Common causes & fixes:
- CoreDNS pods not running: DNS service down
- Fix: Check why CoreDNS crashed, fix and restart
kubectl describe pod -n kube-system <coredns-pod>
kubectl delete pod -n kube-system <coredns-pod>
# Wait for new pod to start
- Wrong nameserver in resolv.conf: Pod not pointing to CoreDNS
- Fix: Usually indicates a kubelet issue; check the kubelet config
# On node, check kubelet configuration
cat /var/lib/kubelet/config.yaml | grep clusterDNS
# Should match CoreDNS service ClusterIP
- Network policy blocking DNS: Policy prevents DNS queries
- Fix: Allow DNS (port 53) in NetworkPolicy
kubectl get networkpolicy -A
# Ensure policies allow DNS to kube-dns service
- CoreDNS ConfigMap misconfigured: Wrong upstream DNS servers
- Fix: Check and fix CoreDNS ConfigMap
# Verify upstream DNS servers are correct
kubectl get cm coredns -n kube-system -o yaml
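Whichever cause applies, re-test with both the short and the fully qualified service name; if only the FQDN resolves, suspect the pod's resolv.conf search domains rather than CoreDNS itself. The service name web-svc and namespace default below are placeholders:
kubectl run dns-test --rm -it --image=busybox:1.28 -- sh
# Inside the pod:
nslookup web-svc                              # short name (same namespace)
nslookup web-svc.default.svc.cluster.local    # fully qualified name
nslookup kubernetes.default                   # control test - should always resolve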
Common Verification Patterns
After creating resources, always verify systematically:
These are your "muscle memory" verification commands. After creating any resource, run the appropriate verification automatically.
# Pods
kubectl get pods # STATUS=Running
# Deployments
kubectl get deployment # READY=X/X
# Services
kubectl get svc,endpoints # Endpoints exist
# PVC
kubectl get pvc # STATUS=Bound
# ConfigMap/Secret
kubectl get cm,secret
kubectl describe cm <n>
What you're checking:
- Pods: STATUS should be Running, not Pending/CrashLoopBackOff/Error
- Deployments: READY should match desired count (3/3), not less (2/3)
- Services: Endpoints should list pod IPs, not be empty
- PVCs: STATUS should be Bound within 10 seconds, not stuck Pending
- ConfigMaps/Secrets: They exist and have the expected keys
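As a concrete illustration, here's roughly what a healthy result looks like for a hypothetical deployment named web with three replicas exposed by a service of the same name (output abbreviated):
kubectl get deploy web
# NAME   READY   UP-TO-DATE   AVAILABLE   AGE
# web    3/3     3            3           90s
kubectl get endpoints web
# NAME   ENDPOINTS                                   AGE
# web    10.244.1.5:80,10.244.2.7:80,10.244.3.4:80   90s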
Troubleshooting workflow:
When something doesn't work, follow this sequence every time:
kubectl get nodes
kubectl get pods -n kube-system
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl logs <pod>
kubectl describe <resource>
Why this order:
- Nodes not Ready = cluster-level problem, fix this first
- System pods not running = control plane broken, everything will fail
- Events show recent errors = tells you what just broke
- Logs show application errors = most specific diagnostic info
- Describe shows configuration and recent events = comprehensive view
This workflow diagnoses 90% of issues in under 2 minutes.
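One small refinement that speeds up the events step: filter to warnings only so errors aren't buried in normal scheduling noise.
# Only Warning events, cluster-wide, newest at the bottom
kubectl get events -A --field-selector type=Warning --sort-by=.metadata.creationTimestamp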
Quick Reference Card
These are the commands you'll use often during the exam.
# Context (Always First!)
kubectl config use-context <context>
kubectl config current-context # Verify!
# Quick checks
k get nodes
k get po -A
k get svc,ep
k get pv,pvc
k get events --sort-by=.metadata.creationTimestamp
# Describe & logs
k describe <resource> <name>
k logs <pod>
k logs <pod> --previous # For crashes
# Exec into pod
k exec -it <pod> -- <command>
k exec -it <pod> -- /bin/sh
# Troubleshooting
k debug node/<node> -it --image=ubuntu
k top nodes
k top pods
# RBAC
k auth can-i <verb> <resource>
k auth can-i <verb> <resource> --as <user>
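# Example (namespace and names are placeholders): check what a service account may do
k auth can-i list pods -n dev --as=system:serviceaccount:dev:ci-bot
k auth can-i '*' '*' --as=<user>   # quick "is this user cluster-admin?" check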
# Deployment operations
k rollout status deployment/<n>
k rollout history deployment/<n>
k rollout undo deployment/<n>
k scale deployment/<n> --replicas=<count>
k set image deployment/<n> <container>=<image>
# Storage
k get sc
k get pvc --watch
k describe pvc <n>
# Network debugging
k run test --rm -it --image=nicolaka/netshoot -- bash
k exec -it <pod> -- nslookup <service>
# etcd backup (MEMORIZE certificate paths!)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
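# Verify the snapshot you just took (newer etcd releases prefer `etcdutl snapshot status`)
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd.db --write-out=table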
# Generate YAML templates
k run <pod> --image=<image> $do > pod.yaml
k create deploy <n> --image=<image> --replicas=<count> $do > deploy.yaml
k expose deploy <n> --port=<port> $do > svc.yaml
k create cm <n> --from-literal=k=v $do > cm.yaml
k create secret generic <n> --from-literal=k=v $do > secret.yaml
Time-saving aliases recap:
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kd='kubectl describe'
export do='--dry-run=client -o yaml'
export now='--force --grace-period=0'
Success Checklist
You're ready for the exam when you can consistently:
✓ Complete 16-17 questions in under 90-100 minutes
Practice with a timer. If you can't finish in under 90-100 minutes, you're not fast enough yet.
✓ Score 90%+ on practice exams
The real exam is harder and more stressful. If you can't consistently score 90%+ in practice, you might not pass the real thing.
✓ Generate YAML templates instantly
No hesitation, no looking up commands. kubectl run nginx --image=nginx $do > pod.yaml should be automatic.
✓ Use kubectl explain without hesitation
When you need field syntax, your first thought should be kubectl explain, not "let me search docs."
✓ Troubleshoot pods in < 2 minutes
See a failing pod, diagnose the root cause in under 2 minutes using describe, logs, and events.
✓ Switch contexts without errors
This should be second nature - copy command, run it, verify with current-context.
✓ Verify solutions systematically
After creating any resource, automatically run the verification commands without thinking.
Practice indicators you're ready:
- You don't need to look up basic kubectl commands
- You can troubleshoot from memory, not notes
- You complete mock exams with 20-30 minutes to spare
- You catch your own mistakes during verification
- You feel confident, not stressed
You're NOT ready if:
- You need notes for basic commands
- You forget to switch contexts frequently
- You regularly spend more than 10 minutes on a single question
- You don't finish practice exams on time
- You score <80% on practice exams
Final Preparation Tips
One week before exam:
- Do full mock exams daily - Killer.sh, KodeKloud
- Review this guide completely
- Focus on weak areas
- Memorize verification commands
- Practice aliases until automatic
- Review common failure scenarios - Pod CrashLoopBackOff, PVC Pending, Service no endpoints
- Time yourself strictly - Use actual exam timing
Day before exam:
- Light review only - Don't cram, you'll just stress yourself
- Practice environment setup - Aliases, vim config, completion (do this 5 times)
- Review troubleshooting workflows - The systematic node → component → pod approach
- Get good sleep - 7-8 hours minimum. Tired = mistakes = failure
- Prepare your workspace - Quiet room, good internet, backup internet
- Test your setup - Webcam, microphone, screen sharing
Exam day:
- Arrive 15 minutes early - Don't rush, be calm
- Do the environment check calmly - Don't panic during setup
- Spend the first 5 minutes setting up your environment - aliases, vim, completion
- Follow your workflow - Don't improvise under pressure
- Trust your preparation - You've done this 100 times in practice
- Keep moving forward - Don't get stuck on hard questions
- Verify every solution - 30 seconds of checking = points
- Use all available time - If you finish early, review flagged questions
During the exam:
- Take a deep breath every 5 questions
- If you're stuck, move on immediately
- Remember: 66% to pass, you don't need perfection
- Every question you verify correctly is points in the bank
- The exam environment is stressful - this is normal
Common Mistakes to Avoid
These mistakes cost candidates the most points:
❌ Forgetting to switch context
- Lost points even with correct solution
- Always copy-paste context command from question
- Always verify with kubectl config current-context
❌ Not verifying solutions
- Created deployment but it's not running
- Exposed service but no endpoints
- 30 seconds of verification = difference between 0 and full points
❌ Writing YAML from scratch
- Takes 5-8 minutes per resource
- High chance of syntax errors
- Use --dry-run=client -o yaml instead
❌ Spending more than 7-8 minutes on one question
- Time is your enemy
- Flag and move on
- Come back if time permits
❌ Not using kubectl explain
- Wastes time searching documentation
- kubectl explain gives answer in 30 seconds
- Practice using it until it's your first instinct
❌ Typing resource names manually
- Typos cost time and points
- Copy-paste from question
- Use tab completion
❌ Not checking logs for crashes
- Pod in CrashLoopBackOff but didn't check --previous logs
- Logs tell you exactly why it crashed
- Always check logs when troubleshooting
❌ Ignoring the Events section
- Events show what just failed
- kubectl describe Events section is gold
- Shows "FailedScheduling", "FailedMount", "ImagePullBackOff" reasons
You've Got This!
Remember:
- The CKA exam is challenging but passable with proper preparation
- Speed comes from practice - complete as many practice tasks as possible before exam day
- Verification is not optional - it's how you guarantee points
- Imperative commands save you 20-30 minutes
- Systematic troubleshooting beats random guessing every time
The exam tests two things:
- Do you know Kubernetes? (Technical knowledge)
- Can you work fast under pressure? (Time management)
You need both. Study the concepts but also practice the speed techniques.
Your preparation checklist:
- ✓ Read this guide completely
- ✓ Practice all commands until automatic
- ✓ Do all the mock exams in KodeKloud
- ✓ Score consistently >90% on practice
- ✓ Complete practice exams in <100 minutes
- ✓ Memorize verification patterns
- ✓ Practice troubleshooting workflows
- ✓ Set up aliases and test them repeatedly
Practice with KodeKloud’s Free Kubernetes Labs

The all-time most recommended CKA course

🎯 Good luck on your CKA certification! 🎯
FAQ
If I’ve never used aliases, should I set them up on exam day?
No. Only if you’ve practiced them for 1-2 weeks. Otherwise use the full kubectl command plus autocomplete.
Why is k fine in my terminal but breaks in scripts?
k is a shell alias (interactive only). Files and scripts need the full kubectl command.
Is $do safe in files?
Yes - the variable expands in your shell before redirection, so $do works in saved commands.
What’s the quickest editor tweak that saves errors?
expandtab, tabstop=2, and shiftwidth=2 in .vimrc to avoid TABs in YAML.
How much time can setup save if I’m practiced?
~10-20 minutes across the exam.
I created something - what’s the one-line "did it work?" check?
kubectl get <kind> -o wide, then immediately skim STATUS/READY/NODE/IP.
What’s the fastest way to confirm every requirement in a task?
Re-read the prompt and tick off each requirement against describe/get output.
If a resource “exists” but isn’t functional, what did I likely skip?
Endpoints (for Services), Bound status (for PVCs), rollout status (for Deployments), or Events.
When should I not generate YAML via --dry-run?
When the spec is trivial (a simple pod or service). Imperative may be faster.
Can kubectl explain replace doc browsing for field syntax?
Mostly yes - especially for probe/env/volume fields you forget.
My dry-run YAML lacks a niche field - what now?
Use kubectl explain <path> to locate the exact field, then edit the YAML.
PVC stuck in Pending - what’s the 10-second triage?
get sc (is there a default?), describe pvc (Events), get pv (size/access mode), and check the provisioner pod in kube-system.
CrashLoopBackOff but logs look empty - now what?
Use kubectl logs <pod> --previous to see the crashed container’s logs (golden ticket).
Rollout feels stuck - minimal sequence?
rollout status → get rs → describe pod (Events) → logs. Fix image/pod issues or roll back.
What’s the biggest zero-point mistake?
Forgetting to switch context. Always run kubectl config use-context … first, then verify with current-context.
When should I not write YAML from scratch?
Almost always. Generate it via --dry-run=client -o yaml, then edit the few lines you need.
HPA won’t scale - two common misses?
Metrics server not running, or pods missing CPU requests.
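A minimal sketch of fixing the second miss - the container needs a CPU request before the HPA can compute a utilization percentage. The deployment name web and the numbers below are placeholders:
# Add a CPU request to the containers (or set resources.requests.cpu in the manifest)
kubectl set resources deployment web --requests=cpu=100m
# Create the HPA and confirm TARGETS shows a number rather than <unknown>
kubectl autoscale deployment web --cpu-percent=80 --min=2 --max=5
kubectl get hpa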