Highlights
- Focus: Master verification & speed - 16-17 tasks in 120 mins = ~7 mins each.
- Setup: Use aliases (k, $do, $now) only if you've practiced them.
- YAML Tip: Generate with --dry-run=client -o yaml; never write from scratch.
- Docs Shortcut: Use kubectl explain, not web docs.
- Verify Everything: Always get, describe, logs, and check Events.
- Storage: Default StorageClass + PV/PVC binding must work.
- Troubleshooting: Start from nodes → system pods → workloads.
- RBAC & Security: Test permissions with kubectl auth can-i.
- Upgrade Flow: Control plane first, then kubelet restart.
- Networking: Endpoints tell truth; check DNS/CoreDNS if service fails.
- Time Rule: Max 8 mins per question; verify before moving on.
- Golden Trick: Imperative → YAML → Apply → Verify = full marks.
- Practice Goal: Score 90%+ in mock exams, finish in <100 mins.
- Mindset: Speed + verification = success in CKA.
Welcome! You're about to dive into a comprehensive guide to the commands you can lean on to verify your work during the CKA exam.
Here's the reality: you have 120 minutes for around 16-17 questions, which means about 6-7 minutes per task. Sounds tight? It is! But master these verification techniques, and you'll be confident in finishing with time to spare.
Think of this guide as your exam companion. Each section is designed to be practical and focused on what actually matters during those 2 hours.
Exam Setup - The Speed Booster (Optional, But Highly Recommended!)
Should you set this up?
Honestly, it's up to you! Some folks love aliases and can't imagine working without them. Others prefer typing full commands every time. Here's my take: if you practice with these shortcuts for 2-3 weeks before the exam, they become second nature and can save you 10-20 minutes total. That's huge! But if you try them for the first time on exam day, they'll just confuse you and slow you down.
Good news about autocomplete:
The exam environment already has kubectl autocomplete enabled! So, when you type kubectl get po and hit TAB, it auto-completes to kubectl get pods. This works out of the box - no setup needed. Pretty sweet, right?
The time-saving aliases:
Let me be real with you - typing kubectl dozens of times during the exam gets old fast. That's why I use k as an alias. Same with $do for --dry-run=client -o yaml - typing that several times is mind-numbing. But here's the deal: practice with these for at least 2 weeks before the exam, or don't use them at all. Half-learned shortcuts will trip you up under pressure.
cat >> ~/.bashrc << 'EOF'
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kd='kubectl describe'
export do='--dry-run=client -o yaml'
export now='--force --grace-period=0'
EOF
source ~/.bashrc
Set up aliases
Why these specific aliases?
- k - Because typing "kubectl" 200 times hurts
- kgp and kgs - Your most-used commands deserve shortcuts
- $do - This magical variable generates YAML templates instantly
- $now - Deletes pods immediately without 30-second grace period
Configure vim for YAML editing:
YAML is whitespace-sensitive (2 spaces for indentation, never tabs). Vim's defaults will drive you crazy if you don't fix them:
cat >> ~/.vimrc << 'EOF'
set number
set tabstop=2
set shiftwidth=2
set expandtab
EOF
YAML editing configuration
Now here's something CRITICAL: when commands end up in files - scripts, or a question that asks you to record the command you used - you MUST use the full kubectl command, not the k alias. Aliases only exist in your interactive shell; a saved k command won't expand when the file is run or checked later. So:
✅ Correct: kubectl run nginx --image=nginx $do > pod.yaml
❌ Risky: k run nginx --image=nginx $do > pod.yaml (works when typed at the prompt, but the k habit will bite you the moment the command lands in a script or a graded answer file)
The $do variable works in both cases because it's an environment variable; k is a shell alias that only exists in your interactive session.
Author's recommendation:
Spend your first 2-3 minutes of the exam setting these up if you've practiced with them. If you haven't practiced, skip them entirely and just use full commands with autocomplete. A familiar workflow beats a faster unfamiliar one every single time.
1. Storage (10% of the exam)
This is where things get interesting. Storage questions separate the people who've actually worked with Kubernetes from those who just read about it. You'll be verifying that StorageClasses work, PVs and PVCs bind properly, and most importantly - that dynamic provisioning actually creates storage when you need it.
Here's what trips people up: they create a PVC, see it stuck in Pending, panic, and waste 10 minutes troubleshooting. By the end of this section, you'll diagnose that in 30 seconds flat.
StorageClass Verification
What you're really checking: Think of StorageClasses as templates for creating storage. The big questions are: Which one is default? What happens to data when I delete a PVC (Retain or Delete)? And when does the volume get created (Immediate or WaitForFirstConsumer)?
These aren't just academic questions - they determine whether your application's data survives or gets wiped out!
# List storage classes
kubectl get sc
# Check default storage class
kubectl get sc | grep "(default)"
# Describe storage class details
kubectl describe sc <sc-name>
# Get specific fields
kubectl get sc <sc-name> -o jsonpath='{.reclaimPolicy}'
kubectl get sc <sc-name> -o jsonpath='{.volumeBindingMode}'
Critical insight: If there's no default StorageClass and you create a PVC without specifying one, congratulations - you've just created a PVC that will sit in Pending status forever. Ask me how I know! (Hint: I lost a question on a practice exam learning this the hard way)
Dynamic Provisioning Test
The real test: Dynamic provisioning is Kubernetes automatically creating storage volumes for you. When you create a PVC, Kubernetes should automatically provision a PV to satisfy it. This is the #1 storage failure point in the exam - you need to verify the entire chain works.
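The commands below watch a claim named test-pvc. If the question doesn't hand you one, a minimal throwaway claim like this sketch is enough to exercise the provisioner (the name, size, and commented storageClassName are placeholders - adjust to the task):
# Create a throwaway PVC to exercise dynamic provisioning
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # storageClassName: <sc-name>   # omit to fall back to the default StorageClass
EOF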
# Watch PVC bind
kubectl get pvc test-pvc --watch
# Expected: Pending → Bound (5-10 seconds)
# Verify PV created
kubectl get pv
# Check provisioning events
kubectl describe pvc test-pvc | grep -A 10 Events
# Troubleshoot stuck PVC
kubectl get sc <sc-name> # Verify SC exists
kubectl get pods -n kube-system | grep provisioner # Check provisioner running
kubectl logs -n kube-system <provisioner-pod> # Check logs
Critical insight: If a PVC stays Pending for more than 10 seconds, something is wrong. The --watch flag is essential here - it shows you the status changing in real-time so you immediately know when it succeeds or fails. The Events section tells you exactly what went wrong.
PV/PVC Binding
What's happening: PVs (PersistentVolumes) are cluster resources, PVCs (PersistentVolumeClaims) are requests for storage. They must "bind" to each other based on matching criteria (size, access mode, StorageClass). Think of PV as the actual hard drive, and PVC as the application saying "I need 10GB of storage."
# View together
kubectl get pv,pvc
# Check binding details
kubectl describe pvc <pvc-name>
# Look for: Status=Bound, Volume=<pv-name>, Events=SuccessfulBinding
# Bidirectional verification
kubectl get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'
kubectl get pv <pv-name> -o jsonpath='{.spec.claimRef.name}'
# Make Released PV available again
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
Common exam scenario: You delete a PVC, and the PV goes into "Released" state. The exam might ask you to make that PV available again for binding. The patch command clears the claim reference, making it Available again. This is a common 2-3 point question.
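If you have to build the pair yourself, here's a minimal sketch of a static PV and a PVC that bind on matching size, access mode, and storageClassName (the hostPath, names, and the "manual" class are placeholders for illustration):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
# Verify both sides report Bound
kubectl get pv,pvc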
Access Modes & Reclaim Policies
Why this matters: Access modes determine how many pods can use the volume simultaneously. RWO (ReadWriteOnce) means only one pod can write - if you try to schedule two pods with RWO volumes on different nodes, one will fail. RWX (ReadWriteMany) allows multiple pods to write simultaneously. Mismatched access modes are a frequent exam trap.
# View access modes
kubectl get pv -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes
# View reclaim policies
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy
# Change reclaim policy
kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Exam trap: The exam may give you a PV with "Delete" reclaim policy and ask you to preserve data after PVC deletion. You MUST change it to "Retain" before deleting the PVC, or all data is lost. The patch command is the fastest way to change this.
2. Troubleshooting (30% of exam)
Troubleshooting is 30% of the exam, which means it's worth almost as much as ALL other domains combined. And here's the thing - you can't just memorize answers. Each troubleshooting question is a unique puzzle. A pod won't start, a service doesn't work, a node goes rogue - and you need to figure out why.
But here's the secret: 90% of troubleshooting follows the same pattern. Check the big stuff first (nodes, control plane), then drill down (pods, containers). It's like debugging anything else - you don't start by checking individual lines of code when the whole server might be down, right?
The mindset shift: Stop thinking "what's the answer?" and start thinking "what would I check first?" This systematic approach will serve you way better than trying to memorize every possible failure mode.
Cluster Health
Start here, always: Before you troubleshoot anything specific, answer this: "Is the cluster even healthy?" Because if the nodes are down or the API server is crashed, nothing else matters. You can't fix a pod on a broken cluster.
Think of it like troubleshooting your car - before you check why the radio isn't working, make sure the engine is running!
# Check nodes
kubectl get nodes
kubectl get nodes -o wide
# Check cluster components
kubectl get componentstatuses
kubectl get --raw='/readyz?verbose'
kubectl version
# View cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events -A --sort-by=.metadata.creationTimestamp
What you're looking for: All nodes showing "Ready" status. If you see "NotReady," stop everything - that's your problem right there. The kubectl version might seem too simple, but if the server version doesn't show up? Your API server is down, and that's why literally nothing is working.
Pro move: Events sorted by creation time show you what just broke. The most recent event is usually your culprit. I once spent 15 minutes troubleshooting a pod before checking events and seeing "ImagePullBackOff: image not found." Face, meet palm.
Node Diagnostics
When nodes fail: A "NotReady" node is a common exam scenario. The node could be out of resources (CPU, memory, disk), the kubelet could be down, or there could be network issues. The describe command gives you everything you need to diagnose this.
# Detailed node info
kubectl describe node <node-name>
# Check: Conditions (Ready, MemoryPressure, DiskPressure)
# Resource allocation
kubectl describe nodes | grep -A 10 "Allocated resources"
# Debug node with shell
kubectl debug node/<node-name> -it --image=ubuntu
# Node filesystem at /host
The Conditions section is gold: It tells you exactly what's wrong - MemoryPressure=True means the node is out of memory, DiskPressure=True means disk is full. The "Allocated resources" section shows if pods are requesting more resources than the node has available. The debug command gives you shell access to the node even if SSH is broken - this is a game-changer for node troubleshooting.
Control Plane Components
Why this matters: The control plane (API server, scheduler, controller manager, etcd) runs the entire cluster. If any component is down, specific functions break: API server down = can't run kubectl commands, scheduler down = pods stuck Pending, controller manager down = no deployments/replicasets work, etcd down = no persistent state.
# Check control plane pods
kubectl get pods -n kube-system
kubectl get pods -n kube-system -l component=kube-apiserver
kubectl get pods -n kube-system -l component=kube-scheduler
# Check logs
kubectl logs -n kube-system <component-pod>
journalctl -u kubelet -n 100
# Verify kubelet (on node)
systemctl status kubelet
journalctl -u kubelet -f
Exam scenario: If all pods are stuck Pending, check the scheduler. If deployments aren't creating pods, check the controller manager. If nothing works, check the API server. The logs tell you the exact error - certificate expired, can't connect to etcd, port already in use, etc.
etcd Health
Critical understanding: etcd is the database for everything in Kubernetes. If etcd is unhealthy, the entire cluster is unstable. In HA clusters, you need at least (n/2)+1 members healthy (quorum). With 3 members, you can lose 1. With 5 members, you can lose 2.
# Health check
kubectl exec -n kube-system etcd-<node> -- etcdctl \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
endpoint health
# List members
kubectl exec -n kube-system etcd-<node> -- etcdctl \
--cert=/etc/kubernetes/pki/etcd/peer.crt \
--key=/etc/kubernetes/pki/etcd/peer.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
member list
# Via API server
kubectl get --raw=/livez/etcd
# Backup and verify
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
--cert=<cert> --key=<key> --cacert=<cacert>
etcdutl --write-out=table snapshot status snapshot.db
High-value exam task: etcd backup/restore questions are worth more points. You MUST get the certificate paths exactly right or the command fails. The certificate paths are usually in /etc/kubernetes/pki/etcd/. The snapshot status command verifies the backup is valid before you try to restore it.
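The backup is only half the task. The restore side usually looks something like this sketch - the data directory and manifest path below are the kubeadm defaults, so double-check them against what the question gives you:
# Restore the snapshot into a fresh data directory (path is an example)
ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --data-dir /var/lib/etcd-from-backup
# (newer etcd also ships etcdutl snapshot restore with the same flags)
# Point the etcd static pod at the restored data:
# edit /etc/kubernetes/manifests/etcd.yaml and change the etcd-data hostPath
# from /var/lib/etcd to /var/lib/etcd-from-backup, then wait for etcd to restart
kubectl get pods -n kube-system | grep etcd   # confirm etcd comes back healthy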
Resource Monitoring
What you're diagnosing: Pods can fail because nodes run out of CPU or memory. The metrics server provides this real-time data. Without metrics server, kubectl top won't work, and HPA (Horizontal Pod Autoscaler) won't function.
# Verify metrics server
kubectl get deployment metrics-server -n kube-system
# Node/pod resource usage
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=cpu
kubectl top pods --all-namespaces --sort-by=memory --containers
# Resource quotas
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota <name>
Practical use: If pods are stuck Pending with "Insufficient CPU" or "Insufficient memory" errors, kubectl top nodes shows you exactly which nodes are maxed out. Sorting by CPU or memory shows which pods are the resource hogs. This guides your troubleshooting - maybe you need to scale down other pods or add more nodes.
Container Logs
The truth is in the logs: When a pod crashes, the logs tell you why - missing environment variable, can't connect to database, out of memory, etc. Logs are your primary debugging tool for application issues.
# View logs
kubectl logs <pod>
kubectl logs <pod> -c <container>
kubectl logs <pod> -f --tail=50 --since=5m
kubectl logs <pod> --previous # For CrashLoopBackOff
# Multi-container pods
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].name}'
kubectl logs <pod> --all-containers=true
# Filter logs
kubectl logs <pod> | grep -i "error\|warn"
Critical flag: --previous: When a pod is in CrashLoopBackOff, the current container hasn't started yet, so there are no logs. The --previous flag shows logs from the crashed container, which contains the actual error message. This is essential for debugging startup failures.
Service Troubleshooting
Common issue: "I can curl the pod directly, but the service doesn't work." This means the problem is with the service, not the pod. Usually it's a selector mismatch - the service selector doesn't match the pod labels.
# Check service and endpoints
kubectl get svc <service>
kubectl get endpoints <service>
# Verify selector
kubectl describe svc <service> | grep Selector
kubectl get pods --selector=<label-key>=<label-value>
# Debug pod with network tools
kubectl run test --rm -it --image=nicolaka/netshoot -- bash
# Then: nslookup <service>, curl <service>:<port>
# DNS check
kubectl exec -it <pod> -- cat /etc/resolv.conf
kubectl get pods -n kube-system -l k8s-app=kube-dns
The endpoints tell the truth: If kubectl get endpoints <service> returns empty, no pods match the service selector. Fix the labels. If endpoints exist but the service still doesn't work, it's usually DNS or network policy blocking traffic. The netshoot image has every network debugging tool you need - nslookup for DNS, curl for HTTP, ping for connectivity.
3. Workloads & Scheduling (15% of exam)
This section tests deployments, rolling updates, configuration management (ConfigMaps/Secrets), autoscaling, and pod scheduling. You need to verify deployments roll out correctly, configurations are injected properly, and pods are scheduled according to constraints like node affinity and taints.
Deployments
What deployments do: Deployments manage ReplicaSets, which manage Pods. When you update a deployment (change image, add env var, etc.), it creates a new ReplicaSet and gradually shifts traffic from old to new. This is a rolling update - zero downtime.
# Check status
kubectl get deployment <name>
kubectl rollout status deployment/<name>
# Watch ReplicaSets
kubectl get rs --watch
# Revision history
kubectl rollout history deployment/<name>
kubectl rollout history deployment/<name> --revision=2
What READY means: "3/3" means 3 pods are ready out of 3 desired. "2/3" means one pod is still starting or failing. The rollout status command streams updates in real-time - "Waiting for rollout to finish: 2 out of 3 new replicas have been updated" - so you know exactly what's happening. The revision history is crucial for rollbacks.
Rolling Updates & Rollbacks
Exam scenario: "Update the deployment to use nginx:1.16, verify the rollout succeeds, then roll back to the previous version." This is a 3-4 point question testing your understanding of deployment lifecycles.
# Update image
kubectl set image deployment/<name> <container>=<new-image>
# Check strategy
kubectl get deployment <name> -o yaml | grep -A 5 strategy
# Rollback
kubectl rollout undo deployment/<name>
kubectl rollout undo deployment/<name> --to-revision=2
Rolling update strategy: maxUnavailable and maxSurge control how the update happens. maxUnavailable=1 means at most 1 pod can be down during the update. maxSurge=1 means you can temporarily have 1 extra pod above the desired count. Understanding this helps you troubleshoot stuck rollouts.
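If a task asks you to tune these values, you don't need to rewrite the whole spec - a patch like this sketch (values are examples) does it:
# Set maxSurge/maxUnavailable on an existing deployment
kubectl patch deployment <name> -p \
  '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
# Confirm the change took effect
kubectl get deployment <name> -o yaml | grep -A 5 strategy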
ConfigMaps & Secrets
Why they exist: Hard-coding config into container images is bad practice. ConfigMaps hold non-sensitive config (database URL, feature flags), Secrets hold sensitive data (passwords, API keys). You inject them as environment variables or files.
# View content
kubectl get configmap <name> -o yaml
kubectl get secret <name> -o yaml
# Decode secret
kubectl get secret <name> -o jsonpath='{.data.password}' | base64 -d
# Verify in pod
kubectl exec -it <pod> -- env | grep <VAR>
kubectl exec -it <pod> -- ls /etc/config
kubectl exec -it <pod> -- cat /etc/config/<key>
Exam gotcha: Secrets are base64-encoded, NOT encrypted. You must decode them to read the actual value. When troubleshooting "pod can't connect to database," check if the password secret is correct by decoding it. Verifying the config inside the pod confirms it was injected properly.
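For reference, the two injection styles look roughly like this in a pod spec - a sketch that assumes a ConfigMap named app-config already exists (field names can be double-checked with kubectl explain pod.spec.containers.envFrom and kubectl explain pod.spec.volumes):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
  - name: app
    image: nginx
    envFrom:                    # every key becomes an environment variable
    - configMapRef:
        name: app-config
    volumeMounts:               # every key becomes a file under /etc/config
    - name: config-vol
      mountPath: /etc/config
  volumes:
  - name: config-vol
    configMap:
      name: app-config
EOF
kubectl exec -it config-demo -- ls /etc/config   # verify the files were injected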
HPA (Horizontal Pod Autoscaler)
How it works: HPA watches CPU/memory metrics and scales pods up or down automatically. If CPU usage > 80% target, it scales up. If CPU < 80%, it scales down. Requires metrics-server to be running.
# Check HPA
kubectl get hpa
kubectl describe hpa <name>
kubectl get hpa --watch
# Verify metrics server
kubectl top pods
# Generate load
kubectl run load-gen --image=busybox --restart=Never -- \
/bin/sh -c "while true; do wget -q -O- http://<svc>; done"
Exam scenario: Create an HPA that scales between 2-10 replicas based on 70% CPU utilization. After creating it, you MUST verify it works by generating load and watching the replica count increase. The --watch flag shows this in real-time. If it doesn't scale, check if metrics-server is running and if resource requests are defined on the pods.
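That scenario maps to a single imperative command - a sketch assuming a deployment named web:
# Create the HPA from the scenario (deployment name is an example)
kubectl autoscale deployment web --min=2 --max=10 --cpu-percent=70
# Verify it registered and starts reporting metrics
kubectl get hpa web --watch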
Probes
Critical difference: Liveness probes determine if a container is alive - if it fails, Kubernetes restarts the container. Readiness probes determine if a container is ready to receive traffic - if it fails, Kubernetes removes the pod from service endpoints but doesn't restart it.
# Check configuration
kubectl describe pod <pod> | grep -A 10 Liveness
kubectl describe pod <pod> | grep -A 10 Readiness
# Check failures
kubectl describe pod <pod> | grep -A 20 Events
kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].restartCount}'
# Verify endpoints (readiness)
kubectl get endpoints <service>
Why this matters: If a pod is restarting frequently (high restart count), check the liveness probe - maybe it's too aggressive (checks every 1 second, timeout 1 second). If a service has no traffic going to some pods, check readiness probe failures - those pods are removed from the service endpoints.
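If you need to add the probes yourself, the container-level fields look roughly like this sketch (path, port, and timings are examples - confirm field names with kubectl explain pod.spec.containers.livenessProbe):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:              # failure -> container restart
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:             # failure -> pod removed from endpoints
      httpGet:
        path: /
        port: 80
      periodSeconds: 5
EOF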
Resource Limits
Exam understanding: Requests are what Kubernetes uses for scheduling - "this pod needs at least 100m CPU." Limits are the maximum the pod can use - "this pod can use up to 500m CPU." QoS (Quality of Service) determines which pods get evicted first when the node runs out of resources.
# Check limits
kubectl describe pod <pod> | grep -A 10 Limits
# Check QoS
kubectl get pod <pod> -o jsonpath='{.status.qosClass}'
# Node allocation
kubectl describe nodes | grep -A 10 "Allocated resources"
QoS classes: Guaranteed (requests=limits) are evicted last. BestEffort (no requests/limits) are evicted first. Burstable (requests < limits) are in between. If your critical pods keep getting evicted, they need higher QoS - set requests=limits to make them Guaranteed.
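As a quick reference, a container whose requests equal its limits lands in the Guaranteed class - a sketch with example values:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:                   # requests == limits -> Guaranteed
        cpu: "200m"
        memory: "256Mi"
EOF
kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'   # expect: Guaranteed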
Node Affinity & Taints
The scheduling story: Not all pods can run on all nodes. Node affinity is the pod saying "I prefer/require nodes with label X." Taints are the node saying "Don't schedule pods here unless they tolerate my taint." Together they control where pods land.
# Check node labels
kubectl get nodes --show-labels
# Check taints
kubectl describe node <node> | grep Taints
# Check tolerations
kubectl get pod <pod> -o yaml | grep -A 10 tolerations
# Verify scheduling
kubectl get pod <pod> -o wide
Exam scenario: "Schedule this pod only on nodes with label disk=ssd." You add nodeSelector or nodeAffinity to the pod spec. To verify, kubectl get pod <pod> -o wide shows which node it's on, then kubectl get node <node> --show-labels confirms that node has disk=ssd. For taints, if a node is tainted with key=value:NoSchedule, only pods with a matching toleration can schedule there.
4. Cluster Architecture (25% of exam)
Overview: This is the deepest technical section - RBAC security, cluster lifecycle management with kubeadm, HA configurations, package management with Helm, and extension points (CNI, CSI, CRI, CRDs). These are advanced admin topics that separate CKA from easier certifications.
RBAC
Security model: RBAC controls who can do what in the cluster. Users/ServiceAccounts (who) get permissions through Roles/ClusterRoles (what actions) via RoleBindings/ClusterRoleBindings (the assignment). Without proper RBAC, users either can't do their job or have too much access.
# Check permissions
kubectl auth can-i create pods
kubectl auth can-i list secrets --as user1 -n kube-system
kubectl auth can-i --list
# View roles and bindings
kubectl get role,rolebinding -n <namespace>
kubectl get clusterrole,clusterrolebinding
kubectl describe rolebinding <name>
# Test service account
kubectl auth can-i get pods --as system:serviceaccount:<ns>:<sa>
Exam task: "Create a Role that allows reading pods and services in namespace 'app', bind it to user 'dev'." After creating it, you MUST verify with kubectl auth can-i get pods --as dev -n app returning "yes". The --as flag lets you impersonate users to test permissions without switching credentials.
Infrastructure Preparation
Before kubeadm: You can't just run kubeadm init on a fresh server and expect it to work. The underlying infrastructure must meet specific requirements. The exam may ask you to verify prerequisites or troubleshoot a failed cluster initialization due to missing requirements.
System requirements to verify:
# Check Linux kernel version (must be 3.10+)
uname -r
# Verify required ports are available
# Control plane: 6443, 2379-2380, 10250, 10257, 10259
# Worker nodes: 10250, 30000-32767
netstat -tuln | grep -E '6443|2379|2380|10250|10257|10259'
# Check if swap is disabled (REQUIRED)
free -h
# or
swapon --show
# Disable swap if enabled
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# Verify container runtime installed
systemctl status containerd
# or
systemctl status docker
# Check if br_netfilter module is loaded
lsmod | grep br_netfilter
sudo modprobe br_netfilter
# Enable IP forwarding
cat /proc/sys/net/ipv4/ip_forward # Should return 1
sudo sysctl -w net.ipv4.ip_forward=1
sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
# Make persistent
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Why these checks matter: Swap disabled is mandatory - kubeadm will refuse to initialize if swap is on. The br_netfilter module is required for iptables to see bridged traffic (pod networking won't work without it). IP forwarding must be enabled for packets to route between pods on different nodes.
Package verification:
# Verify kubeadm, kubelet, kubectl installed
kubeadm version
kubelet --version
kubectl version --client
# Check kubelet is enabled (but not yet running before init)
systemctl is-enabled kubelet
systemctl status kubelet
Network prerequisites:
# Verify unique hostname per node
hostname
hostnamectl
# Check unique MAC addresses
ip link show | grep ether
# Verify DNS resolution works
nslookup google.com
ping -c 2 8.8.8.8
# Check firewall status (may need to allow Kubernetes ports)
sudo systemctl status firewalld
# or
sudo ufw status
Exam scenario: "The cluster initialization failed. Verify the prerequisites and fix any issues." You'd check: (1) swap is off, (2) required ports are available, (3) container runtime is running, (4) br_netfilter module is loaded, (5) IP forwarding is enabled. Fix what's broken, then retry kubeadm init.
Pre-flight checks:
# kubeadm has built-in checks
sudo kubeadm init --dry-run
# This shows what would happen without actually initializing
# Lists all prerequisite checks and their status
Common initialization blockers:
- Swap is on → swapoff -a
- Port 6443 already in use → Check what's using it: lsof -i :6443
- Container runtime not running → systemctl start containerd
- br_netfilter not loaded → modprobe br_netfilter
- Firewall blocking ports → Configure firewall rules
kubeadm Cluster
Foundation knowledge: kubeadm is the standard tool for bootstrapping Kubernetes clusters. The exam assumes you understand the cluster initialization process, certificate management, and adding nodes.
# Verify initialization
kubectl cluster-info
kubectl get pods -n kube-system
# Check certificates
kubeadm certs check-expiration
# Generate join token
kubeadm token create --print-join-command
kubeadm token list
Certificate crisis: Kubernetes certificates expire after 1 year by default. If certificates expire, the cluster becomes inoperable. The kubeadm certs check-expiration command is your early warning system - if any cert expires in < 90 days, you need to renew it. Join tokens expire in 24 hours, so if you need to add a node later, you generate a new token.
Cluster Upgrade
High-stakes task: Upgrading a cluster is a common 7-10 point exam question. You must upgrade control plane nodes first, then worker nodes, and never skip minor versions (can't go from 1.27 to 1.29 directly).
# Check version
kubectl version
# Plan upgrade
kubeadm upgrade plan
# After kubeadm upgrade apply
kubectl get nodes # Still shows old version
apt-get install kubelet=1.29.0-00 kubectl=1.29.0-00
systemctl daemon-reload && systemctl restart kubelet
kubectl get nodes # Now shows new version
Critical sequence: kubeadm upgrade apply upgrades control plane components, but NOT kubelet. That's why kubectl get nodes still shows the old version - it reports kubelet version, not control plane version. You must manually upgrade kubelet packages and restart the service on each node. Missing this step is the #1 upgrade mistake.
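Putting the control plane sequence together - a sketch that assumes a v1.29.0 target and Debian-style packages (the exact package version suffix depends on the configured repo; many questions also expect the drain/uncordon steps):
# On the control plane node
kubectl drain <control-plane-node> --ignore-daemonsets
apt-get update && apt-get install -y kubeadm=1.29.0-00
kubeadm upgrade plan
kubeadm upgrade apply v1.29.0
apt-get install -y kubelet=1.29.0-00 kubectl=1.29.0-00
systemctl daemon-reload && systemctl restart kubelet
kubectl uncordon <control-plane-node>
kubectl get nodes   # the node now reports the new kubelet version
# Worker nodes: drain, upgrade kubeadm, "kubeadm upgrade node", upgrade kubelet, uncordon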
High Availability
Production requirement: HA means multiple control plane nodes so if one fails, the cluster keeps running. You need odd numbers (3, 5, 7) of etcd members for quorum. The load balancer distributes API requests across multiple API servers.
# Check control plane nodes
kubectl get nodes --selector=node-role.kubernetes.io/control-plane
# Component distribution
kubectl get pods -n kube-system -o wide | grep -E 'etcd|apiserver'
# Test load balancer
curl -k https://<lb-endpoint>:6443/healthz
Verification matters: In a 3-node HA cluster, you should see 3 control plane nodes, 3 etcd pods, 3 API server pods. If the load balancer is working, curling it should return "ok." If one control plane node goes down, the cluster should still function - test this during verification.
Helm & Kustomize
Package management: Helm is like apt/yum for Kubernetes - it packages multiple resources into a "chart" for easy installation. Kustomize customizes YAML without templating. Both are standard tools for managing complex applications.
# Helm
helm list -A
helm status <release>
helm history <release>
helm get manifest <release>
helm install <release> <chart> --dry-run --debug
# Kustomize
kubectl kustomize ./
kubectl apply -k ./
Exam usage: "Install nginx-ingress using Helm, verify it's running." After helm install, use helm status to check deployment status, helm get manifest to see what resources were created, and kubectl get pods -n <namespace> to verify pods are running. The --dry-run --debug flags let you preview what will be installed without actually installing it.
Extension Interfaces
Advanced concept: Kubernetes is extensible through plugin interfaces. CNI (Container Network Interface) provides networking, CSI (Container Storage Interface) provides storage, CRI (Container Runtime Interface) provides container runtime. CRDs (Custom Resource Definitions) let you extend Kubernetes with custom resource types.
# CNI
kubectl get pods -n kube-system | grep -E 'calico|flannel|weave'
# CSI
kubectl get csidrivers
kubectl get storageclass
# CRI
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
# CRD
kubectl get crd
kubectl describe crd <name>
kubectl explain <custom-resource-type>
Why verify these: If no CNI is running, pods can't communicate. If CSI drivers are missing, dynamic provisioning won't work. Checking the container runtime confirms whether you're using Docker, containerd, or CRI-O. CRDs let you use kubectl with custom resources like Prometheus, Istio, or Operators.
5. Services & Networking (20% of exam)
Overview: Networking is how pods communicate with each other and the outside world. You need to understand Services (load balancing), NetworkPolicies (firewalls), Ingress (HTTP routing), and DNS. Networking failures are subtle - services may be created but not working due to selector mismatches or DNS issues.
Pod Connectivity
Foundation: Before troubleshooting services, verify basic pod-to-pod connectivity works. If pods can't ping each other, the CNI plugin is broken and nothing will work.
# Get pod IPs
kubectl get pods -o wide
# Test connectivity
kubectl exec -it <pod> -- ping <target-ip>
kubectl exec -it <pod> -- curl <target-ip>:<port>
# Debug pod
kubectl run netshoot --rm -it --image=nicolaka/netshoot -- bash
# Test service DNS
kubectl exec -it <pod> -- curl <service>:<port>
kubectl exec -it <pod> -- curl <service>.<ns>.svc.cluster.local:<port>
Debugging technique: The netshoot image is your network Swiss Army knife - it has curl, wget, nslookup, dig, ping, traceroute, netstat, and more. When troubleshooting network issues, always start with a netshoot pod. If curl <service> works but your application can't connect, the problem is with your application, not Kubernetes networking.
Service Validation
How services work: Services provide a stable endpoint (ClusterIP) that load balances to a set of pods. The service finds pods using a selector (label query). If no pods match the selector, the service has no endpoints and doesn't work.
# Check service
kubectl get svc
kubectl describe svc <service>
# Check endpoints
kubectl get endpoints <service>
# Verify selector
kubectl describe svc <service> | grep Selector
kubectl get pods --selector=<label>=<value>
The endpoints are the truth: If kubectl get endpoints <service> shows IPs, those are the pod IPs that will receive traffic. If it's empty, the selector doesn't match any pods - check labels. This is the #1 service problem in the exam. Common mistake: service selector is "app=web" but pod labels are "app=webapp."
Service Types
Three types you must know: ClusterIP (default) is only accessible within the cluster. NodePort opens a port on every node (30000-32767) for external access. LoadBalancer provisions a cloud load balancer (only works on cloud providers like AWS/GCP/Azure).
# ClusterIP (default)
kubectl get svc <service> # TYPE=ClusterIP
kubectl run test --rm -it --image=busybox:1.28 -- wget -O- <cluster-ip>:<port>
# NodePort
kubectl get svc <service> # PORT(S)=80:30007/TCP
curl <node-ip>:30007
# LoadBalancer
kubectl get svc <service> # EXTERNAL-IP populated
curl <external-ip>:<port>
Testing access: For ClusterIP, you must test from inside the cluster (hence the busybox pod). For NodePort, you can test from outside using any node's IP. For LoadBalancer, you use the external IP. A common exam task is "expose this deployment externally" - choose NodePort if no cloud provider, LoadBalancer if cloud provider is available.
Network Policies
Kubernetes firewall: By default, all pods can talk to all pods. NetworkPolicies restrict this - "only pods with label X can connect to pods with label Y on port 80." This is pod-level firewall rules.
# List policies
kubectl get networkpolicy -A
# Describe policy
kubectl describe netpol <name>
# Test enforcement
kubectl exec <pod> -- curl --max-time 5 <target> # Should timeout if blocked
kubectl exec <pod> -- curl <target> # Should work if allowed
# Verify CNI support
kubectl get pods -n kube-system | grep -E 'calico|cilium|weave'
Critical requirement: NetworkPolicies only work if your CNI plugin supports them. Calico, Cilium, and Weave do. Flannel does NOT. If you're using Flannel and create NetworkPolicies, they'll be created successfully but have no effect. Always verify CNI support first. Testing is essential - create a policy, then verify it actually blocks traffic as expected.
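For reference, a minimal policy of that shape looks roughly like this sketch (labels and port are examples):
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-web
spec:
  podSelector:                  # the pods this policy protects
    matchLabels:
      app: web
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80
EOF
# Verify: curl from an app=frontend pod should succeed; from any other pod it should time out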
Ingress
HTTP routing: Ingress routes HTTP/HTTPS traffic to services based on hostname or path. "example.com/api → api-service, example.com/web → web-service." Requires an Ingress controller (nginx, traefik, etc.) to be installed.
# Check controller
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
# Check Ingress
kubectl get ingress
kubectl describe ingress <name>
# Get IP
kubectl get ingress <name> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Test routing
curl -H "Host: example.com" http://<ingress-ip>/path
Exam scenario: "Create an Ingress that routes example.com/app to service app-svc port 80." After creating it, verify the ADDRESS is populated (may take 30 seconds). Then test with curl - if you get 404, the service name is wrong. If you get connection refused, the service isn't running. If you get 502 Bad Gateway, the service endpoints are empty.
Gateway API
The future of Ingress: Gateway API is the successor to Ingress, offering more expressive routing, better multi-tenancy, and role-oriented design. While traditional Ingress is still common, Gateway API is gaining adoption and may appear in the exam.
Key differences from Ingress:
- GatewayClass (like StorageClass) - defines the type of Gateway
- Gateway (like LoadBalancer Service) - the infrastructure layer
- HTTPRoute (like Ingress) - the routing rules
Check Gateway API installation:
# Verify Gateway API CRDs installed
kubectl get crd | grep gateway
# List GatewayClasses
kubectl get gatewayclass
# List Gateways
kubectl get gateway -A
# List HTTPRoutes
kubectl get httproute -A
Gateway verification:
# Describe Gateway for detailed info
kubectl describe gateway <gateway-name> -n <namespace>
# Check Gateway status
kubectl get gateway <gateway-name> -n <namespace> -o yaml | grep -A 5 status
# Look for:
# - conditions: status=True, type=Programmed
# - addresses: external IP or hostname assigned
Expected output: Gateway should show Programmed=True condition and have an address assigned. If address is missing, the Gateway controller may not be running or cloud provider integration is broken.
HTTPRoute verification:
# List HTTPRoutes
kubectl get httproute -A
# Describe HTTPRoute
kubectl describe httproute <route-name> -n <namespace>
# Check if route attached to Gateway
kubectl get httproute <route-name> -n <namespace> -o yaml | grep -A 3 parentRefs
What to verify:
- parentRefs: Which Gateway this route attaches to
- hostnames: What domain names this route handles
- rules: Path matching and backend service references
- status: Whether the route was accepted by the Gateway
Test Gateway routing:
# Get Gateway address
GATEWAY_IP=$(kubectl get gateway <gateway-name> -n <namespace> \
-o jsonpath='{.status.addresses[0].value}')
# Test routing with curl
curl -H "Host: example.com" http://$GATEWAY_IP/api
# Test different paths
curl -H "Host: example.com" http://$GATEWAY_IP/web
# Verbose output for debugging
curl -v -H "Host: example.com" http://$GATEWAY_IP/path
Common Gateway API issues:
# Gateway stuck in Pending
kubectl describe gateway <gateway-name>
# Check Events for: "No addresses available", "Controller not found"
# HTTPRoute not working
kubectl describe httproute <route-name>
# Check for: "RouteReasonNoMatchingParent", "RouteReasonBackendNotFound"
# Verify Gateway controller is running
kubectl get pods -n gateway-system
# or wherever your Gateway controller is deployed
# Check Gateway controller logs
kubectl logs -n gateway-system <gateway-controller-pod>
Gateway API vs Traditional Ingress:
| Aspect | Traditional Ingress | Gateway API |
|---|---|---|
| Resource | Ingress | HTTPRoute |
| Infrastructure | IngressClass | GatewayClass + Gateway |
| Routing | Path/Host rules | More expressive matching |
| Multi-tenancy | Limited | Better separation |
| Protocol support | HTTP/HTTPS mainly | HTTP, HTTPS, TCP, gRPC |
Exam scenario: "Create an HTTPRoute that routes traffic from Gateway 'main-gateway' to service 'api-svc' on path /api."
After creation:
- Verify HTTPRoute exists: kubectl get httproute
- Check it's attached to Gateway: kubectl describe httproute <route-name>
- Verify Gateway has address: kubectl get gateway main-gateway
- Test routing: curl -H "Host: api.example.com" http://<gateway-ip>/api
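A minimal HTTPRoute for that scenario might look like this sketch (the hostname is an example, and the apiVersion should match the CRDs installed on the cluster - check with kubectl get crd | grep httproute):
cat <<EOF | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
spec:
  parentRefs:
    - name: main-gateway
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api-svc
          port: 80
EOF
kubectl describe httproute api-route   # status should show the route was accepted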
Quick debugging workflow:
- Check GatewayClass exists and has a controller
- Check Gateway references valid GatewayClass
- Check Gateway has Programmed=True condition
- Check Gateway has address assigned
- Check HTTPRoute parentRefs matches Gateway name
- Check HTTPRoute backend service exists and has endpoints
- Test with curl using correct Host header
Note: Gateway API is relatively new. If the exam cluster doesn't have Gateway API CRDs installed, stick with traditional Ingress. Check with kubectl get crd | grep gateway first.
DNS
Why DNS matters: DNS is how pods find services by name instead of IP. Without working DNS, you'd need to hardcode ClusterIPs everywhere. CoreDNS provides this critical service - if it's down, nothing can communicate using service names.
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get svc -n kube-system kube-dns
# Test DNS
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -it dnsutils -- nslookup kubernetes.default
# Check pod DNS config
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Test all forms
kubectl exec -it dnsutils -- nslookup <service>
kubectl exec -it dnsutils -- nslookup <service>.<namespace>
kubectl exec -it dnsutils -- nslookup <service>.<ns>.svc.cluster.local
The DNS hierarchy: Services can be accessed by short name (same namespace only), namespace-qualified name (cross-namespace), or FQDN (fully qualified domain name). If nslookup <service> fails but nslookup <service>.<namespace> works, you're testing from a different namespace. The /etc/resolv.conf file shows the nameserver IP (should be CoreDNS ClusterIP, typically 10.96.0.10) and search domains that make short names work.
Imperative Commands & Time-Savers
The exam game-changer: Writing YAML from scratch takes 5-8 minutes per resource. Generating it with imperative commands takes 30 seconds. Over 15-20 questions, this saves you 60-90 minutes. Mastering these commands is the difference between finishing the exam with time to spare versus running out of time.
Generate YAML (Most Important!)
This is the game-changer: I'm going to be blunt - if you're writing YAML from scratch during the exam, you're doing it wrong. It takes 5-8 minutes to write a deployment YAML from memory, and you'll probably make syntax errors. The dry-run technique does it in 30 seconds with perfect syntax every time.
Here's how it works: Tell kubectl to show you the YAML it would create, but don't actually create anything. Pipe that to a file, edit the 2-3 lines you need to change, then apply. Boom - you just saved 4-5 minutes.
IMPORTANT: When generating YAML files, always use full kubectl commands, not the k alias:
# Pod - use full kubectl command when saving to file
kubectl run nginx --image=nginx $do > pod.yaml
# Deployment
kubectl create deployment nginx --image=nginx --replicas=3 $do > deploy.yaml
# Service
kubectl expose deployment nginx --port=80 --target-port=8080 $do > svc.yaml
# ConfigMap
kubectl create configmap app-config --from-literal=key=val $do > cm.yaml
# Secret
kubectl create secret generic db-secret --from-literal=pass=123 $do > secret.yaml
Why full kubectl in files? The k alias only exists in your shell session. If you save k run nginx... in a script or try to reference it later, it won't work. The $do variable works fine because it's an environment variable that expands before hitting the file. But k is a shell alias - it's like a nickname that only works face-to-face!
Think about it this way: You can type k get pods at the command line all day long (fast and easy!), but when you're generating files, use kubectl (reliable and portable).
Real example from the exam: Question says "Create a pod named web with image nginx:1.19, label tier=frontend, CPU request 100m."
Instead of this nightmare:
# Writing this from scratch = 5 minutes + probable typos
apiVersion: v1
kind: Pod
metadata:
name: web
labels:
tier: frontend
spec:
containers:
- name: nginx
image: nginx:1.19
resources:
requests:
cpu: "100m"
Do this:
kubectl run web --image=nginx:1.19 --labels=tier=frontend $do > pod.yaml
# Edit pod.yaml, add these 3 lines under containers[0]:
# resources:
# requests:
# cpu: "100m"
kubectl apply -f pod.yaml
kubectl explain (2x faster than docs!)
Stop searching documentation: The exam provides access to kubernetes.io docs, but searching for the right page and finding the right YAML syntax requires time. kubectl explain gives you the answer in 30 seconds, right in your terminal.
# Find field syntax
kubectl explain pod.spec.containers.livenessProbe
kubectl explain pod.spec.volumes
kubectl explain deployment.spec.strategy
# Recursive view
kubectl explain pod --recursive
# Navigate hierarchically
kubectl explain deployment
kubectl explain deployment.spec
kubectl explain deployment.spec.template
Practical example: You need to add a readiness probe with HTTP GET. Instead of searching docs:
kubectl explain pod.spec.containers.readinessProbe.httpGet
Output shows you:
path: <string>
port: <string>
httpHeaders: <[]Object>
Now you know exactly what fields exist and their types. Write your YAML with confidence.
Quick Creation Patterns
No YAML needed: For simple resources, imperative commands create them instantly. Only use YAML when you need complex configurations like multiple volumes, init containers, or advanced scheduling constraints.
# Pod with labels
kubectl run nginx --image=nginx --labels="app=web,env=prod"
# Pod with env vars
kubectl run nginx --image=nginx --env="KEY=value"
# Deployment with replicas
kubectl create deployment web --image=nginx --replicas=3
# Expose deployment
kubectl expose deployment web --port=80 --type=NodePort
# Scale
kubectl scale deployment web --replicas=5
# Update image
kubectl set image deployment/web nginx=nginx:1.16
# ConfigMap/Secret
kubectl create configmap app --from-literal=key=value
kubectl create secret generic db --from-literal=pass=secret
Chaining commands: You can create and expose in one line:
kubectl create deployment web --image=nginx --replicas=3 && \
kubectl expose deployment web --port=80 --type=NodePort && \
kubectl get svc,pods
This creates deployment, exposes it, and shows you the results - 15 seconds total.
Exam Workflow
Strategic approach: The exam is as much about time management as technical skill. You can know everything but fail if you spend 20 minutes on a 5-point question. Follow this workflow religiously.
Step-by-step process:
- Switch context (copy from question exactly)
kubectl config use-context <context>
kubectl config current-context # Verify
Why this matters: The exam uses different contexts (clusters). If you forget to switch, you're working on the wrong cluster and get zero points even if your solution is perfect. Always verify with current-context.
- Read question completely - note all requirements
Don't skim! Questions often have multiple requirements: "Create deployment X with 3 replicas, expose on NodePort 30080, ensure pods run on nodes with label disk=ssd." Miss one requirement = lose points.
- Choose approach:
- Imperative command? → Use it
- Complex config? → Generate YAML, edit, apply
Simple tasks (create pod, scale deployment, expose service) = imperative. Complex tasks (pod with multiple volumes, init containers, complex scheduling) = YAML.
- Execute solution
Work fast but deliberately. Copy-paste resource names and image names from the question to avoid typos.
- Verify immediately:
kubectl get <resource>
kubectl describe <resource>
kubectl logs <pod> # If applicable
Critical habit: Don't move to the next question until you verify this one works. A non-working solution gets zero points. 30 seconds of verification saves you from getting nothing.
- Confirm all requirements met
Go back to the question and check each requirement one by one. Label correct? Replicas correct? Port correct? Image correct?
- Move to next question
If you're stuck after 7 to 8 minutes, flag the question and move on. Come back to it if you have time. Don't let one hard question prevent you from answering five easy ones.
Time Management
- First pass (90 min): Easy & medium questions
- Skip anything that takes > 8 minutes
- Get the "easy points" first
- Most candidates can score 60-70% just from easy/medium questions
- Second pass (25 min): Flagged difficult questions
- Now tackle the hard ones
- You have buffer time so you can think deeper
- Partial credit is better than nothing
- Final review (5 min): Verify critical tasks
- Check etcd backups were created and verified
- Check cluster upgrades show correct version
- Verify you switched contexts correctly for each question
- Quick spot-check of a few answers
- Golden rule: Never spend >8 minutes on one question
- 16-17 questions in 120 minutes ≈ 7 minutes each on average
- If you're stuck at 8 minutes, you're losing time on other questions
- Flag it and move on
Real-World Troubleshooting Scenarios
Practice these scenarios to build troubleshooting muscle memory:
Scenario 1: Pod Stuck in Pending
Problem: Pod created but stuck in Pending state for 5+ minutes.
Investigation workflow:
# Step 1: Check pod status and events
kubectl describe pod <pod-name>
# Look in Events section for: FailedScheduling, Insufficient CPU/memory
# Step 2: Check node resources
kubectl top nodes
kubectl describe nodes | grep -A 10 "Allocated resources"
# Step 3: Check if PVC is bound (if pod uses volumes)
kubectl get pvc
# Step 4: Check node taints and pod tolerations
kubectl describe nodes | grep Taints
kubectl get pod <pod-name> -o yaml | grep -A 5 tolerations
Common causes & fixes:
- Insufficient resources: Nodes don't have enough CPU/memory
- Fix: Scale down other pods or add more nodes
- Verify: kubectl top nodes shows available resources
- PVC not bound: Pod waiting for storage
- Fix: Check PVC status, verify StorageClass exists
- Verify: kubectl get pvc shows Bound status
- Node affinity mismatch: No nodes match affinity rules
- Fix: Check node labels match pod's nodeSelector/affinity
- Verify: kubectl get nodes --show-labels
- Taints without tolerations: Nodes are tainted, pod doesn't tolerate
- Fix: Add toleration to pod or remove taint from node
- Verify: Pod schedules successfully
Scenario 2: CrashLoopBackOff
Problem: Pod status shows CrashLoopBackOff, container keeps restarting.
Let me tell you a story: This is probably the most frustrating status to see. Your pod is like "I'll try to start... nope, crashed. Let me try again... nope, crashed again. One more time... still crashing." And it keeps doing this forever, waiting longer between each attempt (that's the "backoff" part).
The container is running, something crashes it, Kubernetes restarts it, it crashes again. Rinse and repeat. And here's the kicker - if you just run kubectl logs <pod>, you might see nothing! Why? Because the container that crashed isn't running anymore. You need the --previous flag to see the logs from the crashed container. This one flag has saved me countless times.
Investigation workflow:
# Step 1: Check restart count
kubectl get pod <pod-name>
# RESTARTS column shows how many times - if it's 5+, you've got a real problem
# Step 2: Check previous container logs (THIS IS THE GOLDEN TICKET!)
kubectl logs <pod-name> --previous
# Step 3: Check current container logs (might be empty)
kubectl logs <pod-name>
# Step 4: Describe pod for detailed events
kubectl describe pod <pod-name>
# Step 5: Check liveness/readiness probes
kubectl describe pod <pod-name> | grep -A 10 "Liveness\|Readiness"
Common causes & fixes:
1. Application crash on startup - Missing environment variables, can't connect to database
This is the #1 cause. Your app needs DATABASE_URL but it's not set. Or the database service doesn't exist yet. The logs tell you everything:
kubectl logs web-pod --previous
# Error: DATABASE_URL environment variable not set
# Error: Can't connect to mysql:3306
# Check if database service exists
kubectl get svc mysql
# Error from server (NotFound): services "mysql" not found
# Aha! Create the missing service
kubectl expose deployment mysql --port=3306
# Delete pod to restart with valid config
kubectl delete pod web-pod
kubectl get pods --watch
# Now it should start successfully
2. Liveness probe too aggressive - Probe kills container before app finishes starting
Your app takes 30 seconds to fully start, but the liveness probe checks every 5 seconds and kills it if it doesn't respond in 1 second. Boom - CrashLoopBackOff city!
# Check the probe settings
kubectl describe pod <pod> | grep -A 10 Liveness
# Liveness: http-get http://:8080/ delay=0s timeout=1s period=5s
# See the problem? delay=0s means it starts checking IMMEDIATELY
# Your app hasn't even started yet!
# Fix: Edit deployment to increase initialDelaySeconds
kubectl edit deployment <deployment-name>
# Change: initialDelaySeconds: 30
# This gives your app time to start before probes begin
3. Out of memory - Container killed by OOM (Out Of Memory)
Your container is using more memory than its limit, so Kubernetes kills it. Then it restarts, uses too much memory again, gets killed again. Repeat forever.
kubectl describe pod <pod-name>
# Look for: Last State: Terminated, Reason: OOMKilled
# Fix: Increase memory limits or optimize your application
kubectl edit deployment <deployment-name>
# resources:
# limits:
# memory: "512Mi" # Increase this
The pattern you should see: Every CrashLoopBackOff has a reason in the logs or events. You just need to look in the right place. Logs from the previous container (--previous) are your best friend here. Don't skip this step!
Scenario 3: Service Not Accessible
Problem: Service created but can't access it (curl fails, connection timeout).
Investigation workflow:
# Step 1: Verify service exists
kubectl get svc <service-name>
# Step 2: Check endpoints (MOST IMPORTANT!)
kubectl get endpoints <service-name>
# Step 3: If endpoints are empty, check selector
kubectl describe svc <service-name> | grep Selector
kubectl get pods --selector=<label-key>=<label-value>
# Step 4: If pods exist but not in endpoints, check readiness
kubectl describe pod <pod-name> | grep -A 5 "Ready:"
# Step 5: Test connectivity from debug pod
kubectl run test --rm -it --image=nicolaka/netshoot -- bash
# Inside pod: curl <service-name>:<port>
# Step 6: Check network policies
kubectl get networkpolicy -A
kubectl describe networkpolicy <policy-name>
Common causes & fixes:
- Empty endpoints - Selector mismatch: Service selector doesn't match pod labels
- Fix: Update service selector or pod labels to match
- Example:
# Service has selector: app=web
kubectl describe svc web-svc | grep Selector
# Selector: app=web
# But pods have label: app=webapp
kubectl get pods --show-labels
# Fix: Update service selector
kubectl patch svc web-svc -p '{"spec":{"selector":{"app":"webapp"}}}'
# Verify endpoints populated
kubectl get endpoints web-svc
- Pods not ready: Readiness probe failing
- Fix: Check pod logs, fix readiness probe or application
- Verify: kubectl get pods shows READY 1/1
- NetworkPolicy blocking: Policy prevents traffic
- Fix: Update NetworkPolicy to allow traffic or remove it
- Verify: Test connectivity succeeds
- Wrong port: Service port doesn't match pod port
- Fix: Update service targetPort to match container port
- Verify: kubectl describe svc shows correct port mapping
Scenario 4: Node NotReady
Problem: One or more nodes showing NotReady status.
Investigation workflow:
# Step 1: Identify NotReady node
kubectl get nodes
# Step 2: Describe node for conditions
kubectl describe node <node-name>
# Check Conditions: MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable
# Step 3: Check kubelet status (SSH to node)
systemctl status kubelet
journalctl -u kubelet -n 50
# Step 4: Check node resources
kubectl top node <node-name>
# Step 5: Debug node with shell access
kubectl debug node/<node-name> -it --image=ubuntu
# Inside debug pod, check:
df -h # Disk usage at /host
free -h # Memory
Common causes & fixes:
- Kubelet stopped: Service crashed or not running
- Fix: Restart kubelet
systemctl restart kubelet
systemctl status kubelet
# Verify: Node returns to Ready
kubectl get nodes
- Disk full: Node out of disk space (DiskPressure=True)
- Fix: Clean up logs, old images, evicted pods
# SSH to node
docker system prune -a
# or for containerd
crictl rmi --prune
# Remove old logs
journalctl --vacuum-time=3d
- Network plugin issue: CNI pod crashed
- Fix: Check CNI pod status, restart if needed
kubectl get pods -n kube-system | grep calico
kubectl delete pod <cni-pod> -n kube-system
- Certificate expired: Kubelet can't authenticate to API server
- Fix: Renew certificates
kubeadm certs renew all
systemctl restart kubelet
Scenario 5: Deployment Rollout Stuck
Problem: Deployment update started but stuck, some pods old version, some new.
Investigation workflow:
# Step 1: Check rollout status
kubectl rollout status deployment/<deployment-name>
# Step 2: Check ReplicaSets
kubectl get rs
# Should see old RS scaling down, new RS scaling up
# Step 3: Check new pods
kubectl get pods -l app=<app-label>
kubectl describe pod <new-pod-name>
# Step 4: Check deployment events
kubectl describe deployment <deployment-name>
# Step 5: Check rollout history
kubectl rollout history deployment/<deployment-name>
Common causes & fixes:
- New pods won't start: Image pull error, config error
- Fix: Check pod logs and events, fix the issue
kubectl describe pod <new-pod>
# Events show: ImagePullBackOff or CrashLoopBackOff
# Fix image name if wrong
kubectl set image deployment/<name> <container>=<correct-image>
- Insufficient resources: Can't schedule new pods
- Fix: Scale down or add resources
# Check available resources
kubectl top nodes
# Either scale down old pods or add nodes
- Readiness probe failing: New pods never become ready
- Fix: Check readiness probe configuration
kubectl describe pod <new-pod> | grep -A 10 Readiness
kubectl logs <new-pod>
# Fix application or probe configuration
- Need to rollback: New version has bugs
- Fix: Rollback to previous version
kubectl rollout undo deployment/<deployment-name>
kubectl rollout status deployment/<deployment-name>
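If the previous revision isn't the one you want, you can roll back to a specific revision instead; the deployment name web and revision number 2 below are placeholders:
# See what a given revision contained
kubectl rollout history deployment/web --revision=2
# Roll back to exactly that revision
kubectl rollout undo deployment/web --to-revision=2
# Confirm it finishes
kubectl rollout status deployment/web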
Scenario 6: PVC Stuck in Pending
Problem: PersistentVolumeClaim created but stuck in Pending, won't bind to PV.
Investigation workflow:
# Step 1: Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>
# Step 2: Check available PVs
kubectl get pv
# Step 3: Check StorageClass
kubectl get sc
kubectl describe sc <sc-name>
# Step 4: Check provisioner pods
kubectl get pods -n kube-system | grep provisioner
kubectl logs -n kube-system <provisioner-pod>
# Step 5: Check events
kubectl describe pvc <pvc-name> | grep -A 10 Events
Common causes & fixes:
- No StorageClass: PVC doesn't specify SC and no default exists
- Fix: Set default StorageClass or specify in PVC
# Set default SC
kubectl patch sc <sc-name> -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
# Or specify in PVC
kubectl edit pvc <pvc-name>
# Add: storageClassName: <sc-name>
- Dynamic provisioner not running: No provisioner to create PV
- Fix: Deploy storage provisioner
kubectl get pods -n kube-system | grep provisioner
# If missing, install provisioner for your storage backend
- Access mode mismatch: PVC wants RWX but PV only has RWO
- Fix: Change PVC access mode or provide compatible PV
kubectl get pv -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes
# Ensure PV has compatible access mode
- Insufficient storage: PVC requests 100Gi but largest PV is 50Gi
- Fix: Create larger PV or reduce PVC request
kubectl get pv -o custom-columns=NAME:.metadata.name,SIZE:.spec.capacity.storage
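If you end up providing a PV by hand (no dynamic provisioning), the PV and PVC must agree on storageClassName and accessModes, and the PV capacity must be at least the requested size. A minimal sketch - the names, hostPath, and sizes here are placeholders:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 1Gi
Apply both, and kubectl get pvc should show pvc-data as Bound within a few seconds.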
Scenario 7: DNS Not Working
Problem: Pods can't resolve service names (nslookup fails, curl by name fails but by IP works).
Investigation workflow:
# Step 1: Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Step 2: Check CoreDNS service
kubectl get svc -n kube-system kube-dns
# Step 3: Test DNS from a pod
kubectl run test --rm -it --image=busybox:1.28 -- nslookup kubernetes.default
# Step 4: Check pod's resolv.conf
kubectl exec -it <pod> -- cat /etc/resolv.conf
# Step 5: Check CoreDNS logs
kubectl logs -n kube-system <coredns-pod>
Common causes & fixes:
- CoreDNS pods not running: DNS service down
- Fix: Check why CoreDNS crashed, fix and restart
kubectl describe pod -n kube-system <coredns-pod>
kubectl delete pod -n kube-system <coredns-pod>
# Wait for new pod to start
- Wrong nameserver in resolv.conf: Pod not pointing to CoreDNS
- Fix: Usually indicates a kubelet issue; check the kubelet config
# On node, check kubelet configuration
cat /var/lib/kubelet/config.yaml | grep clusterDNS
# Should match CoreDNS service ClusterIP
- Network policy blocking DNS: Policy prevents DNS queries
- Fix: Allow DNS (port 53) in NetworkPolicy
kubectl get networkpolicy -A
# Ensure policies allow DNS to kube-dns service
- CoreDNS ConfigMap misconfigured: Wrong upstream DNS servers
- Fix: Check and fix CoreDNS ConfigMap
# Verify upstream DNS servers are correct
kubectl get cm coredns -n kube-system -o yaml
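Whichever cause applies, re-test with both the short and the fully qualified service name; if only the FQDN resolves, suspect the pod's resolv.conf search domains rather than CoreDNS itself. The service name web-svc and namespace default below are placeholders:
kubectl run dns-test --rm -it --image=busybox:1.28 -- sh
# Inside the pod:
nslookup web-svc                              # short name (same namespace)
nslookup web-svc.default.svc.cluster.local    # fully qualified name
nslookup kubernetes.default                   # control test - should always resolve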
Common Verification Patterns
After creating resources, always verify systematically:
These are your "muscle memory" verification commands. After creating any resource, run the appropriate verification automatically.
# Pods
kubectl get pods # STATUS=Running
# Deployments
kubectl get deployment # READY=X/X
# Services
kubectl get svc,endpoints # Endpoints exist
# PVC
kubectl get pvc # STATUS=Bound
# ConfigMap/Secret
kubectl get cm,secret
kubectl describe cm <n>
What you're checking:
- Pods: STATUS should be Running, not Pending/CrashLoopBackOff/Error
- Deployments: READY should match desired count (3/3), not less (2/3)
- Services: Endpoints should list pod IPs, not be empty
- PVCs: STATUS should be Bound within 10 seconds, not stuck Pending
- ConfigMaps/Secrets: They exist and have the expected keys
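As a concrete illustration, here's roughly what a healthy result looks like for a hypothetical deployment named web with three replicas exposed by a service of the same name (output abbreviated):
kubectl get deploy web
# NAME   READY   UP-TO-DATE   AVAILABLE   AGE
# web    3/3     3            3           90s
kubectl get endpoints web
# NAME   ENDPOINTS                                   AGE
# web    10.244.1.5:80,10.244.2.7:80,10.244.3.4:80   90s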
Troubleshooting workflow:
When something doesn't work, follow this sequence every time:
kubectl get nodes
kubectl get pods -n kube-system
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl logs <pod>
kubectl describe <resource>
Why this order:
- Nodes not Ready = cluster-level problem, fix this first
- System pods not running = control plane broken, everything will fail
- Events show recent errors = tells you what just broke
- Logs show application errors = most specific diagnostic info
- Describe shows configuration and recent events = comprehensive view
This workflow diagnoses 90% of issues in under 2 minutes.
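One small refinement that speeds up the events step: filter to warnings only so errors aren't buried in normal scheduling noise.
# Only Warning events, cluster-wide, newest at the bottom
kubectl get events -A --field-selector type=Warning --sort-by=.metadata.creationTimestamp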
Quick Reference Card
These are the commands you'll use often during the exam.
# Context (Always First!)
kubectl config use-context <context>
kubectl config current-context # Verify!
# Quick checks
k get nodes
k get po -A
k get svc,ep
k get pv,pvc
k get events --sort-by=.metadata.creationTimestamp
# Describe & logs
k describe <resource> <name>
k logs <pod>
k logs <pod> --previous # For crashes
# Exec into pod
k exec -it <pod> -- <command>
k exec -it <pod> -- /bin/sh
# Troubleshooting
k debug node/<node> -it --image=ubuntu
k top nodes
k top pods
# RBAC
k auth can-i <verb> <resource>
k auth can-i <verb> <resource> --as <user>
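# Example (namespace and names are placeholders): check what a service account may do
k auth can-i list pods -n dev --as=system:serviceaccount:dev:ci-bot
k auth can-i '*' '*' --as=<user>   # quick "is this user cluster-admin?" check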
# Deployment operations
k rollout status deployment/<n>
k rollout history deployment/<n>
k rollout undo deployment/<n>
k scale deployment/<n> --replicas=<count>
k set image deployment/<n> <container>=<image>
# Storage
k get sc
k get pvc --watch
k describe pvc <n>
# Network debugging
k run test --rm -it --image=nicolaka/netshoot -- bash
k exec -it <pod> -- nslookup <service>
# etcd backup (MEMORIZE certificate paths!)
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
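# Verify the snapshot you just took (newer etcd releases prefer `etcdutl snapshot status`)
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd.db --write-out=table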
# Generate YAML templates
k run <pod> --image=<image> $do > pod.yaml
k create deploy <n> --image=<image> --replicas=<count> $do > deploy.yaml
k expose deploy <n> --port=<port> $do > svc.yaml
k create cm <n> --from-literal=k=v $do > cm.yaml
k create secret generic <n> --from-literal=k=v $do > secret.yaml
Time-saving aliases recap:
alias k='kubectl'
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kd='kubectl describe'
export do='--dry-run=client -o yaml'
export now='--force --grace-period=0'
Success Checklist
You're ready for the exam when you can consistently:
✓ Complete 16-17 questions in under 90-100 minutes
Practice with a timer. If you can't finish in under 90-100 minutes, you're not fast enough yet.
✓ Score 90%+ on practice exams
The real exam is harder and more stressful. If you can't consistently score 90%+ in practice, you might not pass the real thing.
✓ Generate YAML templates instantly
No hesitation, no looking up commands. kubectl run nginx --image=nginx $do > pod.yaml should be automatic.
✓ Use kubectl explain without hesitation
When you need field syntax, your first thought should be kubectl explain, not "let me search docs."
✓ Troubleshoot pods in < 2 minutes
See a failing pod, diagnose the root cause in under 2 minutes using describe, logs, and events.
✓ Switch contexts without errors
This should be second nature - copy command, run it, verify with current-context.
✓ Verify solutions systematically
After creating any resource, automatically run the verification commands without thinking.
Practice indicators you're ready:
- You don't need to look up basic kubectl commands
- You can troubleshoot from memory, not notes
- You complete mock exams with 20-30 minutes to spare
- You catch your own mistakes during verification
- You feel confident, not stressed
You're NOT ready if:
- You need notes for basic commands
- You forget to switch contexts frequently
- You regularly spend more than 10 minutes on a single question
- You don't finish practice exams on time
- You score <80% on practice exams
Final Preparation Tips
One week before exam:
- Do full mock exams daily - Killer.sh, KodeKloud
- Review this guide completely
- Focus on weak areas
- Memorize verification commands
- Practice aliases until automatic
- Review common failure scenarios - Pod CrashLoopBackOff, PVC Pending, Service no endpoints
- Time yourself strictly - Use actual exam timing
Day before exam:
- Light review only - Don't cram, you'll just stress yourself
- Practice environment setup - Aliases, vim config, completion (do this 5 times)
- Review troubleshooting workflows - The systematic node → component → pod approach
- Get good sleep - 7-8 hours minimum. Tired = mistakes = failure
- Prepare your workspace - Quiet room, good internet, backup internet
- Test your setup - Webcam, microphone, screen sharing
Exam day:
- Arrive 15 minutes early - Don't rush, be calm
- Do the environment check calmly - Don't panic during setup
- Spend the first 5 minutes setting up your environment - aliases, vim, completion
- Follow your workflow - Don't improvise under pressure
- Trust your preparation - You've done this 100 times in practice
- Keep moving forward - Don't get stuck on hard questions
- Verify every solution - 30 seconds of checking = points
- Use all available time - If you finish early, review flagged questions
During the exam:
- Take a deep breath every 5 questions
- If you're stuck, move on immediately
- Remember: 66% to pass, you don't need perfection
- Every question you verify correctly is points in the bank
- The exam environment is stressful - this is normal
Common Mistakes to Avoid
These mistakes cost candidates the most points:
❌ Forgetting to switch context
- Lost points even with correct solution
- Always copy-paste context command from question
- Always verify with kubectl config current-context
❌ Not verifying solutions
- Created deployment but it's not running
- Exposed service but no endpoints
- 30 seconds of verification = difference between 0 and full points
❌ Writing YAML from scratch
- Takes 5-8 minutes per resource
- High chance of syntax errors
- Use --dry-run=client -o yaml instead
❌ Spending more than 7-8 minutes on one question
- Time is your enemy
- Flag and move on
- Come back if time permits
❌ Not using kubectl explain
- Wastes time searching documentation
- kubectl explain gives answer in 30 seconds
- Practice using it until it's your first instinct
❌ Typing resource names manually
- Typos cost time and points
- Copy-paste from question
- Use tab completion
❌ Not checking logs for crashes
- Pod in CrashLoopBackOff but didn't check --previous logs
- Logs tell you exactly why it crashed
- Always check logs when troubleshooting
❌ Ignoring the Events section
- Events show what just failed
- kubectl describe Events section is gold
- Shows "FailedScheduling", "FailedMount", "ImagePullBackOff" reasons
You've Got This!
Remember:
- The CKA exam is challenging but passable with proper preparation
- Speed comes from practice - complete as many practice tasks as possible before exam day
- Verification is not optional - it's how you guarantee points
- Imperative commands save you 20-30 minutes
- Systematic troubleshooting beats random guessing every time
The exam tests two things:
- Do you know Kubernetes? (Technical knowledge)
- Can you work fast under pressure? (Time management)
You need both. Study the concepts but also practice the speed techniques.
Your preparation checklist:
- ✓ Read this guide completely
- ✓ Practice all commands until automatic
- ✓ Do all the mock exams in KodeKloud
- ✓ Score consistently >90% on practice
- ✓ Complete practice exams in <100 minutes
- ✓ Memorize verification patterns
- ✓ Practice troubleshooting workflows
- ✓ Set up aliases and test them repeatedly
Practice with KodeKloud’s Free Kubernetes Labs

The all-time most recommended CKA course

🎯 Good luck on your CKA certification! 🎯
FAQ
If I’ve never used aliases, should I set them up on exam day?
No. Only if you’ve practiced them for 1-2 weeks. Otherwise use the full kubectl command plus autocomplete.
Why is k fine in my terminal but breaks in scripts?
k is a shell alias (interactive only). Files and scripts need the full kubectl command.
Is $do safe in files?
Yes - the variable expands in your shell before redirection, so $do works in saved commands.
What’s the quickest editor tweak that saves errors?
expandtab, tabstop=2, and shiftwidth=2 in .vimrc to avoid TABs in YAML.
How much time can setup save if I’m practiced?
~10-20 minutes across the exam.
I created something - what’s the one-line "did it work?" check?
kubectl get <kind> -o wide, then immediately skim STATUS/READY/NODE/IP.
What’s the fastest way to confirm every requirement in a task?
Re-read the prompt and tick off each requirement against describe/get output.
If a resource “exists” but isn’t functional, what did I likely skip?
Endpoints (for Services), Bound status (for PVCs), rollout status (for Deployments), or Events.
When should I not generate YAML via --dry-run?
When the spec is trivial (a simple pod or service). Imperative may be faster.
Can kubectl explain replace doc browsing for field syntax?
Mostly yes - especially for probe/env/volume fields you forget.
My dry-run YAML lacks a niche field - what now?
Use kubectl explain <path> to locate the exact field, then edit the YAML.
PVC stuck in Pending - what’s the 10-second triage?
get sc (is there a default?), describe pvc (Events), get pv (size/access mode), and check the provisioner pod in kube-system.
CrashLoopBackOff but logs look empty - now what?
Use kubectl logs <pod> --previous to see the crashed container’s logs (golden ticket).
Rollout feels stuck - minimal sequence?
rollout status → get rs → describe pod (Events) → logs. Fix image/pod issues or roll back.
What’s the biggest zero-point mistake?
Forgetting to switch context. Always run kubectl config use-context … first, then verify with current-context.
When should I not write YAML from scratch?
Almost always. Generate it via --dry-run=client -o yaml, then edit the few lines you need.
HPA won’t scale - two common misses?
Metrics server not running, or pods missing CPU requests.
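A minimal sketch of fixing the second miss - the container needs a CPU request before the HPA can compute a utilization percentage. The deployment name web and the numbers below are placeholders:
# Add a CPU request to the containers (or set resources.requests.cpu in the manifest)
kubectl set resources deployment web --requests=cpu=100m
# Create the HPA and confirm TARGETS shows a number rather than <unknown>
kubectl autoscale deployment web --cpu-percent=80 --min=2 --max=5
kubectl get hpa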