Karim:
Hi Team, I am trying an interesting exercise on killercoda where we introduce a problem in the manifest file of the API Server. Interestingly enough, after I fix it the API server no longer comes back, and I keep seeing the error "Failed deleting a mirror pod". I tried restarting the kubelet but that doesn't seem to help. Any ideas on how to recover? The solution doesn't seem to have any indication that this would happen.

apiVersionTHIS IS VERY ::::: WRONG v1

Nov 08 15:21:02 controlplane kubelet[25009]: E1108 15:21:02.796530   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane"
Nov 08 15:21:02 controlplane kubelet[25009]: E1108 15:21:02.796713   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane-controlplane"
Nov 08 15:21:04 controlplane kubelet[25009]: E1108 15:21:04.795964   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane"
Nov 08 15:21:04 controlplane kubelet[25009]: E1108 15:21:04.796506   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane-controlplane"
Nov 08 15:21:06 controlplane kubelet[25009]: E1108 15:21:06.796959   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane-controlplane"
Nov 08 15:21:06 controlplane kubelet[25009]: E1108 15:21:06.797704   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane"
Nov 08 15:21:08 controlplane kubelet[25009]: E1108 15:21:08.796744   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane"
Nov 08 15:21:08 controlplane kubelet[25009]: E1108 15:21:08.796940   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane-controlplane"
Nov 08 15:21:10 controlplane kubelet[25009]: E1108 15:21:10.796359   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane-controlplane"
Nov 08 15:21:10 controlplane kubelet[25009]: E1108 15:21:10.796703   25009 mirror_client.go:138] "Failed deleting a mirror pod" err="Delete \"https://172.30.1.2:6443/api/v1/namespaces/kube-system/pods/kube-apiserver-controlplane\": dial tcp 172.30.1.2:6443: connect: connection refused" pod="kube-system/kube-apiserver-controlplane"
Nov 08 15:21:11 controlplane kubelet[25009]: E1108 15:21:11.240080   25009 event.go:276] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"calico-kube-controllers-58dbc876ff-hlwfp.1725a41dbfb15e46", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"calico-kube-controllers-58dbc876ff-hlwfp", UID:"0d38c4de-69f1-4ed3-a5f9-bf7d139f020e", APIVersion:"v1", ResourceVersion:"497", FieldPath:"spec.containers{calico-kube-controllers}"}, Reason:"Unhealthy", Message:"(combined from similar events): Readiness probe failed: Error verifying datastore: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.96.0.1:443: connect: connection refused; Error reaching apiserver: Get \"https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.96.0.1:443: connect: connection refused with http status code: 0\n", Source:v1.EventSource{Component:"kubelet", Host:"controlplane"}, FirstTimestamp:time.Date(2022, time.November, 8, 15, 1, 24, 687715910, time.Local), LastTimestamp:time.Date(2022, time.November, 8, 15, 20, 38, 255314214, time.Local), Count:17, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://172.30.1.2:6443/api/v1/namespaces/kube-system/events/calico-kube-controllers-58dbc876ff-hlwfp.1725a41dbfb15e46": dial tcp 172.30.1.2:6443: connect: connection refused'(may retry after sleeping)
Nov 08 15:21:12 controlplane kubelet[55281]: I1108 15:21:12.589851   55281 kubelet.go:281] "Adding apiserver pod source"
Nov 08 15:21:12 controlplane kubelet[55281]: I1108 15:21:12.589948   55281 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Nov 08 15:21:12 controlplane kubelet[55281]: I1108 15:21:12.686160   55281 status_manager.go:161] "Starting to sync pod status with apiserver"
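For context, the "Failed deleting a mirror pod" messages above are a symptom rather than the cause: while the API server at 172.30.1.2:6443 is down, the kubelet cannot delete or update the mirror pod that represents the static pod, so the error repeats until the API server answers again. A minimal way to check whether the kubelet has actually picked the fixed manifest back up, assuming the kubeadm default paths and a crictl-capable runtime (a sketch, not the lab's official solution):

# the kubelet watches this directory and re-reads the file when it changes,
# so restarting the kubelet is not required
ls -l /etc/kubernetes/manifests/kube-apiserver.yaml

# follow the kubelet while it recreates the static pod
journalctl -u kubelet -f

# watch for the kube-apiserver container to appear and stay running
watch crictl ps -a

# once the API server answers again, the mirror pod errors stop and this works
kubectl get pods -n kube-system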

Alistair Mackay:
Hi @Karim

The idea of that exercise is to put a deliberate error in the API server manifest and observe what happens, so that if you break the API server yourself while doing an exam question, or the exam hands you a broken API server, you know how to troubleshoot it.

That killercoda exercise is the inspiration for this document https://github.com/kodekloudhub/community-faq/blob/main/docs/diagnose-crashed-apiserver.md
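The exercise boils down to editing the static pod manifest for the API server and watching what the kubelet does with it. A rough sketch of that loop, assuming the kubeadm default manifest location (the backup path is arbitrary):

# keep a backup outside the manifests directory, otherwise the kubelet
# would try to run the backup copy as another static pod
cp /etc/kubernetes/manifests/kube-apiserver.yaml /root/kube-apiserver.yaml.bak

# introduce (or later fix) the error, then just save the file;
# the kubelet notices the change and recreates the pod on its own
vi /etc/kubernetes/manifests/kube-apiserver.yaml

# observe whether the container comes back, crashes, or never appears
watch crictl ps -a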

Karim:
Thanks a lot @Alistair Mackay, will try it again with the new info :slightly_smiling_face:

Karim:
@Alistair Mackay I am trying the lab again. I solved the 3 issues (the missing ':' after metadata, the etcd port, and authorization-module) in the manifest file, but the API server still isn't coming up. I can't see any clear error from the kubelet, and there are no logs under the pods.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 172.30.1.2:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.30.1.2
    - --allow-privileged=true
    - --authorization-module=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
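When the kubelet log shows nothing obvious, it helps to work out whether the container is being created at all: if the YAML no longer parses, no container ever appears and the kubelet log is the only place with an error, whereas if it parses but the process dies, the container shows up in the runtime in an exited state. A sketch, assuming a crictl-capable runtime (the container ID below is a placeholder):

# list the apiserver container, including exited instances
crictl ps -a --name kube-apiserver

# read the last output of the crashed container (replace the placeholder ID)
crictl logs --tail 20 <container-id>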

Karim:

Nov 09 03:13:32 controlplane kubelet[45046]: E1109 03:13:32.730851   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:13:46 controlplane kubelet[45046]: E1109 03:13:46.730862   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:14:00 controlplane kubelet[45046]: E1109 03:14:00.730921   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:14:15 controlplane kubelet[45046]: E1109 03:14:15.731033   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:14:29 controlplane kubelet[45046]: E1109 03:14:29.733384   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:14:41 controlplane kubelet[45046]: E1109 03:14:41.731261   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:14:54 controlplane kubelet[45046]: E1109 03:14:54.730682   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:15:05 controlplane kubelet[45046]: E1109 03:15:05.731053   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578
Nov 09 03:15:18 controlplane kubelet[45046]: E1109 03:15:18.731534   45046 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-apiserver\" with CrashLoopBackOff: \"back-off 5m0s restarting failed container=kube-apiserver pod=kube-apiserver-controlplane_kube-system(81a3932f8f12e9cd6dd5ba085d9a5578)\"" pod="kube-system/kube-apiserver-controlplane" podUID=81a3932f8f12e9cd6dd5ba085d9a5578

Alistair Mackay:
Kubelet is reporting CrashLoopBackOff, which means the manifest is OK but the process is failing to start. There should be something in /var/log/pods.
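A minimal sketch of where those logs land, assuming the kubeadm naming convention; the directory name also contains the pod's UID, so the wildcard is only illustrative:

# container logs written by the kubelet, readable even while the API server is down
ls /var/log/pods/
cat /var/log/pods/kube-system_kube-apiserver-controlplane_*/kube-apiserver/*.log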