CKA lab error - upgrade version

hi,

I cannot upgrade the controlplane with kubeadm in the lab environment.

controlplane ~ ➜  kubeadm upgrade apply v1.26.0
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0518 04:34:55.008536   28391 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: strict decoding error: unknown field "udpIdleTimeout"
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
        [ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [controlplane]
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

controlplane ~ ✖ kubeadm upgrade apply v1.26.0 --ignore-preflight-errors=ControlPlaneNodesReady
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0518 04:35:44.711397   28607 configset.go:177] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeproxy.config.k8s.io", Version:"v1alpha1", Kind:"KubeProxyConfiguration"}: strict decoding error: unknown field "udpIdleTimeout"
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
        [WARNING ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [controlplane]
[upgrade/version] You have chosen to change the cluster version to "v1.26.0"
[upgrade/versions] Cluster version: v1.25.0
[upgrade/versions] kubeadm version: v1.26.0
[upgrade] Are you sure you want to proceed? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.26.0" (timeout: 5m0s)...
[upgrade/etcd] Upgrading to TLS for etcd
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2023-05-18-04-36-12/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node controlplane did not change after 5m0s: timed out waiting for the condition
[upgrade/etcd] Waiting for previous etcd to become available
[upgrade/etcd] Etcd was rolled back and is now available
[upgrade/apply] FATAL: fatal error when trying to upgrade the etcd cluster, rolled the state back to pre-upgrade state: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: static Pod hash for component etcd on Node controlplane did not change after 5m0s: timed out waiting for the condition
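For anyone hitting the same preflight failure: before ignoring the `ControlPlaneNodesReady` check, it is worth finding out *why* the node is NotReady. A rough sketch of the usual checks (standard kubectl/systemd tooling, nothing lab-specific assumed):

```shell
# Confirm the node really is NotReady and look at its conditions.
kubectl get nodes
kubectl describe node controlplane     # check the Conditions section for the reason

# On the controlplane node itself, the kubelet is the usual suspect.
systemctl status kubelet               # is the service running at all?
journalctl -u kubelet --no-pager -n 50 # last kubelet log lines show the actual error
```

If the kubelet is crash-looping, ignoring the preflight error will not help: the static-pod upgrade below times out precisely because no kubelet is there to restart etcd.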

Hi @mainframe2cloud

Just tested and it works fine.

I see that you have the following error:

How did you drain your controlplane? Because it's apparently not Ready.

Regards

k drain controlplane --ignore-daemonsets

Then I found that the kubelet on it cannot start:

May 19 21:35:42 controlplane systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 19 21:35:42 controlplane systemd[1]: kubelet.service: Failed with result 'exit-code'.
May 19 21:35:52 controlplane systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 102.
May 19 21:35:52 controlplane systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
May 19 21:35:52 controlplane systemd[1]: Started kubelet: The Kubernetes Node Agent.
May 19 21:35:52 controlplane kubelet[46779]: E0519 21:35:52.993652   46779 run.go:74] "command failed" err="failed to parse kubelet flag: unknown flag: --container-runtime"
May 19 21:35:53 controlplane systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 19 21:35:53 controlplane systemd[1]: kubelet.service: Failed with result 'exit-code'.
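That `unknown flag: --container-runtime` error is the key: this kubelet flag was removed in newer kubelet releases, so if `apt upgrade` pulled in a newer kubelet package, any leftover `--container-runtime=remote` in the kubeadm flags file will crash it on startup. A sketch of one way to clear it (the file path is the usual kubeadm location; verify it on your node first):

```shell
# Sketch: strip the removed --container-runtime=remote flag from the kubelet
# args that kubeadm wrote, then restart the kubelet.
FLAGS_FILE=/var/lib/kubelet/kubeadm-flags.env   # usual kubeadm location

# Remove only the deleted flag; --container-runtime-endpoint must stay.
sudo sed -i 's/ *--container-runtime=remote//' "$FLAGS_FILE"

sudo systemctl daemon-reload
sudo systemctl restart kubelet
```

After that, `systemctl status kubelet` should show the service staying up instead of restarting every ten seconds.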

Huh, weird.

Try following the instructions in this link

Following the official commands worked. The difference I noticed is that during the installation of kubeadm, `apt upgrade` touched the ssh config, which may have caused the error.
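For reference, this is roughly the documented flow for upgrading a controlplane node with kubeadm on Debian/Ubuntu. The exact package version string is an assumption; check what your repo offers with `apt-cache madison kubeadm`. The point is to upgrade and hold only the specific packages rather than running a blanket `apt upgrade`, which can replace the kubelet (and other things) out from under you:

```shell
# Upgrade kubeadm first, pinned to the target version.
apt-mark unhold kubeadm
apt-get update && apt-get install -y kubeadm=1.26.0-00   # version string assumed
apt-mark hold kubeadm

# Plan, then apply the control-plane upgrade.
kubeadm upgrade plan
kubeadm upgrade apply v1.26.0

# Only then upgrade kubelet and kubectl, also pinned.
apt-mark unhold kubelet kubectl
apt-get install -y kubelet=1.26.0-00 kubectl=1.26.0-00
apt-mark hold kubelet kubectl
systemctl daemon-reload && systemctl restart kubelet
```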
