I’m going through the Installation section of the CKA course. I’ve created all the certificates, kubeconfigs, and service definition files and I’ve already successfully created my etcd cluster.
I’m able to successfully start the kube-apiserver service, but both kube-controller-manager and kube-scheduler are timing out when starting.
Running kubectl get componentstatuses --kubeconfig admin.kubeconfig shows everything healthy for a while, but every so often the controller-manager and scheduler appear to restart.
Note that I’m following the tutorial on my own servers, not using the Vagrant setup, but the only difference should be the IP range I’m using.
cloud_user@master-2:~$ sudo systemctl status kube-controller-manager.service
● kube-controller-manager.service - Kubernetes Controller Manager
Loaded: loaded (/etc/systemd/system/kube-controller-manager.service; enabled; vendor preset: enabled)
Active: activating (start) since Fri 2020-08-14 13:20:29 UTC; 1min 11s ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 5495 (kube-controller)
Tasks: 9 (limit: 2313)
CGroup: /system.slice/kube-controller-manager.service
└─5495 /usr/local/bin/kube-controller-manager --address=0.0.0.0 --cluster-cidr=192.168.5.0/24 --cluster-name=kubernetes --cluster-signing-cert-file=/var/lib/kubernetes/ca.crt --cluster-signing-key-file=/var/lib/kubernetes/ca.ke
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.713828 5495 controller_utils.go:1034] Caches are synced for HPA controller
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.720908 5495 controller_utils.go:1034] Caches are synced for PVC protection controller
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.750398 5495 controller_utils.go:1034] Caches are synced for job controller
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.750820 5495 controller_utils.go:1034] Caches are synced for endpoint controller
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.801169 5495 controller_utils.go:1034] Caches are synced for persistent volume controller
Aug 14 13:20:51 master-2 kube-controller-manager[5495]: I0814 13:20:51.842475 5495 controller_utils.go:1034] Caches are synced for resource quota controller
Aug 14 13:20:52 master-2 kube-controller-manager[5495]: I0814 13:20:52.144973 5495 controller_utils.go:1034] Caches are synced for garbage collector controller
Aug 14 13:20:52 master-2 kube-controller-manager[5495]: I0814 13:20:52.145016 5495 garbagecollector.go:142] Garbage collector: all resource monitors have synced. Proceeding to collect garbage
Aug 14 13:20:52 master-2 kube-controller-manager[5495]: I0814 13:20:52.193013 5495 controller_utils.go:1034] Caches are synced for garbage collector controller
Aug 14 13:20:52 master-2 kube-controller-manager[5495]: I0814 13:20:52.193061 5495 garbagecollector.go:245] synced garbage collector
I’ve tried both the --cluster-cidr from the instructions and the CIDR range of my actual VMs; neither works. Any suggestions for what to troubleshoot next?
There are no logs for any Kubernetes component under /var/log. I tried adding --log-dir=/var/log and --logtostderr=false to the kube-controller-manager service file, but it’s still not creating the log files.
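For reference, this is roughly the fragment I added to the unit file (a sketch; the other flags are as in the course, and the binary path is assumed from the course layout). Note that klog sends everything to stderr by default, which journald captures, so journalctl is where the logs end up when file logging isn’t working:

```ini
# /etc/systemd/system/kube-controller-manager.service (fragment, sketch)
[Service]
ExecStart=/usr/local/bin/kube-controller-manager \
  --log-dir=/var/log \
  --logtostderr=false
  # ...remaining flags as in the course instructions...
```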
Running journalctl -u kube-controller-manager.service -xe shows:
-- Unit kube-controller-manager.service has begun starting up.
Aug 26 19:05:23 master-1 kube-controller-manager[5630]: Flag --address has been deprecated, see --bind-address instead.
Aug 26 19:05:41 master-1 kube-controller-manager[5630]: E0826 19:05:41.618770 5630 core.go:76] Failed to start service controller: WARNING: no cloud provider provided, services of type LoadBalancer will fail
Aug 26 19:05:44 master-1 kube-controller-manager[5630]: E0826 19:05:44.797889 5630 resource_quota_controller.go:171] initial monitor sync has error: couldn't start monitor for resource "extensions/v1beta1, Resource=networkpolicies": un
Aug 26 19:05:46 master-1 kube-controller-manager[5630]: E0826 19:05:46.287091 5630 resource_quota_controller.go:437] failed to sync resource monitors: couldn't start monitor for resource "extensions/v1beta1, Resource=networkpolicies":
Aug 26 19:06:53 master-1 systemd[1]: kube-controller-manager.service: Start operation timed out. Terminating.
Aug 26 19:06:53 master-1 systemd[1]: kube-controller-manager.service: Failed with result 'timeout'.
Aug 26 19:06:53 master-1 systemd[1]: Failed to start Kubernetes Controller Manager.
-- Subject: Unit kube-controller-manager.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit kube-controller-manager.service has failed.
--
-- The result is RESULT.
Aug 26 19:06:58 master-1 systemd[1]: kube-controller-manager.service: Service hold-off time over, scheduling restart.
Aug 26 19:06:58 master-1 systemd[1]: kube-controller-manager.service: Scheduled restart job, restart counter is at 1.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit kube-controller-manager.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Aug 26 19:06:58 master-1 systemd[1]: Stopped Kubernetes Controller Manager.
-- Subject: Unit kube-controller-manager.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit kube-controller-manager.service has finished shutting down.
Aug 26 19:06:58 master-1 systemd[1]: Starting Kubernetes Controller Manager...
-- Subject: Unit kube-controller-manager.service has begun start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit kube-controller-manager.service has begun starting up.
Aug 26 19:06:58 master-1 kube-controller-manager[5741]: Flag --address has been deprecated, see --bind-address instead.
I figured it out. I had Type=notify in my unit files, but these daemons don’t send the readiness notification that systemd expects for that service type, so every start operation timed out and systemd kept killing and restarting the service. After removing that line from the unit files, the services have been consistently healthy.
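For anyone hitting the same symptoms, here is a sketch of the relevant part of the unit file (flags abbreviated; paths assumed from the course layout). With no Type= line and an ExecStart= set, systemd defaults to Type=simple and considers the unit started as soon as the process is forked, so the start timeout no longer applies:

```ini
# /etc/systemd/system/kube-controller-manager.service (sketch)
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
# Type=notify  <- removed: the daemon never signals READY=1 to systemd,
#                 so with this type every start times out and restarts.
ExecStart=/usr/local/bin/kube-controller-manager \
  --bind-address=0.0.0.0 \
  --cluster-cidr=192.168.5.0/24 \
  --cluster-name=kubernetes
  # ...remaining flags as in the course instructions...
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```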