Cluster failure after rebooting

Hey everyone, I’ve hit a roadblock and could really use your help. Our cluster was running smoothly until the machine rebooted. Now, we’re encountering the following error:

E0707 13:24:15.546885 42410 memcache.go:265] couldn't get current server API group list: Get "https://localhost:6444/api?timeout=32s": dial tcp [::1]:6444: connect: connection refused
E0707 13:24:15.548249 42410 memcache.go:265] couldn't get current server API group list: Get "https://localhost:6444/api?timeout=32s": dial tcp [::1]:6444: connect: connection refused
The connection to the server localhost:6444 was refused - did you specify the right host or port?

Does anyone know how to resolve this issue?

Where did this log output come from?

Thank you! I found out the problem was that the garbage collector deleted many of the Kubernetes images after the reboot. Since I am working offline, they could not be pulled again. I increased the garbage collector's time setting. Thanks a lot for your help!

If you’re using an airgapped cluster, you should consider hosting an image registry within the private network for the cluster to use. If you’re on a cloud like AWS, it can be the cloud provider's registry; if you’re in a datacenter, run something like Artifactory or ProGet.

Thanks, AliStair, for your advice. I have a few more questions related to this topic.

What do you think about automation tools for installing Kubernetes, like “KURL”? Do you think they are reliable for use in a production environment?

Currently, I am using a private registry called Harbor, but I haven’t yet configured Kubernetes to pull images from it. Instead, I’ve increased the garbage collector time. Do you think this is a temporary solution, or can it be enough for the long term?

Long term, your cluster should be using the Harbor registry for its images, both for the cluster itself and for the applications it is running. You should have automation in a CI/CD process for pushing app images there. You should have a process for pushing any cluster images you need too.
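
As a rough illustration of that "process for pushing cluster images", the sketch below maps upstream image references to their location in a private registry and prints the pull/tag/push commands a mirroring job could run on a machine that has internet access. The registry host `harbor.example.internal`, the project name `k8s`, and the image list are all hypothetical placeholders, and it assumes Docker is the tool doing the copying. It does not cover configuring the container runtime to pull from Harbor.

```python
def mirror_ref(image: str, registry: str, project: str) -> str:
    """Map an upstream image reference to its location in a private registry.

    The upstream registry host (if present) is dropped; the repository
    path and tag are kept unchanged.
    """
    first, sep, rest = image.partition("/")
    # A first path component containing "." or ":" (or equal to "localhost")
    # is treated as a registry host rather than part of the repository path.
    if sep and ("." in first or ":" in first or first == "localhost"):
        path = rest
    else:
        path = image
    return f"{registry}/{project}/{path}"


def mirror_commands(images, registry, project):
    """Return the docker commands needed to copy each image to the registry."""
    cmds = []
    for image in images:
        target = mirror_ref(image, registry, project)
        cmds.append(f"docker pull {image}")
        cmds.append(f"docker tag {image} {target}")
        cmds.append(f"docker push {target}")
    return cmds


if __name__ == "__main__":
    # Example image list only; use the images your cluster version needs.
    images = [
        "registry.k8s.io/kube-apiserver:v1.29.0",
        "registry.k8s.io/pause:3.9",
    ]
    # Hypothetical Harbor host and project name.
    for cmd in mirror_commands(images, "harbor.example.internal", "k8s"):
        print(cmd)
```

Printing the commands instead of running them keeps the sketch safe to try; a CI job would execute them after logging in to the registry.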

Increasing garbage collection time risks filling your nodes up with images and running them out of disk space.
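
To make that trade-off concrete: by default the kubelet starts image garbage collection once disk usage on the image filesystem reaches a high threshold (`imageGCHighThresholdPercent`, default 85) and removes unused images until usage falls to a low threshold (`imageGCLowThresholdPercent`, default 80). The sketch below is a minimal node-side check against those default numbers. The path `/var/lib/containerd` is an assumption about where images are stored on your nodes, and the script only reports disk usage; it does not query or change the kubelet.

```python
import os
import shutil

# Kubelet defaults; adjust if your KubeletConfiguration overrides them.
HIGH_THRESHOLD = 85
LOW_THRESHOLD = 80


def used_percent(total: int, used: int) -> float:
    """Return disk usage as a percentage of total capacity."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def gc_status(percent: float, high: int = HIGH_THRESHOLD, low: int = LOW_THRESHOLD) -> str:
    """Classify usage relative to the image GC thresholds."""
    if percent >= high:
        return "at or above high threshold"
    if percent >= low:
        return "between thresholds"
    return "below low threshold"


if __name__ == "__main__":
    # Assumed image storage path; fall back to "/" if it does not exist.
    path = "/var/lib/containerd"
    if not os.path.exists(path):
        path = "/"
    usage = shutil.disk_usage(path)
    percent = used_percent(usage.total, usage.used)
    print(f"{path}: {percent:.1f}% used, {gc_status(percent)}")
```

Running something like this periodically on each node gives early warning if a relaxed garbage collection setting is letting images pile up.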

You should definitely automate the provisioning of the cluster itself. Use Terraform to build the nodes it runs on; how you install Kubernetes itself largely depends on the type of cluster you run (kubeadm, a "hard way" style install with OS services, etc.).

Thank you so much for your support.