KillerCoda NodePort Issue

Hi, good evening.

I’m trying to solve this question on KillerCoda, but I can’t figure out what’s happening… Can someone help me work out what’s going on?

When you run kubectl get nodes or kubectl get pod, you get:

- The connection to the server 172.30.1.2:6443 was refused - did you specify the right host or port?

You need to wait a few seconds for the command to work again, but the error comes back after a few more seconds. The kube-controller-manager-controlplane pod is also continuously restarting. The task is to fix this issue.
Expectation: the kube-apiserver-controlplane pod running in the kube-system namespace.

The best place to ask about KillerCoda is on their slack channel. They’re pretty good about answering questions there.

@Philippe-Geraldeli

It doesn’t help that node-port-issue is the name of this test because, as we will see, it has nothing to do with service node ports. Also, this question can be considered in the “brutally hard” category, and it would be extremely unlikely to get one like this in the real exam due to the time needed to solve it! The average time per question is 6 minutes.

  1. ssh to the control plane, then observe what is happening with the control plane pods

    watch crictl ps
    

    Note that several pods including API server are restarting periodically.

  2. Next, let’s look at the API server logs to see if they give a clue as to why it is restarting. Open another terminal tab and ssh to the control plane, leaving the watch running in the first tab so you can go back to it. To find the API server logs, see this write up. There should be more than one log file in the log folder if you have left it long enough, so examine the last-but-one, which will be the complete log of an instance that has terminated.
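
The log-finding step can be sketched as below. This assumes kubelet’s standard static-pod log layout under /var/log/pods (one directory per pod, one numbered log file per container instance); the exact directory name on your node may differ.

```shell
# Sketch, assuming kubelet's standard static-pod log layout.
# Each container restart produces a new numbered file (0.log, 1.log, ...),
# so the newest file is the current run and the one before it is the
# complete log of the instance that just terminated.
logdir=$(echo /var/log/pods/kube-system_kube-apiserver-controlplane_*/kube-apiserver)

# List instance logs oldest to newest
ls -v "$logdir"

# Tail the last-but-one log: the full log of the terminated instance
prev=$(ls -v "$logdir"/*.log | tail -n 2 | head -n 1)
tail -n 40 "$prev"
```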

  3. From this log, it appears that the API server has done a normal shutdown, as the last 30 or so lines of the log say “shutting down this, shutting down that”, etc. The only way it would do a normal shutdown, as opposed to outright crashing, is if kubelet told it to shut down by sending a SIGTERM. This indicates a probe failure. Note also that other control plane pods like controller-manager restart along with the API server, as they crash when they cannot contact it. Only etcd does not recycle, because the API server talks to etcd, not the other way round.

  4. Examine the probe settings in the API server manifest…

    vi /etc/kubernetes/manifests/kube-apiserver.yaml
    

    Note the port setting for the probes. Does it look right?
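
For reference, in a kubeadm-generated manifest the liveness probe looks roughly like the fragment below (the host IP and path vary by cluster and version; this is a sketch, not the exact file). The key point is that the probe port must match the API server’s secure port, 6443 by default; the broken exercise will have something else here:

```yaml
livenessProbe:
  httpGet:
    host: 172.30.1.2   # the node's advertise address
    path: /livez
    port: 6443         # must match --secure-port, or kubelet will keep killing the pod
    scheme: HTTPS
```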

  5. Fix this and wait for everything to restart (return to the first tab that is running the watch). When everything has restarted, test kubectl get nodes etc. again.

The question will then validate.