I am going through a basic setup of Kubernetes 1.30 on a 3-node cluster (one master, two workers).
I have installed k8s before, but I am stumped on this one and Google is not helping.
I have three Oracle Linux 9 VMs running in VMware Workstation Pro 17.
I installed containerd as the runtime and made all the required config changes based on my own knowledge and the website's instructions. I installed kubectl, kubelet, and kubeadm the normal way.
I ran kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.20 . My servers' hostnames are in /etc/hosts .
I can run kubectl version and get both the client and server versions:
kubectl version
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
I can get nodes with the following:
kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
kubemaster NotReady control-plane 12m v1.30.1
If I wait long enough I get the following:
kubectl get nodes -A
The connection to the server 192.168.1.20:6443 was refused - did you specify the right host or port?
The same thing happens if I try to add the network add-on. If I add the network before I lose the connection, pods will start trying to build but eventually fail, and then I lose connection to the server again:
The connection to the server 192.168.1.20:6443 was refused - did you specify the right host or port?
If I reboot, I can get it started temporarily. Any ideas? Is there an issue with k8s 1.30 and containerd/systemd?
Here are all my steps for install...
On all Linux hosts
vim /etc/hosts on all hosts and add the hostnames, e.g.:
192.168.1.20 kubemaster
192.168.1.21 kubenode01
192.168.1.22 kubenode02
• swap
o ensure swap is off and not configured: swapoff -a (current session); comment out the swap entry in /etc/fstab if needed; ensure no systemd .swap unit is enabled
• selinux
o sudo setenforce 0 (current session);
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config (permanent)
• sudo systemctl disable firewalld; sudo systemctl stop firewalld
cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
lsmod | grep br_netfilter
lsmod | grep overlay
sudo sysctl --system
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
Containerd Section
• set up the Docker repo to obtain containerd (on RHEL-family distros it is only provided via this repo; otherwise you must install it from binaries / a manual process)
sudo dnf install -y yum-utils (needed for yum-config-manager command)
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y containerd.io
o containerd works in conjunction with crun/runc, the low-level OCI runtimes
o kubelet (k8s) talks to containerd via CRI, and containerd invokes runc to actually run the containers
sudo vim /etc/containerd/config.toml (the input below is required in file)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
sudo systemctl enable containerd
sudo systemctl start containerd (only needed the first time; the enable above survives reboots)
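Before moving on to kubeadm, it's worth confirming that containerd actually picked up the cgroup change — a quick check I'd suggest (assumes containerd is running):

```shell
# Confirm containerd is up.
systemctl is-active containerd

# Dump the live, merged config and confirm the systemd cgroup driver took effect.
# A line reading "SystemdCgroup = true" should appear under the runc options.
sudo containerd config dump | grep -i SystemdCgroup
```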
End of Containerd Section
• set repo for Kubernetes
# This creates or overwrites any existing configuration in /etc/yum.repos.d/kubernetes.repo
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.30/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes (installed 1.30.1-150500.1.1)
sudo systemctl enable --now kubelet
confirmed crun / runc were the container runtimes installed on Oracle Linux 9 (crun is the default)
Oracle Linux with podman uses crun by default and has runc installed as well; you can switch to runc with podman --runtime runc and back with podman --runtime crun
crun -v or runc -v will show the version of the runtime
podman info --format '{{.Host.CgroupsVersion}}'
podman info | grep -i cgroup (confirms both the systemd cgroup manager and cgroup v2)
Start of installing cluster
On Master Node Only
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.1.20
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
I tried weave and calico (tigera operator), but I don't think this is the problem, as I never get the master node up to begin with.
kubectl apply -f [podnetwork].yaml
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Save the kubeadm join command below for after you add the network add-on; you will need it to join the worker nodes to your cluster:
kubeadm join 192.168.1.20:6443 --token aohddn.cfxeu9fxy5yltmce \
--discovery-token-ca-cert-hash sha256:daf5a35296fb0d852be3e9defc1c7cdb59337b1f0f9bdb2df83ad442abd38531
I also forgot to ask (can't remember): shouldn't the initial k8s cluster come up prior to applying the network add-on, or will it hang until the network add-on is applied?
Almost certainly, your problem is related to containerd configuration. The CKA tutorials on installing with kubeadm are your friend; the Node Setup page covers a procedure that works pretty well, and should work on your VMware VMs just fine:
{
sudo mkdir -p /etc/containerd
containerd config default | sed 's/SystemdCgroup = false/SystemdCgroup = true/' | sudo tee /etc/containerd/config.toml
}
I added my install steps, this is one of them. Do you see a problem with my config.toml?
sudo vim /etc/containerd/config.toml (the input below is required in the file)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
I've used this in prior k8s installs with no issue.
Also, since it's been a minute: do I have to install the network add-on for the master node to come up properly, or do I wait for the initial master node to come up prior to adding a network add-on like Calico etc.?
Are you talking about the entire tutorial or just the config file?
I wish the courses would cover more scenarios, i.e. not just Ubuntu on VirtualBox. Most government guys have to use RedHat or a variant thereof. I have no issues running this in VirtualBox, but he also uses a Vagrant file, and I have no idea what is configured differently, because that is a massive file and I wouldn't know what was specific to Vagrant vs. VirtualBox vs. k8s.
I know these are self-paced, but troubleshooting is very difficult in a back-and-forth vs. a one-on-one scenario.
Also, the course steps don't match the k8s documentation exactly. I went with the k8s documentation since the versions are different as well.
Also the config.toml comes from the k8s page not the tutorial, he just points you to the page.
I'm calling it a night. Will hit it again tomorrow with fresh eyes.
He actually covers multiple virtualization solutions, although it's true we don't do a RH-flavored solution. The problem is that right now, support for RH family distributions is very hard. Vagrant, which is the easiest tool around for managing virtual systems, has a real lack of good "boxes" for anything that RH is promoting. The situation is so bad that in courses where we've traditionally used CentOS, we're looking seriously at switching to Debian family distributions. RH has made their supported and upstream distributions such a mess that we almost don't have a choice; IBM has made it for us. If RH played nicer with the OSS community, their distributions would be easier for developers to use with other tools. This is not a RH nor an IBM priority. Perhaps things will improve once the Hashicorp purchase by IBM goes forward, although many of us are pessimistic about that too: based on what's happened to RH, it's hard to be sanguine about Hashicorp's future in that conglomerate.
Also, we’re aware of what’s in the docs, but we also see what students get right and students get wrong, and if the docs aren’t preventing users from getting into trouble, we go for what we can. Part of this is that kubeadm is a beast, and developer experience is not necessarily a priority for that team. So if we need better, we make better.
So I do heartily recommend the tutorial. We’ve put a fair amount of engineering time into getting something that actually works, and we see the benefit in student experience in the courses where kubeadm is featured.
I agree with the synopsis on RedHat and distros, not been easy over the past few years and since IBM purchased them. I also am worried where things will go with Hashicorp products as well. Could be a serious problem for folks who need to learn without a big budget.
The problem with just going with the tutorial is I would just learn to follow instructions. That’s great for initial basic understanding of environment and tools. But, if I want to apply this in a work environment I have to be able to troubleshoot, which I am working on now.
As with any online research of technical issues, you sometimes get a quick really good answer or you have to go through mounds of answers that show up in your search that don’t answer the mail.
I appreciate the difficulty of trying to make a course on changing technology, I don’t hold the instructor in any way responsible for that. I am just trying to get some assistance where possible on making it work in my environment and gaining further in depth knowledge on the subject.
If that is not possible from this community due to lack of resources, then it is what it is. I am new to kodekloud and am learning what support is available given the resources available. If kodekloud doesn’t have the personnel to go in depth on troubleshooting outside the scope of the class I understand.
Am I correct to assume the instructors that make the courses aren't available for one-on-one troubleshooting sessions with students? Are moderators such as yourself not fully trained on each course, or are you trained but just don't have the time? Again, no shade, just gaining knowledge on what support is available and what questions are appropriate for the community vs. what I will have to run down on my own via Google, Reddit, etc.
Thanks for your timely responses and all your help so far. I really do appreciate it.
Also, if I get the environment to work I will post a response with the instructions for others trying in a similar environment.
It’s very rare that I deal directly with the course instructors, many of whom are contractors, or just too busy with the work of creating the courses and recording them.
I'd be curious what problems are specific to RH family distributions. TBH I haven't experimented much with the Oracle distributions; if they work with Vagrant, they'd be an option for us. But your particular combination of tools does limit who you can collaborate with, since the VMware/Oracle combination isn't that widely used.
Yeah, unfortunately. I am troubleshooting now and I think I am possibly close to something. Will post if I am right.
I used CentOS until that support dropped. I'm not sure why Oracle Linux is still staying up to date, i.e. who's behind it vs. CentOS in the past. If I remember correctly, IBM bought RedHat and then dropped support for CentOS. I am a little vague on it; I just know that my last employer was using it as our RedHat variant and I have stuck with it. There was Scientific Linux and maybe another, but I haven't kept up with it since Oracle Linux has done the trick so far.
I think it's a one-for-one like CentOS, but not 100% sure on that.
Thanks for clarifying on instructor access.
I still can't get it to work; it starts up but eventually fails. I think it's the config.toml file. I did what the instructor stated in the course as well as suggestions online, but no luck. I am pasting it below. I tried with just the three lines (last in the file) as well as un-commenting the containerd lines that were already in it. Neither worked.
kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.20:6443
CoreDNS is running at https://192.168.1.20:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[brian@kubemaster ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady control-plane 2m50s v1.30.1
[brian@kubemaster ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubemaster NotReady control-plane 5m7s v1.30.1
[brian@kubemaster ~]$ kubectl get nodes
The connection to the server 192.168.1.20:6443 was refused - did you specify the right host or port?
root = "/var/lib/containerd"
state = "/run/containerd"
subreaper = true
oom_score = 0
[grpc]
address = "/run/containerd/containerd.sock"
uid = 0
gid = 0
[debug]
address = "/run/containerd/debug.sock"
uid = 0
gid = 0
level = "info"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
One difference you’ll perhaps encounter is that you might have to deal with SELinux rather than AppArmor. You might want to start with a tutorial for dealing with CentOS 8 and try their SELinux/Podman related install, to see if that works better.
Almost identical to what I did, minus the config.toml, which wasn't mentioned.
I have another issue, but I'm not sure how to solve it; it might be the entire culprit.
You can also perform this action in beforehand using 'kubeadm config images pull'
W0601 15:06:31.870363 2598 checks.go:844] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended to use "registry.k8s.io/pause:3.9" as the CRI sandbox image.
I haven't found valid instructions yet on how to fix this. The one suggestion that sounded promising mentioned config files in a directory that I don't have. It could be that they don't get created until all is well. Have you seen this before? Either kubeadm's pause image (3.9) or the container runtime's pause image (3.6) needs to change so they match.
Well, I finally got the containerd config.toml to take pause:3.9 and show it properly with the config dump | grep sandbox command. Before, it would keep saying it was still 3.6, or it would show both versions and not initialize properly.
That got me past the error of (detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended to use "registry.k8s.io/pause:3.9" as the CRI sandbox image.)
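For anyone hitting the same warning, the change can be sketched like this. This is my reconstruction, not verbatim from the course; it assumes a containerd config v2 file that already has a sandbox_image line under the cri plugin:

```shell
# Point containerd's sandbox (pause) image at the version kubeadm expects.
sudo sed -i 's|sandbox_image = ".*"|sandbox_image = "registry.k8s.io/pause:3.9"|' /etc/containerd/config.toml

# Restart so the edit is loaded, then verify what containerd is actually using.
sudo systemctl restart containerd
sudo containerd config dump | grep sandbox_image
```

If the file has no sandbox_image line at all, you can instead regenerate a full default config with `containerd config default` and edit that.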
Now I get to the error ( Waiting for a healthy API server. This can take up to 4m0s
[api-check] The API server is not healthy after 4m0.00136938s
Unfortunately, an error has occurred:
context deadline exceeded
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster )
I double-checked the kubelet configs and they match what the documentation says on the k8s and GitHub sites.
The config.toml is configured exactly the way the k8s doc states, but who knows at this point.
version = 2
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.k8s.io/pause:3.9"
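One thing worth repeating whenever this file changes: the edits only take effect after the services restart, so a restart-and-verify cycle like the sketch below (my own habit, assuming both containerd and kubelet are installed as systemd units) helps rule out a stale config:

```shell
# Reload the edited config.toml, then restart kubelet so it re-detects the runtime.
sudo systemctl restart containerd
sudo systemctl restart kubelet

# Verify the two settings this thread cares about from the live, merged config.
sudo containerd config dump | grep -E 'SystemdCgroup|sandbox_image'
```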
I'm calling it a night.
I am going to stop this effort for now. I need to focus on next course CKA. I will revisit this down the road and see if I can figure out why it doesn’t work in my particular environment.
I don’t know if this should be marked solution (not found) or just closed. Please advise.
Let’s keep it open; you’ve done some good work here, and maybe someone else can build on it. It would be good if we had a RH-oriented approach to installing K8s using kubeadm.
I have done it successfully with k8s versions 1.23 and 1.26 in the past on Oracle Linux 8, using (I believe) both containerd and CRI-O. I can post those at a later time once I confirm they still work.
For now I am using the course instructions for Vagrant, VirtualBox and Ubuntu so I can move on.
I am like a dog with a bone; I want to know why it's not working. I also tried it with k8s 1.29 and 1.28. I think it's more than just a simple config change. I got errors on the pause image, which went away when I changed it for containerd (I couldn't change it for kubeadm), but no success. Got errors with kubelet on all versions as well. But I need to move on for now, so I will revisit when time permits.
One update for anyone coming across this thread: the video is outdated and does not work if you follow it along with the k8s doc instructions. If you go to the GitHub repo (kodekloudhub/certified-kubernetes-administrator-course: Certified Kubernetes Administrator - CKA Course) and follow the updated Vagrant/VirtualBox instructions there, it works. It's also easier, since you can just copy and paste the commands.