Kubernetes Cluster deployment in HA

I recreated the cluster with containerd


Hi,
I need help. I have a multi control-plane cluster with the following configuration:

Two HAProxy nodes with Keepalived configured:

ha-proxy1: 172.29.28.46
ha-proxy2: 172.29.28.47
Keepalived floating IP: 172.29.28.48

Control-plane node IPs:

k8-master: 172.29.28.49
k8-master1: 172.29.28.50
k8-master2: 172.29.28.51

worker-node details

worker-node-1: 172.29.28.52

I have an Nginx pod deployed with a Service exposed as NodePort.

I am able to access Nginx using a node IP with the service port,

but unable to access it using the Keepalived floating IP.

I have also curled Nginx from the HAProxy host and got a successful response.

What could be the reason for it?
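For context, the Service described above would look roughly like this (the name, selector, and NodePort value are assumptions, not from the original post):

```yaml
# Sketch of an Nginx Service exposed as NodePort.
# The name, selector, and nodePort value (30080) are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
      nodePort: 30080
```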

You need to check the IP address of the HAProxy server and test whether it can forward requests to the Kubernetes cluster.

Yes, it is able to forward the request; I tested that using telnet and curl.

Testing against the same Nginx pod from the HAProxy host, I was able to get a response from the pod.

You should use curl to access HAProxy at 172.29.28.46 from worker-node-1 or a VM in the same network that can reach HAProxy. If it shows the NGINX page, then we can focus on the Keepalived service.
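For reference, a minimal haproxy.cfg sketch that forwards the service port to the worker node might look like this (the bind port and NodePort value 30080 are assumptions):

```
# Minimal HAProxy sketch: forward a NodePort through the proxy.
# The bind port and NodePort (30080) are assumptions.
frontend nginx_nodeport
    bind *:30080
    mode tcp
    default_backend nginx_nodes

backend nginx_nodes
    mode tcp
    balance roundrobin
    server worker-node-1 172.29.28.52:30080 check
```

With something like this in place on both HAProxy hosts, traffic to the Keepalived VIP on that port should reach the NodePort on the worker node.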

To check the Keepalived service, use systemctl to see if it’s running. After that, check the logs and share them here when you try to access the VIP.
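For reference, a minimal Keepalived sketch advertising the floating IP might look like this (interface name, router ID, and priorities are assumptions; adjust to your environment):

```
# Keepalived sketch advertising the floating IP 172.29.28.48.
# Interface name, router ID, and priority values are assumptions.
vrrp_instance VI_1 {
    state MASTER            # use BACKUP on ha-proxy2
    interface eth0
    virtual_router_id 51
    priority 100            # use a lower value on ha-proxy2
    advert_int 1
    virtual_ipaddress {
        172.29.28.48
    }
}
```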

I achieved the end result by using the NGINX ingress controller,

but I am stuck on a webhook error.

When I create an Ingress for any service to route traffic through the ingress controller, it gives an error.

When I disable the webhook by deleting it (kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission), it works smoothly.
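For context, a typical ingress.yaml like the one being applied might look like this (the host, service name, and port are assumptions, not from the original post):

```yaml
# Sketch of an Ingress routing traffic to the Nginx service.
# Host, service name, and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-web
spec:
  ingressClassName: nginx
  rules:
    - host: nginx.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
```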

This is the error I faced:

root@k8-master:/home/abdullah.naeem/nginx-web# kubectl apply -f ingress.yaml

Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": context

Pods on the same node are able to ping each other, but pods on different nodes can't ping each other.

Can you help me with this?

I am using Calico.

The pods cannot communicate across nodes, which is a common issue in Kubernetes. There are many possible reasons, but the most common one is networking.

Troubleshooting Steps:

  1. Check the CNI Plugin – Ensure the cluster’s network plugin (like Calico, Flannel, or Cilium) is properly installed and running.
  2. Verify Pod Network Configuration – Confirm that pods are assigned correct IPs and can reach each other within the cluster.
  3. Inspect Firewall Rules – Ensure there are no firewall rules blocking pod-to-pod communication across nodes.
  4. Validate Node-to-Node Connectivity – Confirm that nodes can reach each other over the pod network.
  5. Check Kube-Proxy & Network Policies – Ensure there are no misconfigured network policies or kube-proxy issues preventing communication.

I have tried all of these:

I reinstalled the CNI (Calico).
Pods have been assigned IPs from the designated CIDR.
The firewall is off/disabled.
Node-to-node connectivity is good.
I deleted all network policies and created a new one that allows all traffic.

What else could be the reason?
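For reference, an allow-all NetworkPolicy like the one described might look like this (the namespace is an assumption):

```yaml
# Sketch of an allow-all NetworkPolicy like the one described.
# The namespace is an assumption.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - {}
```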

Important:

Pods are unable to reach the external network;
for example, apt update doesn't work.

Pods on worker-node1 are pingable only from worker-node1.
Pods on worker-node2 are pingable only from worker-node2.

BUT

Pods on worker-node1 are not pingable from worker-node2, and vice versa.

You have reinstalled the CNI. Based on my experience, it’s best to restart the node before testing the network again.

Let me do that as well and restart the nodes.

Hi,

I think the issue got resolved after the reboot!

Thank you so much for the help!
May God bless you!
Have a nice day!

The issue got resolved after changing the encapsulation from VXLAN to IPIP.
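For reference, switching Calico's encapsulation can be done by editing the IPPool; a sketch, assuming the default pool name and CIDR (check yours with `calicoctl get ippool -o yaml`):

```yaml
# Calico IPPool sketch with IPIP encapsulation enabled and VXLAN disabled.
# The pool name and CIDR are assumptions.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Always
  vxlanMode: Never
  natOutgoing: true
```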

Could you tell me why VXLAN didn't work?

I’m not sure about that. Some networking concepts are really deep, and while I understand them, I can’t explain them clearly. For detailed questions, it’s better to refer to a book on the topic.


Hi,
I created Cluster with mentioned details

  1. 2 HAProxy nodes (with failover handled by Keepalived)
  2. kube-apiserver LB
  3. Nginx-ingress-controller LB
  4. 3 master nodes
  5. 2 Worker-nodes
  6. CNI: Calico with IPIP encapsulation and BGP enabled
  7. CRI: containerd

Major issue faced

  1. Pod-to-pod cross-node communication issue

Troubleshooting done

  1. Used the following encapsulation methods:
    1. IPIP
    2. IPIPCrossSubnet
    3. VXLAN
    4. VXLANCrossSubnet
    5. None
    None of them fixed the cross-node pod communication issue.
  2. Checked:
    1. calico-node logs
    2. calico-apiserver logs
    3. related service status using calicoctl:
      1. node status: node-to-node mesh was established; I also changed it to node-specific and some other modes
      2. ippools
      3. subnets
      4. etc.

Issue

  1. BGP peers were not created

Solution

  1. I had to create them manually with global scope, and then my pod communication worked well.
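For reference, a global Calico BGPPeer (one without a node field or nodeSelector, so it applies to all nodes) might look like this; the peer address and AS number below are placeholders, not from the original post:

```yaml
# Sketch of a global Calico BGPPeer (no node selector, so it applies cluster-wide).
# The peer IP and AS number are placeholders.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: global-peer
spec:
  peerIP: 172.29.28.1   # placeholder upstream router IP
  asNumber: 64512       # placeholder AS number
```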