Dear fellow learners,
I am testing a production-ready setup for a Kubernetes (k8s) cluster on-premises. I aim to follow best practices as closely as possible with my current cluster. I have a question regarding automated backups: What is the best way to back up my entire cluster?
To provide some context, my setup spans three different zones, with three master nodes and several other nodes distributed across those zones. In case of a disaster where I need to create a new Kubernetes cluster, I would like to restore the backup.
What is the recommended production-grade approach for doing this? I’m primarily looking for information on what tools or methods I should use, how they work, and any relevant insights—I can handle testing the setup myself afterward.
I truly appreciate your help on this matter!
Hi,
In my experience, I use etcdctl with TLS to take secure etcd snapshots, automate the process with a cron job, and upload the backup files to a MinIO bucket. I recommend focusing on backing up the etcd data files and deploying the Kubernetes cluster with either a stacked etcd cluster or an external etcd, depending on your setup. For YAML deployments, you should use GitOps and store the YAML files in repositories.
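As a rough illustration of the snapshot step, here is a minimal Python sketch that builds and runs the etcdctl command with TLS flags. The certificate paths are the kubeadm defaults and are an assumption, as are the backup directory and the helper names (snapshot_command, snapshot_name, take_snapshot); adjust them to match your cluster.

```python
import os
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Assumption: kubeadm default etcd certificate directory; change if yours differs.
PKI = Path("/etc/kubernetes/pki/etcd")


def snapshot_command(dest, endpoint="https://127.0.0.1:2379", pki=PKI):
    """Build the etcdctl argument list for a TLS-secured snapshot."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki / 'ca.crt'}",
        f"--cert={pki / 'server.crt'}",
        f"--key={pki / 'server.key'}",
        "snapshot", "save", str(dest),
    ]


def snapshot_name(now=None):
    """Return a timestamped file name so older snapshots are not overwritten."""
    now = now or datetime.now(timezone.utc)
    return f"etcd-{now:%Y%m%dT%H%M%SZ}.db"


def take_snapshot(backup_dir):
    """Run etcdctl and return the path of the new snapshot file."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / snapshot_name()
    env = {**os.environ, "ETCDCTL_API": "3"}  # harmless on releases where v3 is the default
    subprocess.run(snapshot_command(dest), check=True, env=env)
    return dest


if __name__ == "__main__":
    # Hypothetical backup directory; run this on one healthy control-plane node.
    print(take_snapshot("/var/backups/etcd"))
```

A cron entry on one control-plane node can call this script on a schedule. Since MinIO speaks the S3 API, the returned file can then be pushed to a bucket with whichever S3-compatible client you already use.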
I have 3 masters, with etcd running on each of them.
Hi,
Since you have 3 masters with etcd on each, you’re using an HA etcd cluster. You only need to back up etcd from one healthy master node. Make sure to store the snapshot safely. Also, it’s good practice to back up key Kubernetes config files from /etc/kubernetes/ for full recovery.
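The config file backup can be sketched in a few lines of Python using only the standard library. The function name archive_config and the output file name are illustrative assumptions, not part of any Kubernetes tooling.

```python
import tarfile
from pathlib import Path


def archive_config(src="/etc/kubernetes", dest="kubernetes-config.tar.gz"):
    """Pack the config directory into a gzip-compressed tar archive."""
    src = Path(src)
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths inside the archive relative ("kubernetes/...").
        tar.add(src, arcname=src.name)
    return Path(dest)


if __name__ == "__main__":
    # Needs read access to /etc/kubernetes, so typically run as root.
    print(archive_config())
```

On a kubeadm cluster this directory includes the PKI material under pki/, so the archive should be stored as carefully as the etcd snapshot itself.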
Would a simple Python script work for this task, or do I have to use a specific tool? I mean, just like we take a snapshot of etcd in the CKA exam lab, and then copy that file to a safe location?
Yes, there are some commands you can use to take backups. You can write a Python script to run those commands within a Docker image for easy reuse and management. It’s similar to how you take etcd snapshots in the CKA exam. You can also add a step to upload the snapshot to MinIO. Then, create another Docker image that can restore the etcd snapshot and spin up a test cluster to verify it.
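For the verification step, a minimal sketch could restore the snapshot into a throwaway directory and check that the tool exits cleanly. This assumes etcdutl is installed (it ships with etcd 3.5 and later; older releases expose the same subcommand through etcdctl). The function names and the snapshot path are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def restore_command(snapshot, data_dir, tool="etcdutl"):
    """Build the argument list for restoring a snapshot into data_dir."""
    return [tool, "snapshot", "restore", str(snapshot), f"--data-dir={data_dir}"]


def verify_snapshot(snapshot, tool="etcdutl"):
    """Restore into a scratch directory; True if the restore succeeds."""
    with tempfile.TemporaryDirectory() as scratch:
        # The target must not exist yet, so use a fresh subdirectory.
        data_dir = Path(scratch) / "restored"
        result = subprocess.run(restore_command(snapshot, data_dir, tool))
        return result.returncode == 0 and data_dir.is_dir()


if __name__ == "__main__":
    # Hypothetical snapshot path, for illustration only.
    print(verify_snapshot("/var/backups/etcd/etcd-snapshot.db"))
```

This only shows that the file can be restored. Bringing up a test cluster from the restored data, as described above, is still the real check.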