If you work with Kubernetes you already know that it's quite a complex and capable tool. There are literally thousands of different ways we can use it. So it can be a bit hard to keep up with everything going on in this space. Fortunately, an important conference revolving around Kubernetes took place this year: KubeCon/CloudNativeCon EU 2022. And we can extract a lot of value from it even if we didn't attend.
Here's an example: KubeCon posted a playlist containing the presentations that took place at the conference. How could we get value from it? Let's imagine we've been thinking for some time about using a service mesh tool like Linkerd. But we're not quite sure what's what and we don't want to spend hours reading documentation. Well, we can just search for the text "Linkerd" in that playlist. And we'll find this video: Overview and State of Linkerd. Since the people hosting these talks are usually very experienced in their fields, we get a very condensed form of information. That is, we learn a lot in a short amount of time, because the speakers share what they've learned after years of experience with a certain tool or procedure.
But now let's jump to the core of our article. A lot of subjects were covered at this conference: best practices, cool tools, philosophies, stories about how disasters unfolded and were solved, examples of how Kubernetes is used in big companies to solve big problems, and so on. It would be pretty hard to view all of these presentations; there are over 200 in that playlist! But if we read between the lines, we can find some well-hidden treasure. And this can help us stay on top of our field. By taking a closer look at what happened at KubeCon EU 2022, we can guess the direction the industry is heading. In other words, we can see the industry trends. This way, we can start learning about these things earlier and gather some experience in advance. So it's a good way to ensure we remain relevant and valuable at our jobs in the coming years.
Here's how we can spot these top industry trends. If we take a look at the titles of these talks, we'll see some subjects appear again and again. For example, we keep seeing the word "GitOps" in titles like these:
- GitOps to Automate the Setup, Management and Extension of a K8s Cluster
- Intro to Kubernetes, GitOps, and Observability Hands-On Tutorial
- GitOpsify Everything: When Crossplane Meets Argo CD
With so much focus on GitOps we can draw the conclusion that this subject is, or is becoming, important for the industry. After all, people were paying to attend this conference and hear these talks.
Following this logic, let's take a look at the most important industry trends and best practices signaled by KubeCon EU in 2022.
First, GitOps. We know that Git repositories are very useful for storing application code. For example, every developer can edit code on their laptop, then push their latest work to a shared repository on GitHub. Git will track what this latest push changed in the code: what remained the same, what was deleted. Now all of the other developers can easily see the latest modifications, and even who made them, so they can ask questions, comment on those changes, and so on. It makes collaboration straightforward. In a nutshell, Git is very good at tracking changes.
GitOps practices do a similar thing for Kubernetes. Remember how, when we want to change something in our cluster, we use YAML files to declare the objects we want in our Kubernetes infrastructure.
But let's say John modified something last night. Now Jane comes to work, and she's not really sure what changed. So she has to enter a bunch of commands to see the current state and check out what changed. Well, GitOps makes all of this easier.
With a GitOps approach, the current state becomes immediately visible to everyone. Imagine some GitHub-like repository at an address like example.com/kube-manifests/. And in there we have some files that contain detailed instructions about all the structures we want. We can configure our Kubernetes cluster to always follow the instructions it sees at example.com/kube-manifests/. This becomes the so-called "source of truth".
So now, instead of John manually feeding his personally edited YAML files to Kubernetes, he edits the files at example.com/kube-manifests/. Since everyone can see these files at any time, it's easy for each team member to see what changed, when, why, who changed it, and so on. Remember, this is a repository like something we'd see on GitHub. Every change is tracked and highlighted. We can easily see what was added, modified, removed, and so on. We're always up-to-date about the current state of our Kubernetes cluster.
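The core mechanism behind this is often called a reconciliation loop: a controller compares the desired state declared in the Git repository with the live state of the cluster and works out what to change. Real tools like Flux or Argo CD do this through the Kubernetes API; the function and the dictionaries below are purely illustrative, a toy sketch of the idea only.

```python
# Toy sketch of GitOps reconciliation: compare the desired state (what the
# manifests in the Git repo declare) with the live state of the cluster,
# and compute what must be created, updated, or deleted. All names here
# are made up for illustration.

def reconcile(desired: dict, live: dict) -> dict:
    """Return the actions needed to make `live` match `desired`."""
    actions = {"create": [], "update": [], "delete": []}
    for name, manifest in desired.items():
        if name not in live:
            actions["create"].append(name)   # declared but missing -> create it
        elif live[name] != manifest:
            actions["update"].append(name)   # exists but differs -> update it
    for name in live:
        if name not in desired:
            actions["delete"].append(name)   # no longer declared -> remove it
    return actions

desired_state = {"web-deployment": {"replicas": 3}, "web-service": {"port": 80}}
live_state = {"web-deployment": {"replicas": 2}, "old-job": {"retries": 1}}

print(reconcile(desired_state, live_state))
# web-deployment differs -> update; web-service is missing -> create;
# old-job is no longer declared -> delete
```

A real controller runs this comparison continuously, which is why simply editing the repository is enough to change the cluster.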
There are many advantages to this approach. Here's one example. Imagine this scenario: the entire structure in our worker nodes suffers a catastrophic breakdown for some reason. We lose all the objects, resources, policies we declared, everything. So now we need to recreate them. Before GitOps, what would we do? Call our top administrators so that they can recreate the things they were maintaining. Maybe they take some YAML files they had on their computer and feed them to Kubernetes. Or maybe these were uploaded somewhere and they can be reinserted into Kubernetes. But there are multiple people involved and they need to coordinate. Who did what? Who needs to create the structures first? What objects depend on what objects? It will be time-consuming until the whole team can organize. But if we follow a GitOps approach, then this becomes ultra-simple. We recreate a healthy Kubernetes cluster. Next, we just reconfigure it to follow the state declared at example.com/kube-manifests/. Bam! Job done! Kubernetes will slowly reconfigure itself with all the objects it needs. We don't even have to make any changes to the files we have on that Git repository.
Or think of another cool feature of GitOps. Someone makes some changes to the state declared at example.com/kube-manifests/. But, suddenly, something doesn't work so well anymore. However, since every change is now neatly tracked, we basically get a magic undo button. We just go to our Git repository at example.com/kube-manifests/ and revert to a previous version that worked well before this incident. All of the files are reverted to their previous form, Kubernetes sees the state it should revert to, starts doing its magic, and the problem is solved, once again with minimal effort!
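Here is a toy sketch of that undo button, assuming we model Git commits as a simple list of declared states. The manifests and image tags below are made up for illustration; in practice this is a `git revert` followed by the controller re-syncing the cluster.

```python
# Toy model of the GitOps rollback idea: the Git history keeps every
# version of the declared state, so rolling back means pointing the
# cluster at an earlier revision. Entries are illustrative dicts
# standing in for Git commits.

history = [
    {"web-deployment": {"image": "web:v1"}},   # commit 1: known-good state
    {"web-deployment": {"image": "web:v2"}},   # commit 2: broken release
]

def revert(history: list) -> dict:
    """Drop the latest revision and return the previous declared state."""
    history.pop()
    return history[-1]

current = revert(history)
print(current["web-deployment"]["image"])  # web:v1 -> the cluster converges back
```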
If you're interested to know more about GitOps, just go through the presentations KubeCon posted on YouTube. The popular tools for Kubernetes GitOps at the moment are Flux and Argo CD.
Another subject that appeared in many presentations at KubeCon is "Chaos Engineering". But what does this term mean? Chaos Engineering is, at its core, the practice of purposely stressing out our infrastructure, or at least, small parts of our infrastructure. We use tools that can make various things fail or put pressure on some components. Seems a bit weird though. Why would we do such a thing?
First of all, by stressing out various parts of our infrastructure, we can see what breaks first. This way, we can identify weak points and make them stronger so that they don't break in the future.
Normally, a well-designed Kubernetes cluster should be able to recover from occasional failures and heal itself. But we are human, and we will overlook some things. Some parts won't be failure-resistant, or some bug in our apps will make things spiral out of control. With chaos engineering, we can hammer away at our infrastructure, but in a gentle and controlled way. For example, we could introduce small failures in just 5% of our entire structure. If everything keeps working perfectly, great! We have a failure-resistant cluster! But if something fails, that's also great! Our admins and developers become aware that if this or that happens, something fails spectacularly. And they can come up with a solution. So we found a problem spot, and we only had to trigger this failure on 5% of our infrastructure. That's a nice trade-off. First, we can fix it quickly: since we initiated the experiment on purpose, we can more easily identify what went wrong and why. Second, having just 5% of our stuff fail is much better than having 100% of it fail. If this failure happened naturally, either because hackers attacked the vulnerability or because users simply hit the bug unintentionally, everything could go up in flames, and that would be much worse.
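The 5% experiment described above can be sketched in a few lines of code. This is only a toy model: real chaos tools (Chaos Mesh, Litmus, and similar) disrupt pods in a live cluster, while the pod names and the helper function here are made up for illustration.

```python
import random

# Toy sketch of controlled chaos: fail only a small, fixed fraction of
# the system (here 5% of pods) and then watch whether the rest keeps
# working. A real chaos tool would actually kill or throttle the
# selected pods; here we only pick the targets.

def pick_chaos_targets(pods: list, fraction: float, seed: int = 0) -> list:
    """Randomly select `fraction` of the pods to disrupt (at least one)."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    count = max(1, int(len(pods) * fraction))
    return rng.sample(pods, count)

pods = [f"pod-{i:03d}" for i in range(100)]
targets = pick_chaos_targets(pods, fraction=0.05)
print(len(targets))  # 5 -> only 5% of the fleet is disrupted
```

Keeping the blast radius this small is the whole point: the failure is real enough to expose weak spots, but small enough that the service as a whole survives the experiment.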
So this is how chaos engineering can save the day. Long story short, with chaos engineering implemented, in time, our Kubernetes cluster will become much more resistant to failures so our uptime should go up, keeping our customers happy.
The next hot topic at KubeCon revolved around service meshes. These are not new, but they are used more and more often.
With Kubernetes clusters becoming larger and more complex, with many nodes and thousands upon thousands of pods, it becomes quite difficult to manage how these nodes and pods talk to each other. The networking part grows quite complex. Service meshes make it a lot easier to interconnect these components. But they don't only do that. They also make it easier to secure this communication, for example by enabling encryption to prevent bad actors from reading network traffic between pods and nodes. Or they can provide authentication, to make sure that pods are communicating with the real destination and not some malicious destination set up by a hacker to disrupt our structure. Authentication is basically a way for a node or a pod to prove: "Yes, I really am node05 or pod88, here is cryptographic proof!". Without authentication, things talking on a network basically have to blindly trust that the device with IP address 10.0.5.9 is the real thing. But other than the IP address, there's no assurance that the device is the legitimate destination. And IP addresses can be easily faked.
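To make the idea of "cryptographic proof" concrete, here is a toy challenge-response check. This is not how a service mesh actually works internally (real meshes like Linkerd use mTLS certificates issued per workload); the shared key, the pod names, and the functions below are purely illustrative, a simplified stand-in showing why a secret key beats an IP address.

```python
import hashlib
import hmac

# Toy identity proof: a pod signs a random challenge with key material it
# was issued, and the receiver verifies the signature. An impostor that
# only spoofs an IP address has no key, so it cannot produce a valid proof.

SHARED_KEY = b"demo-key-issued-by-the-mesh"  # stand-in for a pod certificate

def sign(identity: str, challenge: bytes, key: bytes) -> str:
    return hmac.new(key, identity.encode() + challenge, hashlib.sha256).hexdigest()

def verify(identity: str, challenge: bytes, proof: str, key: bytes) -> bool:
    return hmac.compare_digest(sign(identity, challenge, key), proof)

challenge = b"random-nonce"
proof = sign("pod88", challenge, SHARED_KEY)
print(verify("pod88", challenge, proof, SHARED_KEY))    # True: the real pod88
print(verify("pod88", challenge, proof, b"wrong-key"))  # False: an impostor
```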
It's also worth noting that service meshes make it easier to log network traffic.
Long story short, if we have thousands of things to interconnect, service meshes can make life a bit easier, so they're quickly becoming an industry standard and best practice. In fact, we could say they're already almost a standard, since the adoption of service meshes began quite a long time ago.
Two popular solutions for implementing service meshes are Linkerd and Istio. Lately, people tend to prefer Linkerd as they find it much easier to use. Istio is more complex, but it also has more features, if we need them.
It's rather unfortunate that when people build new things in the software world, security is usually the last concern. But it makes sense. Companies, and teams, are rushing to finish a product and deliver it to the world. Productivity is the main concern. It might take 3 months to finish a product, but 5 months if security is also considered, as things have to be done differently, complicating the development stage a bit. So companies prefer to ignore security, ship the product first, and worry about security later. But, lately, security incidents seem to happen more often. And they can be very costly, in terms of money, but also reputation. Companies are beginning to realize that the time they shaved off by ignoring security might not be worth it. Like in our example: they might have shaved off 2 months and delivered faster. But say they then lost a hundred million dollars because of a destructive hack. They might realize that, had they delayed the product launch to make their service more secure, they would have lost less money. And we can see this mentality shift at KubeCon EU 2022. A lot of talks and presentations focused on security-related subjects.
Here are a few examples:
- Open Policy Agent (OPA) Intro & Deep Dive focused on an agent that lets us define and enforce rules to be followed in our cluster.
- Falco to Pluginfinity and Beyond talks about a tool that notifies us when suspicious activity is detected in our Kubernetes clusters.
- Full Mesh Encryption in Kubernetes with WireGuard and Calico discusses how we can create a sort of VPN for our cluster, ensuring that network traffic between nodes and pods is much more secure.
If you're interested in improving security in your Kubernetes cluster, we actually have an entire course dedicated to this subject, covering a lot of interesting topics, maybe more than were covered at KubeCon. You can find this course here: Certified Kubernetes Security Specialist (CKS). If you use this coupon, you can check it out for free in the next 7 days. We believe that in those 7 days you'll have time to convince yourself that our courses are very easy to understand and actually interesting.
A lot of KubeCon talks in 2022 focused on autoscaling. But why?
Let's imagine we work at a big company. This company has hundreds of thousands of clients. And we use cloud services to host our Kubernetes cluster. With the help of our worker nodes and pods, we offer some kind of service to our clients. But the cloud service that hosts our Kubernetes cluster will cost a lot. We will need a ton of resources (servers, RAM, CPU power, and storage) to serve all of these clients. But the thing is, our clients are not active all the time. We might have 90,000 clients using our service at 9 PM, since they're all at home, in their free time. But we might have only 6,000 clients using our service at 9 AM, since they're busy going to work. So it does not make sense to have hundreds of servers sitting around, doing nothing, when they're not needed. This is why we can have Kubernetes autoscale. Some of you might already know what autoscaling is. But for those who don't, here's a quick explanation:
Let's imagine it's 9 PM. A lot of customers rush in to use our service. We have autoscaling configured both in our cloud service and in our Kubernetes cluster. So 100 servers are added automatically to our infrastructure, and Kubernetes auto-launches more pods to be able to serve all those clients. This is called horizontally scaling up. Time flies by, and now it's 9 AM. Not much going on. Only 6,000 clients are using our service. Autoscalers can automatically shut down those 100 extra servers so that we're not paying for things we don't need, things that aren't actively used. This is called horizontally scaling down. With those 100 servers removed, we might save a few thousand or even tens of thousands of dollars each day. This can lead to monthly savings of hundreds of thousands of dollars, or even millions for organizations with super large infrastructures.
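The scale-up/scale-down decision itself can be illustrated with the formula Kubernetes' Horizontal Pod Autoscaler uses for pods: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The pod counts and CPU numbers below are hypothetical.

```python
import math

# Sketch of the horizontal-scaling decision, modeled on the formula the
# Kubernetes Horizontal Pod Autoscaler uses:
#   desired = ceil(current_replicas * current_metric / target_metric)

def desired_replicas(current: int, current_metric: float, target_metric: float) -> int:
    return math.ceil(current * current_metric / target_metric)

# 9 PM rush: 10 pods averaging 90% CPU against a 50% target -> scale up.
print(desired_replicas(10, 90, 50))  # 18
# 9 AM lull: 18 pods averaging 10% CPU -> scale back down.
print(desired_replicas(18, 10, 50))  # 4
```

Cloud-side autoscalers for the servers themselves work on the same principle: add capacity when the metric runs hot, remove it when the metric runs cold.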
Now let's get back to a typical Kubernetes cluster hosted in the cloud. Let's say we don't autoscale. We just have a fixed number of servers, nodes, and pods. We don't care about saving money. But even if we eliminate money from the equation, fixed, non-autoscaling infrastructures have another huge drawback.
So let's say we have 100 servers doing their job. At 9 PM, when a lot of customers use our service, they can deal with the huge traffic coming on. It's all good. Then something cool happens. Elon Musk really likes our product and he posts about it on Twitter. With the millions of followers that he has, we get a sudden burst of visitors, trying to check out our service. This should be a very good thing, but instead, it turns into a disaster. Our 100 servers can't take the pressure and they crack. Our entire service goes offline. Our old customers can't use the service anymore. And hundreds of thousands of people that wanted to check our product can't do that. They were all curious. A lot of them would have become happy customers of our service. But now, the opportunity is lost. If we had autoscaling implemented, hundreds of new servers could have been automatically added to our infrastructure. And we would have been able to support that extra traffic coming in. And this disaster would have been turned into a huge opportunity. We would have gained tens of thousands of new customers.
So we can see autoscaling can save the day when we get a sudden boost of users. And it can also save a serious amount of money when not much is going on. So it makes a lot of sense that autoscaling is getting more and more attention. In fact, we could say it's almost mandatory to implement this for all businesses that have a large number of customers.
We hope this article was useful and gave you a lot of ideas about things you might want to explore. If you're curious about any of these subjects, we recommend you search for relevant entries in the playlist containing KubeCon + CloudNativeCon Europe 2022 videos.
No Experience with Kubernetes? No Problem
Now we expect that most people reading this have some experience with Kubernetes. But just in case that's not true for you...
Maybe you noticed that more and more employers are looking for people with Kubernetes skills. But you're totally new to it. Well, in that case, you can check out this Kubernetes Course for Beginners.
Some Experience with Kubernetes, but Hard to Find Jobs?
If, on the other hand, you do have some experience but are finding it hard to land jobs, maybe a certification could make your CV look much better? In that case, check out this course that helps you get ready for the CKA - Certified Kubernetes Administrator exam.