Ever heard terms like CI/CD - Continuous Integration and Continuous Delivery, pipelines, compiling, version control, containers and orchestration, and so on? They're pretty alien, right? Or at least, abstract. For example, we keep hearing about Kubernetes orchestrating containers. We can figure out that it probably automates something, but how, and why? And what actually happens in a Continuous Integration process?
In this article, we'll clear up the mystery surrounding these terms. And while we're at it, we'll also cover some of the essential tools and procedures that DevOps engineers use. More specifically, we'll look at why they use those tools; what kind of problems they solve with them.
Servers and Cloud Services
Usually, everything a DevOps person does will happen on some servers somewhere, or on some cloud services. This brings us to the first thing a DevOps engineer should master: Linux servers. Of course, Windows servers are also used occasionally, but we'll focus on the more popular choice, Linux.
A Linux server is almost always managed through commands. And this might seem a bit intimidating at first. But it's basically just us having a chat with the server and telling it what we want it to do.
For example, to create a new directory called "JSCode", we would type:

mkdir JSCode

To copy a file called "app.js" into that directory, we would type:

cp app.js JSCode
We can see that these commands are derived from English words, where "mkdir" stands for "make directory" and "cp" stands for "copy". So learning all the right commands is almost like learning a new language: it takes time, but it's not complicated.
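To get a feel for this "chat with the server", here's a small sketch of the commands above, run in a throwaway temporary directory so nothing on the real system is touched:

```shell
cd "$(mktemp -d)"   # work inside a fresh temporary directory
mkdir JSCode        # "make directory" — create the JSCode directory
touch app.js        # create an empty file to have something to copy
cp app.js JSCode    # "copy" app.js into the JSCode directory
ls JSCode           # "list" the directory's contents — prints: app.js
```

Each command is a short English-derived verb followed by what it should act on; chaining a handful of these is how most day-to-day server work gets done.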
A DevOps engineer also interacts with cloud services, offered by so-called "cloud compute" providers. A few examples of such providers are Amazon's AWS, Google Cloud, DigitalOcean, Linode, and Microsoft Azure. These companies provide hundreds of different components that we can interconnect to build something. For example, we could go on AWS and use a component that lets us store files, so we can upload our code there. Then we can add another component that lets us run that code. So instead of running it on our computer, we run it in the cloud, on that service/component we just launched. Next, we can add a third component that gives us access to a database, so our application can also store some data. It's basically like building a robot in that cloud, with various parts we can choose from. Except it's not a robot, but more like a machine of sorts, built out of software and servers: the infrastructure that our company needs to run its business.
To launch and configure these components, we'll often use the cloud compute provider's website. We get access to their web interface and can use the mouse and keyboard to select what we want to launch and how we want to use it.
Automation Tools for Deploying and Configuring Servers
A company can use hundreds or thousands of servers. It quickly becomes a nightmare to have to log in to each one and configure it manually. Even if it's something simple, such as installing a program. Imagine we have to enter a command like:
sudo dnf install mariadb-server
We log in to server1, do this, and log out. Then we log in to server2, do this, and log out. And so on, until we get to server1000. That would be a pretty boring day. And, realistically speaking, after we install that program we also have to configure it, which is even more time-consuming. Automation tools like Chef, Ansible, Puppet, and Terraform can perform all of those actions for us, on all 1000 servers, at once (Terraform focuses on provisioning the servers and cloud components themselves, while Chef, Ansible, and Puppet focus on configuring what runs on them). What would take a human 50 hours to do manually can now happen in 3 minutes. And these automation tools can do much more. For example, if one server breaks down, these tools can automatically launch a new server, configure it according to our instructions, and replace the one that just broke. So we can enjoy our sleep instead of having to wake up at 3 AM to fix a broken Linux installation.
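To make this concrete, here's a minimal sketch of what that "install on 1000 servers at once" instruction could look like as an Ansible playbook. The group name "dbservers" is an assumption; it would refer to a list of servers defined in our inventory file:

```yaml
# Hypothetical playbook: install and start MariaDB on every server in the group, at once.
- hosts: dbservers              # assumed group name from our inventory
  become: true                  # run the tasks with sudo privileges
  tasks:
    - name: Install MariaDB server
      ansible.builtin.dnf:
        name: mariadb-server
        state: present          # "make sure it's installed" — safe to re-run
    - name: Make sure MariaDB is running
      ansible.builtin.service:
        name: mariadb
        state: started
        enabled: true           # also start it automatically after a reboot
```

Instead of typing the `sudo dnf install` command 1000 times, we describe the desired end state once, and the tool applies it to every server in parallel.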
Version Control

The developers in our team write code every day. And this code is scattered across many places and many files. So it's hard to keep track of who wrote what. Not to mention we wouldn't know what changed in our code yesterday, or today. But programs and services like Git, GitHub, GitLab, and others make all of this easy.
For example, imagine we have this line somewhere, that one developer called John wrote:
x = 1
One month later, Jane comes along and upgrades the code, changing this line to:
x = 2
A version control utility keeps a detailed history of such changes. It will track that John wrote "x = 1" and that Jane changed this to "x = 2". Furthermore, each time we're happy with the code so far, we can create a snapshot. For example, after John wrote "x = 1" we can decide that all of our code is now ready to be released. We can snapshot this and call it version 1.0.1 of our app. Then, after Jane is done with her work, we make another snapshot and call it version 1.0.2. Now we can deliver both versions to our customers. When someone wants to download version 1.0.1, our version control utility can deliver the code exactly as it was at that point in the past. For example, it will know that this older version of the code includes the line "x = 1". And if someone wants to download version 1.0.2, it will know that that code has a different line, "x = 2". And it does this for thousands and thousands of lines, without us having to worry about it or manually keep hundreds of copies of our code scattered around on our computers. Imagine a directory with files ranging from "app1.0.1.zip" all the way to "app1.0.68.zip". That would be messy.
Version control also makes it easy to roll back changes whenever it's required. For example, if someone makes some accidental changes at 10:33 AM, we can revert all of our code to exactly the way it was at 10:32 AM.
Long story short, a tool like Git tracks absolutely everything that happens with the code our developers continuously write and change. So it makes it easy for many people to work on the same code, even if they change things in different places. Everyone knows what everyone else is doing, and even why, since version control also lets people attach a short message explaining each change.
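The John and Jane story above can be replayed with real Git commands, in a throwaway directory (the names, emails, and file name are made up for illustration, and the snapshots become Git "tags"):

```shell
set -e
cd "$(mktemp -d)"                     # a throwaway directory for the demo
git init -q                           # start tracking this directory with Git
git config user.email "john@example.com"   # hypothetical identity
git config user.name  "John"
echo "x = 1" > app.js                 # John writes the first version
git add app.js
git commit -qm "Set the initial value"
git tag v1.0.1                        # snapshot: version 1.0.1

git config user.email "jane@example.com"   # hypothetical identity
git config user.name  "Jane"
echo "x = 2" > app.js                 # Jane upgrades the code
git commit -qam "Upgrade the value"
git tag v1.0.2                        # snapshot: version 1.0.2

git show v1.0.1:app.js                # prints: x = 1
git show v1.0.2:app.js                # prints: x = 2
git log --oneline                     # the full history: who changed what, and why
```

No "app1.0.1.zip" copies anywhere: Git can reconstruct any tagged version of any file on demand, and the log records the author and message of every change.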
CI/CD: Continuous Integration and Continuous Delivery
Now that we have all of this valuable code written, it's time to make it usable. And this is what CI/CD does. Our code will go through what is called a pipeline. That's because it's basically like a pipe: information is transported through it, from one end to the other. And it's transformed in various ways along the way.
Here's a simplified example: after Jane modifies that line and changes it to "x = 2" our pipeline can do the following:
- Run some syntax tests to see if the code is still valid.
- Compile the code to transform it into a usable program. For example, all of this code, written in text form, can be compiled into a phone app, like an Android application. That's what compiling is, turning code into a final product like an app, or a website, or whatever it should be.
- Then our pipeline can even automatically deliver this phone app to the Google Play Store, or wherever it should land so that our customers can use it.
Of course, a real pipeline has many more steps. But in a nutshell, this is what CI/CD is about: automating all of these steps. This way we don't have to manually compile, or do other stuff, every time someone makes some changes to our code. And developers don't have to wait for a human being to process their changes. It happens fast, and automatically. So it saves a lot of time for both the DevOps engineer and the developers.
Some tools used for CI/CD pipelines are Jenkins, CircleCI, Travis CI, plus others.
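The three steps above are usually described to the CI service in a configuration file that lives next to the code. Here's a hypothetical sketch in GitLab CI's YAML syntax; the file names and scripts are assumptions, just to show the shape of a pipeline:

```yaml
# Hypothetical .gitlab-ci.yml: three stages matching the steps described above.
stages:
  - test
  - build
  - deliver

syntax-check:
  stage: test
  script: node --check app.js          # fail the pipeline early if the code is invalid

compile:
  stage: build
  script: ./gradlew assembleRelease    # e.g. compile the Android app (assumed build command)

publish:
  stage: deliver
  script: ./upload-to-play-store.sh    # hypothetical delivery script
```

Every time a developer pushes a change, the CI service reads this file and runs the stages in order, stopping at the first failure, with no human in the loop.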
Containers

In the past, when many people worked on the same code, a common problem kept appearing. Whatever John modified worked on his laptop. But when Jane tried that same modified code, it did not work on her computer. That's because their installed programs and/or operating systems were configured in slightly different ways. So in developer teams, we'd often hear one person complaining "This doesn't work" and the other responding "Works for me…". This is just one of the many reasons why containers were invented. And a tool called Docker is by far the most popular container solution.
Nowadays code is tested inside such a container. We can think of a container as a sort of very, very small operating system in a box, configured specifically to support the app that we want to add inside it. It's not really an operating system, or at least, not a complete one, but it has some essential components that allow our code or app to run. And everyone can download this identical container, with the same essential components inside. So now it doesn't matter if we run that container on Ubuntu 20.04 or Ubuntu 22.04, it should work exactly the same way, for everyone. The code we run there does not care about our operating system anymore. It has everything it needs inside its own little box.
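With Docker, that "box" is described in a small file called a Dockerfile. Here's a hedged sketch, assuming our app is the "app.js" file from earlier (the base image choice is an assumption):

```dockerfile
# Hypothetical Dockerfile: just enough operating system to run app.js, nothing more.
FROM node:20-slim          # a small base image with Node.js preinstalled
WORKDIR /usr/src/app       # the directory inside the container where our app lives
COPY app.js .              # put our code inside the box
CMD ["node", "app.js"]     # what runs when the container starts
```

Everyone who builds an image from this file gets the same Node.js version and the same file layout, regardless of what's installed on their own laptop.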
Another cool thing about a container is that we can easily move it anywhere we want. John can just copy the container over to Jane so she can test it too. Nothing needs to be reinstalled or reconfigured. So containers make it very easy to clone a certain application and deliver it to many people. And we don't have to make different versions of our app to be sure it works on Windows 8, Windows 10, Windows 11, Ubuntu 20.04, Ubuntu 22.04, and so on. Instead of building 5 variations of our app, one for each operating system, we just build one version that works everywhere. Code is easier to maintain when we only have one release. It's also easier for people to download and use our app, since everything they need is included in the container they just copied.
Containers also make it easier to run our app on the servers we have. This brings us to the next tool we should explore:
Container Orchestration

The container revolution started a secondary revolution: container orchestration. Now that we have apps squeezed inside these little boxes called containers, we can start to do cool things.
Let's imagine this. Someone wants to access their email. We could have one server that lets people access a website where they can read their emails. But if that server goes down, thousands of people will lose access to the service. So we can do it another way. Every time someone wants to read their mail, we launch a small container especially for them. We let them use our service, and when they log out, we just destroy that container. The benefit? If this container malfunctions, only one client experiences problems. With one container serving each client, as long as the other containers function correctly, everyone else's service will work just fine. This is the spirit of a "microservices" architecture: we went from one big app on one big server, used by thousands of people, to each client being served by their own small container.
And here we get to orchestration. We mentioned that if a container malfunctions, our client loses service. But with orchestration, that doesn't even happen. An orchestration tool will notice that the container is broken, quickly destroy the defective one, and launch a new one that works correctly. This is easy to do because containers are small and easy to create on demand. The customer who wants to check their email will just notice that it took 2 extra seconds to access it. They won't realize that something malfunctioned and we quickly replaced the container they were using, behind the scenes. Pretty cool, isn't it? But something has to automate this process, and that's the job of container orchestration tools such as Kubernetes.
With Kubernetes, something like this happens. We tell it about a state we want to achieve. For example, we can say, "Hey Kubernetes, I want 1000 of these containers to run on these 10 servers. And make sure to distribute them evenly". Then Kubernetes proceeds to configure and launch our containers. And, even more interestingly, it continuously monitors everything to make sure that this state is preserved. Here's a practical example. Say one container breaks and now we have 999 running instead of the desired 1000. Kubernetes will remove the broken one and recreate one that works correctly. But we already explored this, it's old news, we may think. Well, Kubernetes goes even further in its efforts to preserve that state. What if one of our servers breaks? That server could be running 100 of our containers. It's pretty bad to lose so many at once. But Kubernetes is smart enough to realize that the server is down, so it relaunches our 100 lost containers on the other servers that still work. And it will distribute them evenly so that no single server is under too much pressure. Furthermore, we can even offer spare servers to Kubernetes. If it sees that a server went down, it can recreate the lost containers on a free spare server.
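That "Hey Kubernetes, I want 1000 of these containers" request is written down as a so-called Deployment. Here's a hedged sketch of one; the names and the image are hypothetical:

```yaml
# Hypothetical Deployment: ask Kubernetes to keep 1000 copies of our container running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: email-app                # hypothetical application name
spec:
  replicas: 1000                 # the desired state: always 1000 running containers
  selector:
    matchLabels:
      app: email-app
  template:                      # the blueprint for each container
    metadata:
      labels:
        app: email-app
    spec:
      containers:
        - name: email-app
          image: example.com/email-app:1.0.2   # hypothetical container image
```

We only declare the desired state; Kubernetes spreads the copies across the available servers and keeps recreating any that die, which is exactly the self-healing behavior described above.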
So we can think of Kubernetes as a sort of artificial intelligence. We tell it how we want our containers to run. After that, Kubernetes continuously monitors everything and does its best to keep things running exactly how we wanted.
Hopefully, this makes things a little bit clearer about what DevOps people do. Of course, they use many other tools. For example, they also need software that monitors what's going on and sends alerts on errors. But in this article, we took a look at what they do with most of their time. The tools we explored here are usually the most important to know for someone aspiring to be a DevOps engineer.