What is Puppet in DevOps?

Let’s talk about software deployment. Before we can deploy an application and make it available to its users, we must complete two steps:

First, we provision the infrastructure, which is the set of hardware & software components that support the application’s development, testing, and deployment. Provisioning means setting up servers, network equipment, and other infrastructure.

Second, we configure the infrastructure, which involves customizing the provisioned resources. Examples of tasks in this step include:

  • Installing software packages on a server
  • Updating a server to a specific Linux distribution release
  • Setting up logging
  • Creating database configuration files

Configuring a handful of servers can be done manually or with a script. But what if we have a complex infrastructure with hundreds or even thousands of servers?

In such a scenario, manual configuration is time-consuming and, more often than not, leads to errors that are costly and difficult to troubleshoot. Over time, human error also causes configurations to diverge across servers, a problem known as “configuration drift”.

To avoid such issues, we must adopt automation, in line with the DevOps practice of “Automate Everything”, including the infrastructure that applications run on. This is where Puppet comes into the picture.

What is Puppet?

Puppet is an open-source configuration management tool used to automate the management of servers at scale. Some examples of daily tasks that Puppet can automate include:

  • Installing software
  • Applying security patches
  • Modifying database settings

We write Puppet's own language in configuration files known as manifests, where we declare how we want our infrastructure configured. For example, the following Puppet code ensures that Nginx is installed on our server(s):

# Declare that the nginx package must be present on the node
package { 'nginx':
  ensure => installed,
}

When we define and manage infrastructure through configuration files, we rely on a core principle of DevOps: Infrastructure as Code.

Infrastructure as Code

The core idea behind infrastructure as code (henceforth referred to as “IaC”) is that we manage infrastructure configuration through code (configuration files) rather than through manual processes.

These configuration files are then stored and tracked in a version control system such as Git (often hosted on a platform like GitHub). This way, the entire history of the infrastructure is captured in the commit log, which becomes a powerful tool for debugging: whenever a problem pops up, we can check the commit log to find out what changed in our infrastructure. Sometimes, we can resolve the problem simply by rolling back to a previous version until a fix is implemented.

As we can see, IaC brings several benefits:

  • Improved productivity: System administrators and operators no longer have to carry out manual configuration.
  • Improved reliability: Because infrastructure configuration is defined in configuration files and applied automatically, there is less chance of human error.

Now that we understand what Puppet is and what problems it solves, let’s look at how Puppet works.


How does Puppet work?

Puppet uses an agent/server model to configure the systems. The agent is referred to as the Puppet Agent & the server is referred to as the Puppet Server.

Puppet Agent needs to be installed on each system we want to manage/configure with Puppet. Each agent is responsible for:

  • Connecting securely to the Puppet Server, sending facts about its system, and retrieving its Catalog File, which describes the desired state of that system.
  • Applying the Catalog File, changing only the resources that do not already match the desired state.
  • Sending a report of the run back to the Puppet Server.

The Puppet Server is responsible for:

  • Compiling a Catalog File for each host from that host's facts, the manifests, and other configuration data. A catalog is not a program or a list of steps: it is a compiled document listing every resource the host should manage, its desired state, and the dependencies between resources. Because it is compiled from the host's facts, each catalog is tailored to the target platform.
  • Sending the Catalog File to agents when they query the server.
  • Storing information about the entire environment, such as host information and metadata like authentication keys.
  • Gathering reports from each agent and preparing the overall report.
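On the server side, the manifests decide which resources end up in each host's catalog. Below is a minimal sketch of a main manifest (conventionally named site.pp), assuming hypothetical host names web01.example.com and web02.example.com and a hypothetical webserver class. In practice, classes usually live in modules rather than in site.pp itself.

```puppet
# site.pp: hypothetical main manifest; host and class names are illustrative.

# A class groups related resources so they can be applied as a unit.
class webserver {
  package { 'nginx':
    ensure => installed,
  }

  service { 'nginx':
    ensure  => running,
    enable  => true,
    require => Package['nginx'], # install the package before managing the service
  }
}

# Hosts whose certificate name matches one of these get the webserver class.
node 'web01.example.com', 'web02.example.com' {
  include webserver
}

# Any other host matches the default node and gets no extra resources here.
node default {
}
```

When web01.example.com checks in, the server compiles a catalog containing the nginx package and service; a host that only matches the default node receives a catalog without them.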

In this agent/server model, the agent connects to the server and sends facts describing its machine (for example, its operating system, hostname, and IP address). The server uses these facts to compile a catalog of the resources that should be applied to that machine and sends it back to the agent. The agent is then in charge of making whatever changes are needed to bring the machine into the desired state.
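To make fact-driven compilation more concrete, here is a minimal sketch of a manifest that adapts to the target platform. It assumes the agent reports the structured os fact (as current Facter versions do); the Apache package names are used only for illustration because they differ between distribution families.

```puppet
# Pick the package name from the 'os' fact the agent sent to the server.
# The selector is evaluated on the server while the catalog is compiled.
$apache_package = $facts['os']['family'] ? {
  'RedHat' => 'httpd',
  'Debian' => 'apache2',
  default  => 'apache2', # assumption: fall back to the Debian-style name
}

package { $apache_package:
  ensure => installed,
}
```

A Debian agent therefore receives a catalog that manages apache2, while a Red Hat agent receives one that manages httpd, even though both were compiled from the same manifest.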

Puppet Design Philosophy

Puppet has three important characteristics:

1. Declarative

Puppet takes a declarative approach: configuration files describe the desired state of the infrastructure, and Puppet then configures the infrastructure to match that state.

The Puppet Domain Specific Language (DSL) is a declarative language. In a declarative language, we declare the state we want to achieve rather than the steps to get there. With the Puppet DSL, we describe the desired state of our systems, and Puppet takes responsibility for making sure that each system conforms to this desired state.
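The following minimal sketch extends the earlier Nginx example to show what declaring state looks like. The configuration file path and its contents are hypothetical. Notice that nothing here says how to install a package or restart a service: we only state what should be true, and the require and notify attributes declare relationships between resources.

```puppet
# Desired state: the package is installed...
package { 'nginx':
  ensure => installed,
}

# ...this (hypothetical) config file exists with exactly this content...
file { '/etc/nginx/conf.d/example.conf':
  ensure  => file,
  content => "server { listen 8080; }\n",
  require => Package['nginx'],  # manage the file only after the package
  notify  => Service['nginx'],  # restart nginx whenever the file changes
}

# ...and the service is running and starts at boot.
service { 'nginx':
  ensure => running,
  enable => true,
}
```

Puppet works out the order of operations from these relationships, and on later runs it leaves every resource alone unless it has drifted from the declared state.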

2. Idempotent

Puppet as a language is designed to be inherently idempotent. An idempotent action can be performed over and over again with the same result: once the first run has succeeded, reapplying the same action a second or third time won't change the system, so there are no unintended side effects.

Furthermore, if a script is idempotent, it can fail halfway through its task and be run again without problematic consequences. For example, if, for some reason, Puppet fails halfway through a configuration run, re-invoking Puppet will complete the run and repair any configurations that were left in an inconsistent state by the previous run.

Most Puppet resources provide idempotent actions, so we can rest assured that two runs of the same set of rules will lead to the same end result.
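The word “most” matters here. The exec resource type runs an arbitrary command, so Puppet cannot tell on its own whether that command has already done its job. A common way to keep it idempotent is a guard such as creates, which skips the command when the named file already exists (unless and onlyif are alternatives that run a check command first). This is a minimal sketch; the command, URL, and paths are hypothetical placeholders.

```puppet
# Without a guard, this command would run on every Puppet run.
# 'creates' tells Puppet to skip it once the target file exists,
# so repeated runs leave the system unchanged.
exec { 'download-app-archive':
  command => '/usr/bin/curl -fsSL -o /opt/app.tar.gz https://example.com/app.tar.gz',
  creates => '/opt/app.tar.gz',
}
```

On the first run the archive is downloaded; on every later run Puppet sees that /opt/app.tar.gz is present and does nothing.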

3. Stateless

Puppet’s agent/server API is stateless. This means that no state is kept between runs of the agent.

Each Puppet run is independent of the previous one and the next one. Each time the Puppet agent runs, it collects the current facts. The Puppet Server then compiles the catalog based only on those facts, and the agent applies it as necessary.

This stateless model has several advantages:

  • There is no need to synchronize data or resolve conflicts between servers. This allows Puppet Server to scale horizontally: we can add more Puppet Server instances to handle a growing number of agents.
  • Catalogs can be compared and cached locally on the agents, so agents and servers don’t need to exchange data about the current state all the time, reducing network traffic and server load.

Open source Puppet vs Puppet Enterprise

Puppet comes in two flavors: open-source Puppet & Puppet Enterprise.

Open-source Puppet is great for individuals managing a small set of servers. Puppet Enterprise is the commercial version. It builds on the core open-source projects, adding a whole set of powerful capabilities to manage complex workflows & automate enterprise-scale infrastructure.

Final Thoughts

Puppet is a cross-platform tool that has been around for a long time. It is mature and, more importantly, one of the industry's most popular infrastructure automation tools today. This makes it a must-have tool in a DevOps engineer’s toolkit.
