
How Dockerfile Works


At the heart of Docker's containerization process is the Dockerfile, a file that helps automate the creation of Docker images. In this blog post, we’ll take a detailed look at what a Dockerfile is and how it works. Let's get started!

What is a Dockerfile?

A Dockerfile is a text file that contains instructions on how to build a Docker image. Each instruction is composed of a command followed by one or more arguments. By convention, commands are written in uppercase to distinguish them from arguments and make the Dockerfile more readable.

Here is an example Dockerfile for a Node.js application:

FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]

Here are the sequential tasks that are executed when building a Docker image from this Dockerfile:

Docker starts by looking for the base image specified in the FROM instruction (node:20.11.1) in the local cache. If it's not found locally, Docker fetches it from Docker Hub.
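Next, Docker sets /app as the working directory for the instructions that follow, as specified by the WORKDIR instruction, and creates that directory in the image's filesystem if it does not already exist.
The COPY instruction copies package.json into the /app directory. Copying it on its own, before the rest of the code, lets the dependency installation in the next step be cached independently of changes to the source files.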
Docker then executes the RUN npm install command to install the dependencies defined in package.json.
After the installation of dependencies, Docker copies the remaining project files into the /app directory with another COPY instruction.
Finally, the CMD instruction sets the default command to run inside the container (node server.js), which starts the application.

Want to learn more about building a Docker image using a Dockerfile? Check out this blog post: How to Build a Docker Image With Dockerfile From Scratch.

Common Dockerfile Instructions

Below, we discuss some of the most important commands commonly used in a Dockerfile:

FROM: Specifies the base image for subsequent instructions. A Dockerfile must begin with a FROM instruction; only comments, parser directives, and ARG instructions may come before it.
ADD / COPY: Both commands enable the transfer of files from the host to the container’s filesystem. The ADD instruction is particularly useful when adding files from remote URLs or for the automatic extraction of compressed files from the local filesystem directly into the container's filesystem. Note that Docker recommends using COPY over ADD, especially when transferring local files.
WORKDIR: Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile. If the specified directory does not exist, it’s created automatically.
RUN: Executes commands while the image is being built. It can be used to install necessary packages, update existing packages, and create users and groups, among other system configuration tasks within the image.
CMD / ENTRYPOINT: Both define what runs by default when a container is started from the image. The main distinction is how they react to arguments passed to docker run: those arguments replace the CMD entirely, whereas they are appended to an (exec-form) ENTRYPOINT, which can only be replaced with the explicit --entrypoint flag. When both are present, CMD supplies the default arguments for ENTRYPOINT.
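
As a rough illustration of how ENTRYPOINT and CMD interact, the shell sketch below writes a tiny Dockerfile. The file name Dockerfile.entrypoint, the alpine:3.19 base image, and the entrypoint-demo tag are assumptions chosen for this example and are not part of the original setup.

```shell
# Write a small demo Dockerfile (file name and contents are illustrative only).
cat > Dockerfile.entrypoint <<'EOF'
FROM alpine:3.19
# ENTRYPOINT fixes the executable; CMD supplies its default arguments.
ENTRYPOINT ["echo", "Hello"]
CMD ["world"]
EOF

# On a machine with Docker installed, the behaviour can be explored with:
#   docker build -t entrypoint-demo -f Dockerfile.entrypoint .
#   docker run --rm entrypoint-demo                     # CMD used as default arguments
#   docker run --rm entrypoint-demo Docker              # "Docker" replaces the CMD
#   docker run --rm --entrypoint date entrypoint-demo   # ENTRYPOINT itself replaced
```

With this image, the first docker run would execute echo Hello world, and the second would execute echo Hello Docker, because the extra argument replaces only the CMD part.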

For a comprehensive guide to all available Dockerfile instructions, refer to the official Docker documentation at Dockerfile reference.

Relationship Between Dockerfile Instructions and Docker Image Layers

Each instruction in a Dockerfile creates a new layer in the Docker image. These layers are stacked on top of each other, and each layer represents the change made from the layer below it. The most important point to note here is that Docker caches these layers to speed up subsequent builds (more on this in the next section).

As a general rule, any Dockerfile instruction that modifies the filesystem (such as RUN, COPY, and ADD) creates a new layer, while FROM brings in the layers of the base image. Instructions that only describe how to build the image or run the container (such as ENV, CMD, and ENTRYPOINT) add zero-byte metadata layers to the created image.

To view the commands that create the image layers and the sizes they contribute to the Docker image, you can run the following command:

docker history <IMAGE_NAME>

You can also run the following command to list the image's filesystem layers (the number of entries in the list is the number of layers):

docker inspect --format '{{json .RootFS.Layers}}' <IMAGE_NAME>

In this command, we use a Go template to extract the layers’ information.
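
If only the count is needed, Go templates also offer a built-in len function. Here is a minimal sketch; count_layers is a hypothetical helper name used just for this example, and it assumes Docker is installed and the image is available locally.

```shell
# count_layers IMAGE
# Prints how many filesystem layers IMAGE has (requires Docker and a local copy of IMAGE).
count_layers() {
  docker inspect --format '{{len .RootFS.Layers}}' "$1"
}
```

For example, count_layers node:20.11.1 would print a single number instead of the full list of layer digests.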

For a deep dive into Docker image layers, check out our blog post:

What Are Docker Image Layers and How Do They Work?

Dockerfile and Build Cache

When you build a Docker image using the Dockerfile, Docker checks each instruction (layer) against its build cache. If a layer has not changed (meaning the instruction and its context are identical to a previous build), Docker uses the cached layer instead of executing the instruction again.

Let’s see this in action by building a sample Node app with the Dockerfile shown earlier.

The first build took 1244.2 seconds.

Building the image again (without making any changes to the application code or Dockerfile) reduced the build time drastically, to just 6.9 seconds.

The significant decrease in build time for the second build demonstrates Docker's effective use of the build cache. Since there were no alterations in the Dockerfile instructions or the application code, Docker used the cached layers from the first build.

One more important point to note is that caching has a cascading effect. Once an instruction is modified, all subsequent instructions, even if unchanged, will be executed afresh because Docker can no longer guarantee their outcomes are the same as before.

This characteristic of Docker's caching mechanism has significant implications for the organization of instructions within a Dockerfile. In the upcoming section on Dockerfile best practices, we'll learn how to strategically order Dockerfile instructions to optimize build times.

Best Practices for Writing Dockerfiles

Below, we discuss three recommended best practices you should follow when writing Dockerfiles:

#1 Use a .dockerignore file

When writing Dockerfiles, ensure that only the files and folders required for your application are copied to the container’s filesystem. To help with this, create a .dockerignore file in the root of the build context (usually the same directory as your Dockerfile). In this file, list all the files and directories that are unnecessary for building and running your application, similar to how you would use a .gitignore file to exclude files from a git repository.

Keeping irrelevant files out of the Docker build context helps to keep the image size small. Smaller images bring significant advantages: they require less time and bandwidth to download and occupy less storage space on disk.
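
As a starting point, a .dockerignore for a Node.js project might look like the sketch below. The entries are assumptions about a typical repository layout, not a list taken from the sample app, so adjust them to your own project.

```shell
# Create a .dockerignore in the root of the build context.
# The entries are typical for a Node.js project (illustrative, not exhaustive).
cat > .dockerignore <<'EOF'
# Dependencies are installed inside the image by "RUN npm install"
node_modules
npm-debug.log
# Version control data and local secrets do not belong in the image
.git
.env
EOF

# Show what will be excluded from the build context.
cat .dockerignore
```

Excluding node_modules is especially useful with an instruction like COPY . /app, because it stops the host's copy of the dependencies from being sent to the builder and overwriting the ones installed during the build.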

#2 Keep the number of image layers relatively small

Another best practice to follow while writing Dockerfiles is to keep the number of image layers as low as is practical, since every additional layer has to be built, stored, and pulled. But how can we effectively reduce the number of image layers?

A simple method is to consolidate multiple RUN commands into a single command.

Let’s say we have a Dockerfile that contains three separate commands like these:

RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get clean

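This will result in three separate layers. However, by merging these commands into one, as shown below, we can reduce the number of layers from three to one. Merging has a second benefit: files removed by a later instruction still occupy space in the earlier layer that added them, so a cleanup step only reduces the image size when it runs in the same RUN instruction.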

RUN apt-get update && \
    apt-get install -y nginx && \
    apt-get clean

In this version, we use the && operator along with the \ for line continuation. The && operator executes commands sequentially, ensuring that each command is run only if the previous one succeeds. This approach is critical for maintaining the build's integrity by stopping the build if any command fails, thus preventing the creation of a defective image. The \ aids in breaking up long commands into more readable segments.
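
The short-circuit behaviour of && can be tried in any POSIX shell without Docker. In this small sketch, false stands in for a failing build step and the result variable is just an illustrative name.

```shell
# "false" always fails, so the command after && is skipped
# and the whole chain reports failure.
if false && echo "install step"; then
  result="chain succeeded"
else
  result="chain stopped at the first failure"
fi

echo "$result"
```

The echo "install step" command never runs, which mirrors how a failed apt-get update in a merged RUN instruction prevents the install step from running and makes the build fail.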

#3 Order Dockerfile instructions to leverage caching as much as possible

We know that Docker uses the build cache to avoid rebuilding image layers that it has already built and whose inputs have not changed. Because of this caching strategy, the order of the instructions in your Dockerfile has a large effect on how long your builds take on average.

The best practice is to place instructions that are least likely to change towards the beginning and those that change more frequently towards the end of the Dockerfile.

This strategy is grounded in how Docker rebuilds images: Docker checks each instruction in sequence against its cache. If it encounters a change in an instruction, it cannot use the cache for this and all subsequent instructions. Instead, Docker rebuilds each layer from the point of change onwards.

Consider the Dockerfile below:

FROM node:20.11.1
WORKDIR /app
COPY . /app
RUN npm install
CMD ["node", "server.js"]

It works fine, but there is an issue. On line 3, we copy the entire directory (including the application code) into the container. Following this, on line 4, we install the dependencies. This setup has a significant drawback: any modifications to the application code lead to the invalidation of the cache starting from this point. As a result, dependencies are reinstalled with each build. This process is not only time-consuming but also unnecessary, considering that dependency updates occur less frequently than changes to the application code.

To better leverage Docker's cache, we can adjust our approach by initially copying only the package.json file to install dependencies, followed by copying the rest of the application code:

FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]

This modification means that changes to the application code now only affect the cache from line 5 onwards. The installation of dependencies, happening before this, benefits from cache retention (unless there are changes to package.json), thus optimizing the build time.
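
The same idea extends to projects that commit a lockfile. The sketch below is one possible variant, assuming a package-lock.json sits next to package.json; the file name Dockerfile.lockfile is made up for this example.

```shell
# Write a variant of the optimized Dockerfile (file name is illustrative).
# Assumption: the project has a package-lock.json next to package.json.
cat > Dockerfile.lockfile <<'EOF'
FROM node:20.11.1
WORKDIR /app
# Copy only the dependency manifests so this layer changes only when they do.
COPY package.json package-lock.json ./
# npm ci installs exactly the versions recorded in the lockfile.
RUN npm ci
# Application code comes last because it changes most often.
COPY . .
CMD ["node", "server.js"]
EOF
```

Including the lockfile in the early COPY means that a dependency change recorded only in package-lock.json also invalidates the cached install layer, while edits to the application code still leave that layer untouched.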

Conclusion

In this blog post, we began by defining what a Dockerfile is, followed by a discussion of the most frequently used commands within a Dockerfile. We then explored the relationship between Dockerfile instructions and Docker image layers, as well as the concept of the build cache and how Docker employs it to improve build times. Lastly, we outlined three recommended best practices for writing Dockerfiles. With these insights, you now have the knowledge required to write efficient Dockerfiles.


Hemanta Sundaray

Mar 27, 2024 • 7 min read