In this blog, we will see how to create and manage hard links in Linux.
To understand hard links and soft links we first have to learn some very basic things about filesystems.
Let’s imagine a Linux computer is shared by two users: alex and jane. Alex logs in with his own username and password, and Jane logs in with hers. This lets them use the same computer but have different desktops, different program settings, and so on. Now Alex takes a picture of the family dog and saves it as /home/alex/Pictures/family_dog.jpg.
Let’s simulate a file like this. We run the command from inside /home/alex, where the Pictures directory already exists.
echo "Picture of Milo the dog" > Pictures/family_dog.jpg
With this, we created a file at Pictures/family_dog.jpg and stored the text “Picture of Milo the dog” inside.
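There’s a command on Linux, stat, that lets us see some interesting things about files and directories. Running stat Pictures/family_dog.jpg prints the file’s size, permissions, timestamps, and more.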
We’ll notice an Inode number. What is this?
Filesystems like xfs, ext4, and others, keep track of data with the help of inodes. Our picture might have blocks of data scattered all over the disk, but the inode remembers where all the pieces are stored. It also keeps track of metadata: things like permissions, when this data was last modified, last accessed, and so on. But it would be pretty inconvenient to tell your computer, “Hey, show me inode 52946177”. So we work with files instead, the one called family_dog.jpg in this case. The file points to the inode, and the inode points to all the blocks of data that we require.
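To make the filename-to-inode relationship concrete, here is a minimal sketch that reads the inode number behind a file in two ways. It assumes GNU coreutils (the stat -c format option), and it works in a throwaway directory created with mktemp instead of the real /home/alex. The inode number you see will differ from the ones used in this post.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
mkdir "$workdir/Pictures"
echo "Picture of Milo the dog" > "$workdir/Pictures/family_dog.jpg"

# Two ways to ask which inode a filename points to.
# stat -c '%i' prints only the inode number (GNU stat).
inode_from_stat=$(stat -c '%i' "$workdir/Pictures/family_dog.jpg")
# ls -i prints the inode number in front of the name.
inode_from_ls=$(ls -i "$workdir/Pictures/family_dog.jpg" | awk '{print $1}')

echo "stat reports inode:  $inode_from_stat"
echo "ls -i reports inode: $inode_from_ls"

# Clean up the throwaway directory.
rm -rf "$workdir"
```

Both commands should report the same number, because they are looking up the same filename.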
And we finally get to what interests us here.
In the output of our stat command, we notice a Links field with a value of 1.
There’s already one link to our Inode? Yes there is. When we create a file, something like this happens:
We tell Linux: “Hey, save this data under this filename: family_dog.jpg.”
Linux says: “Ok, I will group all this file’s data under inode 51221169. Data blocks and inode created. I will hardlink the file family_dog.jpg to inode 51221169.”
Now when we want to read the file:
“Hey Linux, give me the data for the family_dog.jpg file.”
“Ok, let me see what inode this links to. Here’s all the data you requested for inode 51221169.”
family_dog.jpg -> Inode 51221169
Easy to understand. But why would we need more than one hard link for this data?
Well, Jane has her own folder of pictures, at /home/jane/Pictures. How could Alex share this picture with Jane? The easy answer: just copy /home/alex/Pictures/family_dog.jpg to /home/jane/Pictures/family_dog.jpg. No problem, right? But now imagine we have to do this for 5000 pictures that take up 20GB in total. We would have to store those 20GB twice. Why use 40GB of disk space when we could use just 20GB? So how can we do that?
Instead of copying /home/alex/Pictures/family_dog.jpg to /home/jane/Pictures/family_dog.jpg, we could hardlink it to /home/jane/Pictures/family_dog.jpg.
The syntax of the command is:
ln path_to_target_file path_to_link_file
The target_file is the existing file you want to link to. The link_file is simply the name of the new hard link we create. Technically, the hard link created at the destination is a file like any other. The only special thing about it is that instead of pointing to a new inode, it points to the same inode as the target_file.
In our imaginary scenario, we would use a command like:
ln /home/alex/Pictures/family_dog.jpg /home/jane/Pictures/family_dog.jpg
Or, if we’re already inside the /home/alex directory (that’s our current/working directory) we can use a relative path to our target file:
ln Pictures/family_dog.jpg /home/jane/Pictures/family_dog.jpg
Now our picture is only stored once, but the same data can be accessed at different locations, through different filenames.
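We can check this for ourselves. The sketch below recreates the scenario in a temporary directory, so the alex and jane folders under $root are stand-ins for the real home directories, not the actual /home paths. It assumes GNU stat, where %i is the inode number and %h is the hard link count.

```shell
#!/bin/sh
# Stand-in home directories inside a throwaway location.
root=$(mktemp -d)
mkdir -p "$root/alex/Pictures" "$root/jane/Pictures"
echo "Picture of Milo the dog" > "$root/alex/Pictures/family_dog.jpg"

# A freshly created file has exactly one hard link: its own name.
links_before=$(stat -c '%h' "$root/alex/Pictures/family_dog.jpg")

# Create the second name for the same data.
ln "$root/alex/Pictures/family_dog.jpg" "$root/jane/Pictures/family_dog.jpg"

# Both names should now resolve to the same inode, with a link count of 2.
alex_inode=$(stat -c '%i' "$root/alex/Pictures/family_dog.jpg")
jane_inode=$(stat -c '%i' "$root/jane/Pictures/family_dog.jpg")
links_after=$(stat -c '%h' "$root/jane/Pictures/family_dog.jpg")
jane_content=$(cat "$root/jane/Pictures/family_dog.jpg")

echo "Links before ln: $links_before"
echo "Links after ln:  $links_after"
echo "Alex's inode: $alex_inode"
echo "Jane's inode: $jane_inode"
echo "Jane reads:   $jane_content"

rm -rf "$root"
```

If the two inode numbers match and the link count went from 1 to 2, the data exists once on disk and simply has two names.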
Another beautiful thing about hard links is this: Alex and Jane share the same 5000 pictures through hard links. But maybe Alex decides to delete his hard link, /home/alex/Pictures/family_dog.jpg. What will happen to Jane’s picture? Nothing, she’ll still have access to that data. Why? Because the inode still has 1 hard link to it (it had 2, now it has 1). But if Jane also decides to delete her hard link, /home/jane/Pictures/family_dog.jpg, the inode will have 0 links to it. When there are 0 links, and no running program still has the file open, the filesystem frees the inode and its data blocks so the space can be reused.
The beauty of this approach is that people who share hard links can freely delete what they want, without having a negative impact on other users that still need that data. But once everyone deletes their hard links to that data, the data itself will be removed. So data is “intelligently removed” only when EVERYONE involved decides they don’t need it anymore.
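Here is a small sketch of that behavior. The file names alex_dog.jpg and jane_dog.jpg are made up for this example and live in a temporary directory. It assumes GNU stat for reading the link count.

```shell
#!/bin/sh
root=$(mktemp -d)
echo "Picture of Milo the dog" > "$root/alex_dog.jpg"
ln "$root/alex_dog.jpg" "$root/jane_dog.jpg"

# Alex deletes his name for the data.
rm "$root/alex_dog.jpg"

# Jane's name still works, and the link count dropped from 2 to 1.
links_left=$(stat -c '%h' "$root/jane_dog.jpg")
jane_content=$(cat "$root/jane_dog.jpg")
echo "Links left after Alex deletes: $links_left"
echo "Jane still reads: $jane_content"

# Jane deletes hers too. No name points to the inode anymore,
# so the filesystem is free to reclaim the inode and its blocks.
rm "$root/jane_dog.jpg"

rm -rf "$root"
```

Note that rm never asked which link was the "original". From the filesystem's point of view, both names were equal.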
Limitations of hard links:
- You can only hardlink to files, not directories.
- You can only hardlink to files on the same filesystem. If you had an external drive mounted at /mnt/Backups, you would not be able to hardlink a file from your SSD, at /home/alex/file to some other file on /mnt/Backups since that’s a different filesystem.
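Both limitations can be explored from the shell. The sketch below first tries to hard link a directory, which ln refuses on Linux. Then it defines same_filesystem, a hypothetical helper written for this post (not a standard command) that compares the device numbers GNU stat reports with %d. Treat it as a rough pre-check only: it is an assumption that matching device numbers are enough, and setups such as bind mounts can still make ln fail.

```shell
#!/bin/sh
root=$(mktemp -d)
mkdir "$root/photos"

# Limitation 1: ln refuses to create a hard link to a directory.
if ln "$root/photos" "$root/photos_link" 2>/dev/null; then
    dir_link_result="allowed"
else
    dir_link_result="refused"
fi
echo "Hard link to a directory: $dir_link_result"

# Limitation 2: both names must live on the same filesystem.
# same_filesystem is a hypothetical helper: it compares the device
# numbers (%d) that stat reports for two existing paths.
same_filesystem() {
    [ "$(stat -c '%d' "$1")" = "$(stat -c '%d' "$2")" ]
}

if same_filesystem "$root" "$root/photos"; then
    echo "Same filesystem: a hard link between these locations can work"
else
    echo "Different filesystems: ln would fail here"
fi

rm -rf "$root"
```

On your own machine, you could call same_filesystem with the directory that holds your file and the directory where you want the link, for example your home directory and /mnt/Backups, before attempting ln.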
Things to take into consideration when you hardlink:
First, make sure that you have the proper permissions to create the link file at the destination. In our case, we need write permissions at: /home/jane/Pictures/.
Second, when you hardlink a file, make sure that all users involved have the required permissions to access that file. For Alex and Jane, this might mean adding both their usernames to the same group, for example, “family”. Then we’d use chgrp to assign the file to the “family” group and chmod to let that group read and write to it. You only need to change permissions on one of the hard links. That’s because you are actually changing permissions stored by the inode. So once you change permissions at /home/alex/Pictures/family_dog.jpg, /home/jane/Pictures/family_dog.jpg and all other hard links will show the same new set of permissions.
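The following sketch shows that permissions belong to the inode, not to any one name. It uses made-up file names in a temporary directory and assumes GNU stat, where %a prints the permissions in octal. The chgrp line is left as a comment because the “family” group is only an example and may not exist on your system.

```shell
#!/bin/sh
root=$(mktemp -d)
echo "Picture of Milo the dog" > "$root/alex_dog.jpg"

# Start from a known state: only the owner can read and write.
chmod 600 "$root/alex_dog.jpg"
ln "$root/alex_dog.jpg" "$root/jane_dog.jpg"

# On a real system we might first hand the file to a shared group:
#   chgrp family "$root/alex_dog.jpg"   # assumes a group named "family" exists

# Give the group read and write access, using Alex's name for the file.
chmod 660 "$root/alex_dog.jpg"

# Check the permissions through Jane's name for the same inode.
jane_mode=$(stat -c '%a' "$root/jane_dog.jpg")
echo "Permissions seen through Jane's link: $jane_mode"

rm -rf "$root"
```

We changed the mode through one name and read it back through the other, so both names report the same permissions.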