
How to Extract Bash Substring

Extracting substrings from a larger string is a fundamental operation in text manipulation, which is an essential skill in Bash scripting.

In this blog post, we'll explore three methods that you can use to extract substrings in Bash. We will first look at using Bash's built-in parameter expansion for this task. Next, we'll leverage the cut command. Finally, we'll use the substr function provided by awk, a versatile text processing utility. Let’s dive in!

Prerequisite

To try out the scripts in this blog post, you need access to a Bash shell. You also need a text editor, such as "nano" or "vim", both of which come pre-installed on many Unix-like operating systems.

For the purpose of this blog post, I'll be using KodeKloud’s Ubuntu playground, which lets you access a pre-installed Ubuntu operating system in just one click. Best of all, you won't need to go through the hassle of installing any additional software. Everything you need is already set up and ready to use.

Create a Script File

Let’s start by creating a Bash script file named demo.sh. This is where we'll place and run the scripts we're about to write in the upcoming sections. To create it, run the following command:

touch /usr/local/bin/demo.sh

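Note: While you're free to create the demo.sh file in any directory of your choice, we're placing it in the /usr/local/bin directory for a specific reason. In most Linux distributions, this directory is included in the PATH environment variable. When bash is given a script name that doesn't exist in the current directory, it searches the PATH directories for it, so bash demo.sh works from any working directory. And because we pass the script to bash explicitly, the file doesn't need to be executable. Writing to /usr/local/bin usually requires root privileges, so prefix the command with sudo if you get a permission error.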
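To illustrate the lookup behind this note, here is a minimal sketch that reproduces it in temporary directories instead of /usr/local/bin. The directory variables and the script contents are assumptions made for the illustration. The sketch relies on bash searching the PATH directories for a script name that it can't find in the current directory.

```shell
#!/bin/bash

# Hypothetical scratch directories, so nothing under /usr/local/bin is touched
scriptDir=$(mktemp -d)
workDir=$(mktemp -d)

# Create a script that is NOT marked executable
echo 'echo "hello from demo"' > "${scriptDir}/demo.sh"

# Run it from a different, empty directory with scriptDir added to PATH
cd "${workDir}"
output=$(PATH="${scriptDir}:${PATH}" bash demo.sh)
echo "${output}"

# Clean up the scratch directories
cd "${OLDPWD}"
rm -rf "${scriptDir}" "${workDir}"
```

Typing demo.sh on its own, without bash in front, would still require the executable bit set with chmod +x.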

With the script file created, let’s move on to the next section.

Extract Substring in Bash Using Parameter Expansion

Parameter expansion is a powerful feature in Bash scripting that you can use to extract substrings from a larger string. Let's dive deeper into this.

Firstly, what’s parameter expansion? A parameter is the name of the variable whose value you want to substitute. The process of manipulating these parameters is called parameter expansion. This means you can retrieve and transform the value of a parameter, such as a string, using specific expressions.

One of these expressions is the ${variable:start:length} form, where variable is the string you want to extract from, start is the position where the substring begins, and length is how many characters the substring should contain. Note that string positions in Bash start at 0, not 1.

Here is a simple Bash script that demonstrates the process of extracting substrings from a string.

#!/bin/bash

# Declare a variable
name="John Doe"

# Extract the substring "John" 
firstName=${name:0:4}

# Print the substring "John"
echo "First name is: ${firstName}"

In this script, we first declare a variable called name and assign it the string John Doe. Next, we use ${name:0:4} to extract the substring John. The 0 means start at the beginning of the string, and the 4 means the length of the substring is 4 characters. After that, we print out the substring John to the console.
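The same form has a couple of variations worth knowing. The sketch below reuses the name variable from the example; start and count are hypothetical variable names introduced only for illustration.

```shell
#!/bin/bash

name="John Doe"

# Omit the length to extract everything from the start position onward
lastName=${name:5}

# ${#name} expands to the number of characters in the string
length=${#name}

# start and length are evaluated as arithmetic, so variable names work too
start=0
count=4
firstName=${name:start:count}

echo "Last name is: ${lastName}"
echo "Length is: ${length}"
echo "First name is: ${firstName}"
```

Because positions start at 0, the D in Doe sits at position 5, which is why ${name:5} yields Doe.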

Now that we have our script ready, let's see it in action. First, we need to add it to the demo.sh script file. Run the following command to open demo.sh using the "nano" text editor:

nano /usr/local/bin/demo.sh

This will open the empty demo.sh file in the "nano" editor. Now, add the above script to the editor. You can do this by simply copying and pasting the script into the "nano" editor.

Once you have pasted the script, you will need to save the changes. To do this, press ctrl + o. This will prompt you to confirm the file name to which the changes should be written. Just press "enter" to confirm. To exit the "nano" editor, press ctrl + x.

Now, run the script using the following command:

bash demo.sh

Upon running this script, you will see the following output on the terminal:

First name is: John

Extracting a substring by specifying a negative start value

You can also extract substrings from the end of a string by specifying a negative start value. Let's understand this concept with an example.

Consider the following Bash script:

#!/bin/bash

# Declare a variable
name="John Doe"

# Extract the substring "Doe" by specifying a negative start value
lastName=${name: -3}

# Print the substring "Doe"
echo "Last name is: ${lastName}"

In this script, we declare a variable called name and assign it the string John Doe. This time, however, we use ${name: -3} to extract the substring Doe.

Take note of the space before -3. This space is crucial because it keeps Bash from reading :- as the operator that supplies a default value when name is unset or null. With the space in place, -3 is treated as a start position counted from the end of the string. Therefore, ${name: -3} means "start at the 3rd character from the end of the string and extract until the end". Following this, we print out the substring Doe to the console.

Note: Be careful when writing this kind of expression. If you omit the space and write ${name:-3}, Bash will interpret :- as the "use default value" form of parameter expansion. Here's how this expression behaves:

If the variable name is set and is not null: Bash will use the value of name which, in our example, is John Doe. So the expression ${name:-3} will result in John Doe.
If the variable name is unset or null: The word placed after the :- operator, which is 3, will be used as the default value. Thus, the expression ${name:-3} will result in 3.

So, always remember to include the space when specifying a negative start value for substring extraction in Bash. Otherwise, you may find yourself working with the default value syntax, which can lead to unexpected results.
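The difference is easy to check side by side. This is a small sketch rather than part of the demo script; unsetVar is a hypothetical variable name used only to show the default-value case, and the parenthesized form is an alternative way to keep the minus sign away from the colon.

```shell
#!/bin/bash

name="John Doe"

# With a space: substring expansion, the last three characters
withSpace=${name: -3}

# Parentheses also separate the minus sign from the colon
withParens=${name:(-3)}

# Without a space: default-value expansion; name is set, so its value is used
withoutSpace=${name:-3}

# The same expansion on an unset variable falls back to the default word
unset unsetVar
defaultUsed=${unsetVar:-3}

echo "With space:    ${withSpace}"
echo "With parens:   ${withParens}"
echo "Without space: ${withoutSpace}"
echo "Default used:  ${defaultUsed}"
```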

Now, let's run the script. But first, we need to add it to the Bash script file. To do this, run the following command in your terminal to open the demo.sh file using the "nano" text editor:

nano /usr/local/bin/demo.sh

This will open the demo.sh file in the "nano" editor. Now, replace the existing script with the new one. You can do this by simply copying and pasting the script into the "nano" editor.

Now, you have updated your Bash script file and are ready to run the script. Back in your terminal, run the script using the following command:

bash demo.sh

After running this script, you will see the following output on the terminal:

Last name is: Doe

Extract Substring in Bash Using the Cut Command

The cut command is another helpful tool in Bash for extracting substrings from a string. It uses a delimiter to separate the string into fields and then extracts the desired fields. The syntax is as follows:

cut -d '<delimiter>' -f <field_number>

In this syntax:

-d '<delimiter>': The -d flag sets the character that cut uses to divide the string into fields. <delimiter> is the single character on which you want to split your string, such as a comma, a space, or a colon. You define it by putting the character in single quotes and passing it to the -d flag.
-f <field_number>: The -f flag tells cut which substring(s) you want to extract from your string, based on how it's divided by the delimiter. <field_number> is the index of the substring you want to extract.

Indexing starts from 1. For example, -f 1 extracts the first substring, -f 2 the second substring, and so on. You can specify a range of substrings using a hyphen -, like -f 1-3 to extract the first three substrings. You can also specify multiple substrings separated by commas, like -f 1,3 to extract the first and third substrings.
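The field selections described above can be sketched as follows. The comma-separated record is made-up sample data, and the variable names are chosen only for this illustration.

```shell
#!/bin/bash

# Hypothetical comma-separated record used only for illustration
record="John,Doe,42,London"

# A single field
first=$(echo "${record}" | cut -d ',' -f 1)

# A range of fields: the first three
firstThree=$(echo "${record}" | cut -d ',' -f 1-3)

# A list of fields: the first and the third
firstAndThird=$(echo "${record}" | cut -d ',' -f 1,3)

echo "First field: ${first}"
echo "Fields 1-3: ${firstThree}"
echo "Fields 1 and 3: ${firstAndThird}"
```

When more than one field is selected, cut joins them in its output with the same delimiter it split on.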

Consider the following example:

#!/bin/bash

# Declare a variable
name="John Doe"

# Extract the substring "John"
firstName=$(echo "${name}" | cut -d ' ' -f 1)

# Print the substring "John"
echo "First name is: $firstName"

# Extract the substring "Doe"
lastName=$(echo "${name}" | cut -d ' ' -f 2)

# Print the substring "Doe"
echo "Last name is: $lastName"

In this script, we first declare a variable called name with the value John Doe. We then use the cut command to extract substrings from the string stored in name, using a space ' ' as the delimiter.

The use of echo here is important: we're using it to pass the value of the name variable as input to the cut command. The pipe operator | directs the output of echo to cut. This way, cut receives the string John Doe and applies the specified delimiter and field options to it.

The -f 1 option in cut -d ' ' -f 1 tells cut to extract the first field, i.e., John. We store this in firstName and then print it.

Next, we use -f 2 in cut -d ' ' -f 2 to tell cut to extract the second field, i.e., Doe. We store this in lastName and print it.
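As an alternative to piping from echo, Bash can feed the string to cut directly with a here-string (<<<). This is a minimal sketch of the same extraction, not a replacement for the script above.

```shell
#!/bin/bash

name="John Doe"

# The here-string passes the value of name to cut on standard input
firstName=$(cut -d ' ' -f 1 <<< "${name}")
lastName=$(cut -d ' ' -f 2 <<< "${name}")

echo "First name is: ${firstName}"
echo "Last name is: ${lastName}"
```

Here-strings are a Bash feature, so this form is not portable to every POSIX shell.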

Now, let's run the script. Just like before, we need to add it to the Bash script file. To do this, run the following command in your terminal to open the demo.sh file using the "nano" text editor:

nano /usr/local/bin/demo.sh

This will open the demo.sh file in the nano editor. Now, replace the existing script with the new one that extracts substrings using the cut command. You can do this by simply copying and pasting the script into the "nano" editor.

Now, you have updated your Bash script file and are ready to run the script. Back in your terminal, run the script using the following command:

bash demo.sh

After running this script, you will see the following output:

First name is: John
Last name is: Doe

Extract Substring in Bash Using awk

awk is a versatile scripting language primarily used for data manipulation. It has a built-in function substr() that you can use to extract a substring starting at a specific character position and with a certain length.

The substr() function takes three arguments: the string, the start position, and the length of the substring. Unlike Bash's parameter expansion, awk counts character positions from 1, not 0. Here's an example:

#!/bin/bash

# Declare a variable
intro="Hello, my name is John Doe."

# Extract the substring
substring=$(echo "${intro}" | awk '{print substr($0, 19, 8)}')

# Print the substring "John Doe"
echo "My name is: $substring"

In this script, we start by declaring a variable named intro with the value Hello, my name is John Doe.

We then use the awk command to extract a specific substring from the string stored in intro. Just like with cut, we're using echo to pass the value of the intro variable to awk, with the pipe operator | directing the output of echo to awk.

The part {print substr($0, 19, 8)} is where the real work happens. It instructs awk to print a substring that begins at the 19th character of the input and is 8 characters long. $0 within the substr function represents the entire input line, or in our case, the entire intro string.

The output from awk is captured using the $(...) syntax and is stored in the variable substring, which we then print out to the terminal.
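Because awk counts positions from 1 while Bash's parameter expansion counts from 0, the same substring needs different start values in each. The sketch below compares the two on the intro string from the example; the variable names are chosen only for this illustration.

```shell
#!/bin/bash

intro="Hello, my name is John Doe."

# awk counts from 1, so the "J" is character 19
fromAwk=$(echo "${intro}" | awk '{print substr($0, 19, 8)}')

# Bash counts from 0, so the same "J" is at position 18
fromBash=${intro:18:8}

# Omitting the length returns everything from the start position to the end
toEnd=$(echo "${intro}" | awk '{print substr($0, 19)}')

echo "From awk:  ${fromAwk}"
echo "From Bash: ${fromBash}"
echo "To end:    ${toEnd}"
```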

Now, let's run the script. Just like before, we need to add it to the Bash script file. To do this, run the following command in your terminal to open the demo.sh file using the "nano" text editor:

nano /usr/local/bin/demo.sh

This will open the demo.sh file in the "nano" editor. Now, replace the existing script with the new one that extracts a substring using awk. You can do this by simply copying and pasting the script into the "nano" editor.

Now, you have updated your Bash script file and are ready to run the script. Back in your terminal, run the script using the following command:

bash demo.sh

Upon running this script, you will see the following output on the terminal:

My name is: John Doe

Conclusion

In conclusion, we learned three powerful methods for extracting substrings in Bash: using parameter expansion, using the cut command, and using the substr function provided by the awk utility.

Usage tips

When you want to extract substrings from relatively short strings, parameter expansion provides the most straightforward syntax of the three, and it runs inside the shell without starting another process. Its main drawback is that it works on character positions rather than delimiters, so you have to count the starting position manually, which becomes cumbersome for long strings. If your string is split by a single-character delimiter, the cut command is the simplest choice.

It’s worth noting that although awk's substr() function works on character positions rather than delimiters, awk itself is a powerful text-processing tool. It can split input into fields on a delimiter with the -F option, and it comes with a wide variety of functions and constructs that you can combine with substr() for more complex text manipulation requirements.
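As a rough sketch of awk handling delimiters on its own, the -F option sets the field separator, and fields can then be combined with substr(). The comma-separated record is made-up sample data used only for illustration.

```shell
#!/bin/bash

# Hypothetical comma-separated record used only for illustration
record="John,Doe,42"

# -F ',' splits each line on commas; $2 is the second field
lastName=$(echo "${record}" | awk -F ',' '{print $2}')

# substr() applied to a single field: the first character of the first field
initial=$(echo "${record}" | awk -F ',' '{print substr($1, 1, 1)}')

echo "Last name is: ${lastName}"
echo "Initial is: ${initial}"
```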

Looking to build a solid foundation in shell and Bash scripting or take your existing skills to the next level? Check out these courses from KodeKloud:

Shell Scripts for Beginners: In this course, you'll dive into the practical world of Linux shell scripting. Regardless of your programming experience, you'll master fundamental scripting concepts such as variables, loops, and control logic. Throughout the course, you'll get plenty of hands-on experience using our comprehensive labs. Not only that, you'll also receive immediate feedback on your scripts, which will help you improve and refine them.
Advanced Bash Scripting: In this course, you'll start with fundamentals like variables, functions, and parameter expansions and then dive deeper into streams, input/output redirection, and command-line utilities like "awk" and "sed". You'll master arrays for data manipulation and storage and learn best practices to create robust scripts.