Deleting Duplicate Lines in Bash

Author: JIYIK    Last Updated: 2025/03/23

Duplicate lines in the data a Bash script processes can cause problems such as incorrect or inconsistent results, and they make the script's output harder to work with. Removing them is often necessary, and Bash offers several ways to do it.


Remove Duplicate Lines in Bash Using sort and uniq

One way to remove duplicate entries in a Bash script is to use the sort and uniq commands. The sort command sorts the input data into a specified order, and the uniq command filters out repeated lines. Because uniq only compares adjacent lines, the input has to be sorted first so that identical lines end up next to each other.

The data.txt file contains the following content for this article's examples.

arg1
arg2
arg3
arg2
arg2
arg1

To remove duplicate entries from the above file, you can use the following command:

sort data.txt | uniq > data-unique.txt

Output (cat data-unique.txt):

arg1
arg2
arg3

This command sorts the data.txt file in ascending order (the default) and pipes the output to the uniq command. The uniq command filters out duplicate lines from the sorted data, and the result is written to a new file named data-unique.txt.

The original data.txt file is left unchanged; the new file contains each distinct line exactly once, in sorted order.
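The sketch below illustrates why the sort step matters: run on its own, uniq only collapses adjacent repeats. It recreates the article's data.txt in a scratch directory (the directory and the variable names are only for illustration) and also shows sort -u, which for plain input like this should give the same result as sort | uniq in a single command.

```shell
# Recreate the sample data.txt from this article in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' arg1 arg2 arg3 arg2 arg2 arg1 > data.txt

# uniq on unsorted input: only the adjacent "arg2 arg2" pair is collapsed,
# so arg1 and arg2 still appear twice.
uniq_only=$(uniq data.txt)
printf '%s\n' "$uniq_only"
# arg1
# arg2
# arg3
# arg2
# arg1

# sort -u sorts and removes duplicates in one step.
sort_u=$(sort -u data.txt)
printf '%s\n' "$sort_u"
# arg1
# arg2
# arg3
```

Keeping uniq as a separate command is still useful when its options, such as -c or -d, are needed.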

The uniq command has several options that can be used to control its behavior, such as the -d option to print only duplicate lines, or the -c option to print the number of times each line appears in the input. For example, to print the number of times each line appears in the data.txt file, you can use the following command:

sort data.txt | uniq -c

This command is similar to the previous one, but adds the -c option to the uniq command. This prints the number of times each line appears in the input along with the line itself.

For example, the results might look like this:

2 arg1
3 arg2
1 arg3

This output shows that arg1 occurs twice, arg2 occurs three times, and arg3 occurs once.
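As a small sketch of the other options, the commands below use -d to list only the lines that are repeated and -u to list only the lines that occur once. The last pipeline is an optional extra that orders the -c counts from most to least frequent; the scratch directory and variable names are assumptions made for the example.

```shell
# Recreate the sample data.txt from this article in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' arg1 arg2 arg3 arg2 arg2 arg1 > data.txt

# -d: print one copy of each line that appears more than once.
repeated=$(sort data.txt | uniq -d)
printf '%s\n' "$repeated"
# arg1
# arg2

# -u: print only the lines that appear exactly once.
singles=$(sort data.txt | uniq -u)
printf '%s\n' "$singles"
# arg3

# Counts ordered from most to least frequent. awk reprints the two columns
# so the result does not depend on how uniq pads the count.
by_count=$(sort data.txt | uniq -c | sort -rn | awk '{ print $1, $2 }')
printf '%s\n' "$by_count"
# 3 arg2
# 2 arg1
# 1 arg3
```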


Remove Duplicate Lines in Bash Using the awk Command

Another way to remove duplicate entries in a Bash script is to use the awk command, which is a powerful text processing tool that can perform a variety of operations on text files. The awk command has a built-in associative array data structure that can store and count the number of occurrences of each line in the input.

For example, to remove duplicate entries from the same file as before, you can use the following command:

awk '!a[$0]++' data.txt > data-unique.txt

Output:

arg1
arg2
arg3

This command uses awk to read the data.txt file and applies a simple condition to each input line. The condition is the expression !a[$0]++, where $0 is the whole current line and a is an associative array indexed by line content.

The post-increment operator ++ returns the current count for the line (0 the first time it is seen) and then increments it, so the array ends up holding the number of times each line has occurred.

The ! operator negates that returned count. The condition is therefore true only when the count was 0, that is, the first time a line appears, and awk's default action prints the line. Later occurrences have a nonzero count and are skipped. Unlike the sort and uniq approach, this keeps the lines in their original order and does not require sorted input. The output is then redirected to a new file named data-unique.txt containing the unique entries from the data.txt file.
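To make the one-liner easier to read, the sketch below spells the same logic out with an explicit if statement and compares it with the short form and with sort -u. The file fruits.txt and the array name seen are hypothetical and chosen only for this illustration; the input is deliberately unsorted so the difference in output order is visible.

```shell
# Hypothetical unsorted input, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' pear apple pear banana apple > fruits.txt

# Short form used in this article.
short=$(awk '!a[$0]++' fruits.txt)

# Long-hand equivalent of the same logic.
long=$(awk '{
    if (seen[$0] == 0) {   # count is still 0: first time this line appears
        print $0
    }
    seen[$0]++             # remember the line for later occurrences
}' fruits.txt)

printf '%s\n' "$long"
# pear
# apple
# banana

# sort -u removes the same duplicates but reorders the lines.
sorted=$(sort -u fruits.txt)
printf '%s\n' "$sorted"
# apple
# banana
# pear
```

The awk approach keeps one array entry per distinct line in memory, which is usually fine but worth keeping in mind for very large inputs with many unique lines.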

The awk command also provides several options and features that you can use to control its behavior and customize its output. For example, you can use the -F option to specify a different field separator or use the -v option to define variables in a script.

You can also use the printf statement to format the output of the awk command in various ways.
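As one possible use of these options, the sketch below removes duplicates based on a single field instead of the whole line. The file users.csv, its contents, and the variable col are made up for the example: -F sets the field separator to a comma, -v passes the column number into the awk program, and printf formats the lines that are kept.

```shell
# Hypothetical comma-separated input, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alice,admin' 'bob,dev' 'alice,ops' 'carol,dev' > users.csv

# Keep the first line seen for each value of field 1 (the name).
by_name=$(awk -F, '!seen[$1]++' users.csv)
printf '%s\n' "$by_name"
# alice,admin
# bob,dev
# carol,dev

# Same idea, but the key column is passed in with -v and the output is
# formatted with printf: one line per distinct value of field 2 (the role).
by_role=$(awk -F, -v col=2 '!seen[$col]++ { printf "%s (first seen for %s)\n", $col, $1 }' users.csv)
printf '%s\n' "$by_role"
# admin (first seen for alice)
# dev (first seen for bob)
# ops (first seen for alice)
```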

The sort and uniq commands are simple but effective tools for removing duplicate entries, while the awk command provides more advanced features and options for customizing the output and behavior of your script.
