Deleting Duplicate Lines in Bash
Duplicate entries can cause a variety of problems in Bash scripts, such as incorrect or inconsistent results, and they can also make the script difficult to maintain. Removing duplicate entries from a script is often necessary to avoid these problems, and there are many ways to do this in Bash.
Remove Duplicate Lines in Bash Using sort and uniq
One way to remove duplicate entries in a Bash script is to use the sort and uniq commands. The sort command sorts the input data into a specified order, and the uniq command filters out duplicate lines from the sorted data.
The data.txt file contains the following content for this article's examples.
arg1
arg2
arg3
arg2
arg2
arg1
To remove duplicate entries from the above file, you can use the following command:
sort data.txt | uniq > data-unique.txt
The resulting data-unique.txt file contains:
arg1
arg2
arg3
This command sorts the data.txt file in ascending order (by default) and pipes the output to the uniq command. The uniq command filters out duplicate lines from the sorted data and writes the results to a new file named data-unique.txt.
This will remove all duplicate entries from the data.txt file and create a new file containing unique entries.
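As a side note, sort can also drop duplicates on its own with the -u flag, which makes the pipe to uniq unnecessary for this simple case. A minimal sketch, recreating the sample file from above:

```shell
# Recreate the sample data.txt file from the example above
printf 'arg1\narg2\narg3\narg2\narg2\narg1\n' > data.txt

# sort -u sorts the lines and removes duplicates in one step,
# equivalent to: sort data.txt | uniq
sort -u data.txt > data-unique.txt

cat data-unique.txt
```

This prints the same three unique lines (arg1, arg2, arg3) as the sort | uniq pipeline.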
The uniq command has several options that can be used to control its behavior, such as the -d option to print only duplicate lines, or the -c option to print the number of times each line appears in the input. For example, to print the number of times each line appears in the data.txt file, you can use the following command:
sort data.txt | uniq -c
This command is similar to the previous one but adds the -c option to the uniq command, which prints the number of times each line appears in the input along with the line itself.
For example, the results might look like this:
2 arg1
3 arg2
1 arg3
This output shows that arg1 occurs twice, arg2 three times, and arg3 once.
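The -d option mentioned earlier works the same way: it prints one copy of each line that appears more than once in the sorted input, and suppresses lines that appear only once. A short sketch using the same sample file:

```shell
# Recreate the sample data.txt file from the example above
printf 'arg1\narg2\narg3\narg2\narg2\narg1\n' > data.txt

# uniq -d prints only the lines that are duplicated;
# arg3 appears once, so it is omitted
sort data.txt | uniq -d
```

This prints arg1 and arg2 but not arg3.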
Remove Duplicate Lines in Bash using awk Command
Another way to remove duplicate entries in a Bash script is to use the awk command, a powerful text-processing tool that can perform a variety of operations on text files. The awk command has built-in associative arrays that can store and count the number of occurrences of each line in the input.
For example, to remove duplicate entries from the same file as before, you can use the following command:
awk '!a[$0]++' data.txt > data-unique.txt
Output:
arg1
arg2
arg3
This command uses awk to read the data.txt file and applies the condition !a[$0]++ to each input line. The expression a[$0]++ uses the entire line ($0) as a key into the associative array a and increments its count, effectively counting how many times each line has been seen so far. Because ++ is a post-increment, the condition is evaluated against the count before the increment, and the leading ! negates it: the first time a line appears its count is still 0, so the condition is true and awk prints the line; on every subsequent appearance the count is non-zero and the line is skipped. The output is then redirected to a new file named data-unique.txt containing the unique entries from the data.txt file. Unlike the sort and uniq approach, this method also preserves the original order of the lines.
The awk command also provides several options and features that you can use to control its behavior and customize its output. For example, you can use the -F option to specify a different field separator or use the -v option to define variables in a script.
You can also use the printf function to format the output of the awk command in various ways.
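As a brief sketch of those options, the example below combines -F, -v, and printf with the same deduplication idiom. The users.csv file, its field layout, and the label variable are hypothetical, invented for illustration only:

```shell
# Hypothetical comma-separated file; the first field is a user name
printf 'alice,10\nbob,20\nalice,30\n' > users.csv

# -F, sets the field separator to a comma,
# -v passes a shell-side value into the awk variable "label",
# and printf formats each first-seen value of field 1 ($1)
awk -F, -v label="user" '!seen[$1]++ { printf "%s: %s\n", label, $1 }' users.csv
```

Here deduplication is applied to a single field ($1) rather than the whole line ($0), so the second alice record is skipped even though the full lines differ.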
The sort and uniq commands are simple but effective tools for removing duplicate entries, while the awk command provides more advanced features and options for customizing the output and behavior of your script.