Removing duplicate lines in Linux with the uniq command
This article introduces the uniq command, another member of the family of Linux commands that are typically used in pipelines. Its main function is to remove duplicate lines.
Before looking at uniq itself, let's create a file, /tmp/uniq.txt, to use in the examples that follow. Its content is:
$ cat /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
recruise page site
error php function
repeat no data
onmpw web site
Okay, now let's try it out.
$ uniq /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
wello web site
recruise page site
error php function
repeat no data
onmpw web site
Oops, the result is not quite what we expected. The source file has two "error php function" lines and three "onmpw web site" lines, but in the output "error php function" still appears twice, and only one of the three "onmpw web site" lines has been removed.
The reason is that uniq only compares adjacent lines. Although there are three "onmpw web site" lines in /tmp/uniq.txt, one of them is not adjacent to the other two, so only one is removed. The two "error php function" lines are not adjacent either, so both are kept.
Given this behavior, uniq is generally used together with the sort command.
$ sort /tmp/uniq.txt | uniq
alpha css web
cat linux command
error php function
hello world
onmpw web site
recruise page site
repeat no data
wello web site
Now all the duplicates have been removed.
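The adjacency rule can be checked with a small script. This is a minimal sketch that uses its own made-up sample data in a temporary file (not /tmp/uniq.txt), and it sets LC_ALL=C only to make the sort order predictable. It also shows sort -u, which for plain whole-line comparison gives the same result as sort | uniq.

```shell
# Use the C locale so the sort order is predictable.
export LC_ALL=C

# Hypothetical sample data: the "a" and "b" lines are not all adjacent.
sample=$(mktemp)
printf '%s\n' 'b' 'a' 'a' 'b' 'a' > "$sample"

# uniq alone only collapses the one adjacent pair of "a" lines.
adjacent_only=$(uniq "$sample")

# Sorting first makes equal lines adjacent, so every duplicate is removed.
sorted_then_uniq=$(sort "$sample" | uniq)

# sort -u does the same in a single command.
sort_u=$(sort -u "$sample")

printf 'uniq alone:\n%s\n' "$adjacent_only"
printf 'sort | uniq:\n%s\n' "$sorted_then_uniq"
printf 'sort -u:\n%s\n' "$sort_u"

rm -f "$sample"
```

sort -u is shorter, but it cannot count or filter the way the uniq options described in this article can.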
Now that we have tried it out, let's briefly go through the options of the uniq command.
-c prefixes each output line with the number of times it occurs
$ sort /tmp/uniq.txt | uniq -c
1 alpha css web
1 cat linux command
2 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site
We can see that "error php function" appears twice, and "onmpw web site" appears three times. The rest have no duplicates, so the value is 1.
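A common use of -c is to rank lines by how often they occur, by sorting the counted output numerically in reverse. The following is a sketch with made-up input words; the awk step is only there to strip the padding that uniq -c puts in front of each count.

```shell
# Use the C locale so the sort order is predictable.
export LC_ALL=C

# Count each distinct line, then sort by count, highest first.
ranked=$(printf '%s\n' 'web' 'php' 'web' 'css' 'web' 'php' |
  sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# Pick the most frequent line; awk drops the leading padding of the count.
top=$(printf '%s\n' "$ranked" | awk 'NR == 1 {print $1, $2}')
echo "most frequent: $top"
```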
-i ignores case when comparing lines
First, add the line "Error PHP function" to /tmp/uniq.txt:
$ cat /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
Error PHP function
recruise page site
error php function
repeat no data
onmpw web site
$ sort /tmp/uniq.txt | uniq -c
1 alpha css web
1 cat linux command
2 error php function
1 Error PHP function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site
As the result shows, uniq is case-sensitive by default, so "Error PHP function" is counted separately. Use -i to ignore case:
$ sort /tmp/uniq.txt | uniq -c -i
1 alpha css web
1 cat linux command
3 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site
With -i, "Error PHP function" is counted together with the two "error php function" lines, which gives 3.
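One caveat: uniq -i can only merge lines that sort has placed next to each other, and that depends on the locale. The output above appears to come from a locale whose collation puts "Error" right after "error". In the C locale, uppercase letters sort before lowercase ones, so the lines end up apart. The sketch below (made-up three-line input) shows the effect and how sort -f, which folds case while sorting, avoids it.

```shell
# Force the C locale, where "E" sorts before every lowercase letter.
export LC_ALL=C

lines='error php function
alpha css web
Error PHP function'

# Plain sort puts "Error ..." first and "error ..." last, so they are not adjacent.
plain=$(printf '%s\n' "$lines" | sort | uniq -i)

# sort -f ignores case while sorting, so both spellings become adjacent
# and uniq -i can merge them.
folded=$(printf '%s\n' "$lines" | sort -f | uniq -i)

printf 'sort | uniq -i:\n%s\n' "$plain"
printf 'sort -f | uniq -i:\n%s\n' "$folded"
```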
-u outputs only the lines that are not repeated
$ sort /tmp/uniq.txt | uniq -iu
alpha css web
cat linux command
hello world
recruise page site
repeat no data
wello web site
Notice that neither “error php function” nor “onmpw web site” is output.
-w N compares only the first N characters of each line when checking for duplicates.
$ sort /tmp/uniq.txt | uniq -icw 2
1 alpha css web
1 cat linux command
3 error php function
1 hello world
3 onmpw web site
2 recruise page site
1 wello web site
In the results, "recruise page site" has a count of 2, although it appears only once in the source file, and "repeat no data" has disappeared. This is the effect of -w 2, which makes uniq compare only the first two characters. The first two characters of "recruise" and "repeat" are both "re", so these two lines are treated as duplicates.
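A more practical use of -w is counting lines that share a fixed-width prefix. The sketch below uses made-up log lines that start with a 10-character date and counts entries per day. It assumes GNU uniq, since -w is not available in every uniq implementation; awk is used only to print the count and the date.

```shell
# Use the C locale for predictable behavior.
export LC_ALL=C

# Hypothetical log lines, already ordered by date.
# -w 10 compares only the date, so lines from the same day form one group.
per_day=$(printf '%s\n' \
  '2024-01-01 login' \
  '2024-01-01 logout' \
  '2024-01-02 login' |
  uniq -c -w 10 | awk '{print $1, $2}')

printf '%s\n' "$per_day"
```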
-f N skips the first N fields and compares lines starting from field N+1. Fields are separated by spaces or tabs.
$ sort /tmp/uniq.txt | uniq -icf 2
1 alpha css web
1 cat linux command
3 error php function
1 hello world
4 onmpw web site
1 repeat no data
1 wello web site
The results show that the first two fields are skipped and the comparison starts at the third field. "recruise page site" and "onmpw web site" have the same third field, so they are treated as the same data, which is why the count for "onmpw web site" is 4. But "wello web site" shares not only the third field with "onmpw web site" but also the second. Why is it not counted as a duplicate too? The answer goes back to what we said earlier: uniq only compares adjacent lines, and in the sorted input "repeat no data" sits between "recruise page site" and "wello web site".
To solve this problem, we need to change how the input is sorted. Remember the -k option of the sort command? We will use it here.
$ sort -k 2 /tmp/uniq.txt | uniq -icf 2
1 alpha css web
1 cat linux command
1 repeat no data
1 recruise page site
3 error php function
4 onmpw web site
1 hello world
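Now "wello web site" is adjacent to the "onmpw web site" lines and is counted together with them. Note, however, that "recruise page site" is no longer adjacent to that group, so it is counted separately. Once again, uniq only merges adjacent lines.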
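To group every line by its third field in one pass, one option is to sort on exactly the field that uniq compares. The sketch below uses a few lines from the sample data, passed in directly rather than read from /tmp/uniq.txt, and sorts with -k 3,3 (the third field only) before uniq -f 2. LC_ALL=C is set only to make the order predictable, and awk strips the padding in front of the counts.

```shell
# Use the C locale so the sort order is predictable.
export LC_ALL=C

# Sort on the third field only, then compare from the third field onward.
by_third=$(printf '%s\n' \
  'onmpw web site' 'repeat no data' 'wello web site' \
  'recruise page site' 'onmpw web site' |
  sort -k 3,3 | uniq -c -f 2 | awk '{print $1, $2, $3, $4}')

printf '%s\n' "$by_third"
```

All four lines ending in "site" are now adjacent and are reported as one group, represented by the first line of the group.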
-s N skips the first N characters. Its usage is similar to -f N; the only difference is that -f N skips the first N fields, while -s N skips the first N characters.
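As a small illustration of -s, the sketch below uses made-up log lines that start with a 5-character time and a space, and skips those 6 characters so that only the message is compared. The input is deliberately not sorted, so the last "disk full" line, which is not adjacent to the first two, stays separate.

```shell
# Use the C locale for predictable behavior.
export LC_ALL=C

# Hypothetical log lines: "HH:MM " is 6 characters, skipped by -s 6.
collapsed=$(printf '%s\n' \
  '10:01 disk full' \
  '10:02 disk full' \
  '10:03 cpu high' \
  '10:04 disk full' |
  uniq -c -s 6 | awk '{print $1, $2}')

# Each output line is: count, time of the first line in the group.
printf '%s\n' "$collapsed"
```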
-d outputs only the lines that have duplicates, printing one line (the first) for each group. The input is again sorted with sort -k 2.
$ sort -k 2 /tmp/uniq.txt | uniq -idw 2
repeat no data
error php function
onmpw web site
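Only these three lines are printed. Why does "repeat no data" appear, although it occurs only once in the file? Because of -w 2: only the first two characters are compared, so "repeat no data" and "recruise page site" count as duplicates of each other, and the first line of that group is printed.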
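Together, -d and -u give a quick way to compare two lists. The sketch below uses two made-up files in a temporary directory and assumes that neither file contains duplicates of its own. After sorting both files together, -d prints the lines found in both files and -u prints the lines found in only one of them.

```shell
# Use the C locale so the sort order is predictable.
export LC_ALL=C

# Two hypothetical lists, each without internal duplicates.
dir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$dir/a.txt"
printf '%s\n' 'beta' 'gamma' 'delta' > "$dir/b.txt"

# Lines that occur in both files appear twice after sorting.
in_both=$(sort "$dir/a.txt" "$dir/b.txt" | uniq -d)

# Lines that occur in only one file appear once.
in_one=$(sort "$dir/a.txt" "$dir/b.txt" | uniq -u)

printf 'in both files:\n%s\n' "$in_both"
printf 'in only one file:\n%s\n' "$in_one"

rm -rf "$dir"
```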
-D outputs every line of each group of duplicates
$ sort -k 2 /tmp/uniq.txt | uniq -iDw 2
repeat no data
recruise page site
error php function
error php function
Error PHP function
onmpw web site
onmpw web site
onmpw web site
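When -D prints several groups in a row, it can be hard to see where one group ends and the next begins. GNU uniq offers a long form, --all-repeated=separate, that inserts an empty line between groups. The sketch below assumes GNU coreutils and uses made-up single-letter input.

```shell
# Use the C locale for predictable behavior.
export LC_ALL=C

# Print all repeated lines, with a blank line between groups (GNU uniq).
grouped=$(printf '%s\n' 'a' 'a' 'b' 'c' 'c' | uniq --all-repeated=separate)

printf '%s\n' "$grouped"
```

The unrepeated line "b" is dropped, and the "a" and "c" groups are separated by an empty line.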
That covers the commonly used options of uniq. For more detailed information, see info uniq or man uniq.
I hope this article is helpful to you.
For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.