
The Linux uniq command: removing duplicate lines

Author: JIYIK Last Updated: 2025/04/08

This article introduces the uniq command, which is commonly used in Linux pipelines. Its main job is to remove duplicate lines.

Before introducing the uniq command, let's create a new file, /tmp/uniq.txt, that will be used in the examples below. Its content is as follows:

$ cat /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
recruise page site
error php function
repeat no data
onmpw web site

Okay, now let's try it out.

$ uniq /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
wello web site
recruise page site
error php function
repeat no data
onmpw web site

Oops, the result looks a bit odd. The source file contains two "error php function" lines and three "onmpw web site" lines, but in the output "error php function" is not deduplicated at all, and only one of the "onmpw web site" lines is removed.

Note that, by default, uniq only compares adjacent lines when removing duplicates. There are three "onmpw web site" lines in /tmp/uniq.txt, but one of them is not adjacent to the other two, so only one is removed. The two "error php function" lines are not adjacent either, so both are kept.

For this reason, uniq is generally used together with the sort command.

$ sort /tmp/uniq.txt | uniq
alpha css web
cat linux command
error php function
hello world
onmpw web site
recruise page site
repeat no data
wello web site

Check the output: all duplicates have now been removed.
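The adjacency rule can be reproduced on a much smaller input. This is a minimal sketch with made-up sample lines (a, b), not the article's file; LC_ALL=C is set only to make the sort order predictable.

```shell
# Four sample lines; the first "a" is not adjacent to the other two.
input=$(printf 'a\nb\na\na\n')

# uniq alone collapses only the adjacent pair at the end: a, b, a
unsorted_result=$(printf '%s\n' "$input" | uniq)

# Sorting first makes every duplicate adjacent, so uniq removes them all: a, b
sorted_result=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'uniq only:\n%s\n' "$unsorted_result"
printf 'sort | uniq:\n%s\n' "$sorted_result"
```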

OK, now that we have tried it out, let's briefly go through the options of the uniq command.

-c prefixes each line with the number of times it occurs

$ sort /tmp/uniq.txt | uniq -c
1 alpha css web
1 cat linux command
2 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

We can see that "error php function" appears twice, and "onmpw web site" appears three times. The rest have no duplicates, so the value is 1.
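A common way to use these counts is to rank lines by frequency by piping the result through sort -rn. The sketch below uses hypothetical one-letter sample data rather than /tmp/uniq.txt, and awk is used only to strip the padding that uniq -c puts in front of each count.

```shell
# Hypothetical sample data, one item per line.
words=$(printf 'b\na\nb\nc\nb\na\n')

# sort groups identical lines, uniq -c counts each group,
# and sort -rn orders the groups by count, highest first.
ranked=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn)

# Take the first row of the ranking and print "count item" without padding.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1, $2}')
echo "$top"   # prints: 3 b
```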

-i ignores case when comparing lines

First, add the line "Error PHP function" to /tmp/uniq.txt

$ cat /tmp/uniq.txt
alpha css web
cat linux command
error php function
hello world
onmpw web site
onmpw web site
wello web site
Error PHP function
recruise page site
error php function
repeat no data
onmpw web site
$ sort /tmp/uniq.txt | uniq -c
1 alpha css web
1 cat linux command
2 error php function
1 Error PHP function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

As the results show, uniq is case-sensitive by default, so "Error PHP function" is counted separately. Use -i to ignore case.

$ sort /tmp/uniq.txt | uniq -c -i
1 alpha css web
1 cat linux command
3 error php function
1 hello world
3 onmpw web site
1 recruise page site
1 repeat no data
1 wello web site

Now the case is ignored, and all three "error php function" lines are counted together.
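The effect of -i can also be seen without the count column. A minimal sketch on hypothetical sample lines that are already adjacent, so no sort is needed:

```shell
# Three spellings of the same word, followed by a different line.
lines=$(printf 'Error\nerror\nERROR\nok\n')

# Default comparison is case-sensitive: all four lines are kept.
sensitive=$(printf '%s\n' "$lines" | uniq)

# With -i the three spellings form one group; the first line of the group is kept.
insensitive=$(printf '%s\n' "$lines" | uniq -i)

printf '%s\n' "$insensitive"
```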

-u outputs only the lines that are not repeated

$ sort /tmp/uniq.txt | uniq -iu
alpha css web
cat linux command
hello world
recruise page site
repeat no data
wello web site

Notice that neither “error php function” nor “onmpw web site” is output.
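The difference between plain uniq and uniq -u is easy to miss: the former keeps one copy of each repeated line, the latter drops repeated lines entirely. A small sketch with hypothetical sample data:

```shell
# "a" appears twice, "b" once, "c" three times (already adjacent).
data=$(printf 'a\na\nb\nc\nc\nc\n')

# Plain uniq keeps one copy of every line: a, b, c
deduped=$(printf '%s\n' "$data" | uniq)

# uniq -u keeps only lines that were never repeated: b
only_unique=$(printf '%s\n' "$data" | uniq -u)

printf 'uniq:    %s\n' "$(printf '%s' "$deduped" | tr '\n' ' ')"
printf 'uniq -u: %s\n' "$only_unique"
```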

-w N compares only the first N characters of each line when detecting duplicates.

$ sort /tmp/uniq.txt | uniq -icw 2
1 alpha css web
1 cat linux command
3 error php function
1 hello world
3 onmpw web site
2 recruise page site
1 wello web site

In the results, "recruise page site" has a count of 2 even though it appears only once in the source file, and "repeat no data" has disappeared. This is the effect of -w 2, which makes uniq compare only the first two characters of each line. "recruise" and "repeat" both start with "re", so the two lines are treated as duplicates and only the first one is shown.
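To see how the value of N changes the grouping, the sketch below runs the same three lines through -w 2 and -w 4. It assumes GNU coreutils uniq, since -w is not available in every uniq implementation; awk only trims the count padding.

```shell
# Three lines taken from the sample file, already in sorted order.
data=$(printf 'recruise page site\nrepeat no data\nwello web site\n')

# Comparing 2 characters: "re" matches "re", so the first two lines merge.
by_two=$(printf '%s\n' "$data" | uniq -c -w 2 | awk '{print $1, $2}')

# Comparing 4 characters: "recr" differs from "repe", so nothing merges.
by_four=$(printf '%s\n' "$data" | uniq -c -w 4 | awk '{print $1, $2}')

printf -- '-w 2:\n%s\n-w 4:\n%s\n' "$by_two" "$by_four"
```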

-f N skips the first N fields and compares lines starting from field N+1. Fields are separated by spaces or tabs.

$ sort /tmp/uniq.txt | uniq -icf 2
1 alpha css web
1 cat linux command
3 error php function
1 hello world
4 onmpw web site
1 repeat no data
1 wello web site

The results show that the first two fields are skipped and comparison starts at the third field. "recruise page site" and "onmpw web site" have the same third field, so they are treated as duplicates. However, "wello web site" shares both its second and third fields with "onmpw web site". Why is it not counted with them? Because, as noted earlier, uniq only compares adjacent lines, and "repeat no data" sits between them in the sorted input.

To solve this problem, we need to adjust the sort command. Remember the -k option of sort? We will use it to sort starting from the second field.

$ sort -k 2 /tmp/uniq.txt | uniq -icf 2
1 alpha css web
1 cat linux command
1 repeat no data
1 recruise page site
3 error php function
4 onmpw web site
1 hello world

Now "wello web site" is adjacent to the "onmpw web site" lines, so it is counted with them and the count becomes 4.
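In the output above, "recruise page site" gets its own count again, because sorting from the second field moves it away from the other lines ending in "site". One possible refinement, sketched here on a reduced four-line sample rather than the full file, is to sort on the same field that uniq starts comparing from, that is, the third. LC_ALL=C is an assumption made to keep the order predictable.

```shell
# Four lines from the sample file; three of them end in "site".
data=$(printf 'onmpw web site\nrecruise page site\nrepeat no data\nwello web site\n')

# sort -k 3 orders by the third field, so all "site" lines become adjacent;
# uniq -f 2 then skips two fields and compares from the third.
grouped=$(printf '%s\n' "$data" | LC_ALL=C sort -k 3 | uniq -c -f 2 \
  | awk '{print $1, $2, $3, $4}')

printf '%s\n' "$grouped"
# prints:
# 1 repeat no data
# 3 onmpw web site
```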

-s N skips the first N characters of each line before comparing. It works like -f N, except that -f N skips fields while -s N skips characters, so we will not walk through it on the sample file.
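For completeness, here is a minimal sketch of -s on hypothetical data in which every line starts with a one-digit id and a space. Skipping those two characters lets uniq compare only the text after the id.

```shell
# Each line is "<id> <name>"; the ids differ, the names partly repeat.
data=$(printf '1 apple\n2 apple\n3 pear\n')

# -s 2 ignores the first two characters ("1 ", "2 ", "3 ") when comparing,
# so the two apple lines count as duplicates and the first one is kept.
result=$(printf '%s\n' "$data" | uniq -s 2)

printf '%s\n' "$result"
# prints:
# 1 apple
# 3 pear
```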

-d outputs only repeated lines, printing the first line of each group of duplicates.

$ sort -k 2 /tmp/uniq.txt | uniq -idw 2
repeat no data
error php function
onmpw web site

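Only three lines are output. Why is "repeat no data" among them? Because of -w 2: only the first two characters are compared, so "repeat no data" and "recruise page site" are treated as duplicates of each other, and -d prints the first line of that group.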

-D outputs every line that belongs to a group of duplicates

$ sort -k 2 /tmp/uniq.txt | uniq -iDw 2
repeat no data
recruise page site
error php function
error php function
Error PHP function
onmpw web site
onmpw web site
onmpw web site
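The contrast between -d and -D is easier to see on a tiny input. This sketch uses hypothetical sample lines and assumes GNU coreutils uniq, since -D is not part of every uniq implementation.

```shell
# "a" appears twice, "b" once, "c" three times (already adjacent).
data=$(printf 'a\na\nb\nc\nc\nc\n')

# -d prints one line per group of duplicates: a, c
first_of_each=$(printf '%s\n' "$data" | uniq -d)

# -D prints every line of every group of duplicates: a, a, c, c, c
all_copies=$(printf '%s\n' "$data" | uniq -D)

printf -- '-d:\n%s\n-D:\n%s\n' "$first_of_each" "$all_copies"
```

In both cases the non-repeated line "b" is left out, which is the opposite of what -u does.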

That covers the commonly used options of the uniq command. For more detailed information about uniq, run the command info uniq.

I hope this article is helpful to you.

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
