
Linux Practice - Log Filtering

Author: JIYIK  Last Updated: 2025/04/07



Let's start with the problem: count the number of distinct IP addresses in a log file. This is a common and relatively simple task. In my view, the key part is matching the IP addresses; the rest comes down to fluency with Linux commands.
First, here are the commands I used to solve it.

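  • grep: used to extract the IP addresses from the log file;
  • uniq: used to deduplicate the extracted IP addresses. It only collapses identical lines that are adjacent, so the input must be sorted first (for example with sort);
  • wc: used to count the resulting IP addresses.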
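To see why sorting matters before uniq, here is a small sketch with made-up addresses (the sample data is an assumption, not taken from a real log). Because uniq only merges identical lines that sit next to each other, the unsorted input keeps its duplicate.

```shell
# Sample input: 10.0.0.1 appears twice, but not on adjacent lines.
sample='10.0.0.1
10.0.0.2
10.0.0.1'

# Without sorting, uniq leaves both copies of 10.0.0.1 in place (3 lines).
unsorted_count=$(printf '%s\n' "$sample" | uniq | wc -l)

# After sorting, the duplicates are adjacent and uniq merges them (2 lines).
sorted_count=$(printf '%s\n' "$sample" | sort | uniq | wc -l)

echo "without sort: $unsorted_count, with sort: $sorted_count"
```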

Next, we mainly look at how to match an IP address with grep.

grep is one of the most frequently used commands in Linux. Like cut, it is typically used in pipelines: it examines the input line by line and extracts the data we want, so it works as a search tool. grep is far more flexible than cut, though. It supports many kinds of search conditions, including regular expressions, which is what we rely on here.

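We mainly use the following grep options:

  • -i: case-insensitive matching (it has no effect on a digits-only pattern, so it is optional here);
  • -o: print only the matched part of each line instead of the whole line;
  • -E: interpret the pattern as an extended regular expression (ERE).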
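As a quick illustration of -o and -E, the sketch below runs grep on a single made-up log line. The line and the deliberately loose pattern [0-9]+ are assumptions chosen only for demonstration.

```shell
line='GET /index.html 200 from host42'

# Without -o, grep prints the whole line that contains a match.
whole=$(printf '%s\n' "$line" | grep -E '[0-9]+')

# With -o, grep prints each match on its own line: 200, then 42.
only=$(printf '%s\n' "$line" | grep -oE '[0-9]+')

printf 'whole line: %s\n' "$whole"
printf 'matches:\n%s\n' "$only"
```

This is why -o is useful for the counting task: each extracted address ends up on its own line, ready for sort, uniq and wc.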

An IPv4 address ranges from 0.0.0.0 to 255.255.255.255. So we can first write an expression that matches a single number from 0 to 255, and then use it four times, separated by dots.

Let's first look at the regular expression that matches 0-255.

(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])

Here we break down this regular expression, one alternative per range:

  • 25[0-5] matches 250-255;
  • 2[0-4][0-9] matches 200-249;
  • 1[0-9]{2} matches 100-199;
  • [1-9]?[0-9] matches 10-99 when the optional first digit is present, and 0-9 when it is not.

Note that the shorter form 2[0-5]{2} is not enough for the range 200-255: it restricts both trailing digits to 0-5, so it misses values such as 206 or 249.

The 100-199 case could also be handled with a conditional subgroup, which makes the leading 1 optional and then branches on whether that 1 was captured. Conditional subgroups are a PCRE feature, however. The extended regular expressions accepted by grep -E do not support them, and the plain alternation above covers the same cases.

Then add a dot in front and repeat 3 times

(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

Finally, the entire regular expression is

(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}

Pass this regular expression to grep to extract every IP address, sort the result so that identical addresses are adjacent, remove the duplicates with uniq, and finally count the remaining lines with wc. The entire command is as follows, where access.log stands for your log file

$ grep -oE '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}' access.log | sort | uniq | wc -l
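The whole task can be tried end to end on a small sample. The sketch below is only an illustration under stated assumptions: the log contents are invented, the temporary file stands in for a real log, and the octet pattern is written with plain alternation because that is a form grep -E accepts. The pipeline sorts before uniq so that duplicates become adjacent.

```shell
# Hypothetical sample log; the contents are made up for this sketch.
log=$(mktemp)
cat > "$log" <<'EOF'
192.168.1.10 - - "GET / HTTP/1.1" 200
10.0.0.5 - - "GET /a HTTP/1.1" 200
192.168.1.10 - - "GET /b HTTP/1.1" 404
255.255.255.255 - - "GET /c HTTP/1.1" 200
10.0.0.5 - - "GET /d HTTP/1.1" 200
EOF

# One octet (0-255), written with plain alternation so that grep -E accepts it.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
ip_re="$octet(\\.$octet){3}"

# Extract every address, sort so duplicates are adjacent, deduplicate, count.
unique_ips=$(grep -oE "$ip_re" "$log" | sort | uniq | wc -l)
echo "unique IP addresses: $unique_ips"

rm -f "$log"
```

The sample contains three distinct addresses. Note that the pattern has no boundary checks, so a longer digit run such as 1.2.3.456 would still yield a partial match; stricter anchoring is left out to keep the sketch short.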

The above problem is just one application of grep with regular expressions. There are many variations of it, such as counting the number of visits per IP address and sorting by that count, or finding the ten IP addresses with the most visits.
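One of these variations, ranking addresses by number of visits, might be sketched as follows. The address list is invented and stands in for the output of grep -o. The combination of sort, uniq -c, sort -rn and head is one common way to do this, not the only one.

```shell
# Hypothetical extracted addresses, one per line, as grep -o would print them.
ips='10.0.0.5
192.168.1.10
10.0.0.5
172.16.0.1
10.0.0.5
192.168.1.10'

# uniq -c prefixes each distinct address with its count; sort -rn orders by
# that count, largest first; head keeps at most the first ten lines.
top=$(printf '%s\n' "$ips" | sort | uniq -c | sort -rn | head -n 10)
echo "$top"

# The most frequent address is the second field of the first line.
busiest=$(printf '%s\n' "$top" | awk 'NR == 1 { print $2 }')
echo "busiest: $busiest"
```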

For reprinting, please send an email to 1244347461@qq.com for approval. After obtaining the author's consent, kindly include the source as a link.
