#100daysofSRE (Day 20): Simplifying Log Analysis with the Linux awk Command: Basics and Templates
Hi there!!! 👋
It’s the 20th day of the #100dayschallenge, and today I will discuss the Linux awk command for log extraction and analysis in SRE.
Log files are a critical source of information for monitoring and troubleshooting these systems. However, analyzing and extracting useful information from these logs can be a challenging and time-consuming task. This is where “awk” comes in handy. With its powerful text processing capabilities, “awk” can help you filter, extract, and manipulate log data to identify patterns, troubleshoot issues, and optimize system performance.
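Before diving in, it helps to see the general shape of an awk program: a pattern followed by an action, applied to every input line. A minimal sketch on a couple of made-up log lines (the sample file and its contents are invented for illustration):

```shell
# Create a tiny sample log (made-up lines, space-separated fields)
printf 'ERROR disk full\nINFO backup done\nERROR net down\n' > sample.log

# pattern { action }: for lines matching /ERROR/, print the second field
awk '/ERROR/ {print $2}' sample.log
# -> disk
#    net
```

Everything that follows is a variation on this pattern-action idea: change the pattern to select lines, and the action to extract or transform fields.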
So, I have planned the content for the next 100 days, and I will be posting one blog post each and every day under the hashtag #100daysofSRE. ✌️
I hope you tag along and share valuable feedback as I grow my knowledge and share my findings. 🙌
Basic Usage
- Print a specific column of a file: Use awk to print a specific column of a file. For example, to print the second column of a space-separated file, use the following command:
$ awk '{print $2}' file.txt
- Print lines that match a specific pattern: Use awk to print only the lines that match a specific pattern. For example, to print only the lines that contain the word “error”
$ awk '/error/ {print}' file.txt
- Print lines that do not match a specific pattern: Use awk to print only the lines that do not match a specific pattern. For example, to print only the lines that do not contain the word “error”
$ awk '!/error/ {print}' file.txt
- Sum a specific file column: Use awk to sum a specific file column. For example, to sum the values in the third column of a space-separated file
$ awk '{sum += $3} END {print sum}' file.txt
- Compute the average of a specific file column: Use awk to compute the average of a specific file column. For example, to compute the average values in the third column of a space-separated file
$ awk '{sum += $3; n++} END {print sum / n}' file.txt
- Count the number of lines in a file: Use awk to count the number of lines in a file. For example, to count the number of lines in a file
$ awk 'END {print NR}' file.txt
- Extract a specific range of lines from a file: Use awk to extract a specific range of lines. For example, to extract lines 10 to 20 from a file
$ awk 'NR>=10 && NR<=20 {print}' file.txt
- Print the longest line in a file: Use awk to print the longest line in a file. For example, to print the longest line in a file
$ awk '{ if (length > max) {max = length; longest = $0}} END {print longest}' file.txt
- Merge two files based on a common field: Use awk to merge two files based on a common field. For example, to merge two space-separated files based on the common field in the first column
$ awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1.txt file2.txt
- Replace a specific field in a file: Use awk to replace a specific field in a file. For example, to replace the third field in a space-separated file with a new value
$ awk '{$3="new_value"} {print}' file.txt
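The two-file merge above is the trickiest one-liner in this list, so here is a worked example on hypothetical data (the file contents are made up): `FNR==NR` is true only while awk reads the first file, so the first block stores `$2` keyed by `$1`; the second block then runs only for the second file and appends the stored value.

```shell
# Hypothetical lookup file: key -> name
printf 'id1 alice\nid2 bob\n' > file1.txt
# Hypothetical event file: key -> action
printf 'id1 login\nid2 logout\n' > file2.txt

# While reading file1 (FNR==NR), remember $2 under key $1;
# for each line of file2, print the line plus the remembered value
awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1.txt file2.txt
# -> id1 login alice
#    id2 logout bob
```

The same idiom reappears in the log-analysis section below for joining two log files on a shared field.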
Log Extraction and Analysis
- Extracting a specific field from a log file:
$ awk '{print $1}' /var/log/syslog
This command prints the first field of each line in the syslog file.
- Filtering out specific lines from a log file:
$ awk '!/error/' /var/log/syslog
This command prints all lines from the syslog file except those that contain the word “error”.
- Counting the number of occurrences of a specific pattern in a log file:
$ awk '/error/{count++} END{print count}' /var/log/syslog
This command counts the number of lines in the syslog file that contain the word “error”.
- Summing a specific field in a log file:
$ awk '{sum+=$1} END{print sum}' /var/log/syslog
This command sums up the first field of each line in the syslog file and prints the total.
- Filtering based on a range of values:
$ awk '$1 >= 100 && $1 <= 200 {print}' /var/log/syslog
This command prints all lines from the syslog file where the first field is between 100 and 200.
- Formatting output:
$ awk '{printf "IP Address: %s, Port: %s\n", $1, $2}' /var/log/apache/access.log
This command prints the IP address and port number from the Apache access log file in a formatted way (assuming those values sit in the first two fields; adjust the field numbers to match your log format).
- Calculating the average of a field:
$ awk '{sum+=$1} END{print sum/NR}' /var/log/syslog
This command calculates the average of the first field in the syslog file.
- Sorting based on a specific field:
$ awk '{print $2, $1}' /var/log/auth.log | sort
This command swaps the first two fields of each line in the auth log file and sorts the result, effectively sorting the output alphabetically by the original second field.
- Joining log files based on a common field:
$ awk 'NR==FNR{a[$1]=$2 FS $3; next}{print $0, a[$1]}' file1 file2
This command joins two log files based on a common field (the first field in this example).
- Extracting unique values from a specific field:
$ awk '{print $1}' /var/log/syslog | sort | uniq
This command prints the first field of each line in the syslog file, sorts the values, and outputs only the unique ones.
- Counting the number of requests by status code:
$ awk '{print $9}' logfile.txt | sort | uniq -c
This command counts the number of requests per status code. The awk command first extracts the status code from each line of the log file, the sort command sorts the status codes in ascending order, and the uniq -c command counts how many times each status code appears.
- Getting the top 10 most popular URLs:
$ awk '{print $5}' logfile.txt | sort | uniq -c | sort -nr | head -10
This command gets the top 10 most requested URLs (adjust the field number to match your log format). The awk command extracts the URL from each line, sort and uniq -c count how many times each URL appears, sort -nr orders the results by request count in descending order, and head -10 prints the first 10 lines.
- Getting the list of all errors:
$ awk '$9 ~ /ERROR/' logfile.txt
This command lists all errors. The awk command uses the ~ operator to match the string ERROR in the ninth field.
- Getting the list of all warnings:
$ awk '$9 ~ /WARNING/' logfile.txt
This command is similar to the previous one, but it lists all warnings.
- Getting the list of all successful requests:
$ awk '$9 == "200"' logfile.txt
This command lists all successful requests. Because awk splits fields on whitespace, the ninth field holds just the numeric status code, so compare it to "200" rather than "200 OK".
- Getting the list of all requests that took longer than 1 second:
$ awk '$8 > 1' logfile.txt
This command lists all requests that took longer than 1 second. The awk command uses the > operator to compare the value of the 8th field to 1.
- Getting the list of all requests from a specific IP address:
$ awk '$1 == "192.168.1.1"' logfile.txt
This command lists all requests from the IP address 192.168.1.1. The awk command uses the == operator to compare the value of the 1st field to "192.168.1.1".
- Getting the list of all requests made on a specific date:
$ awk '$4 == "2023-03-08"' logfile.txt
This command lists all requests made on 2023-03-08. The awk command uses the == operator to compare the value of the 4th field to "2023-03-08".
- Getting the list of all requests made between two dates:
$ awk '$4 >= "2023-03-08" && $4 <= "2023-03-10"' logfile.txt
This command lists all requests made between 2023-03-08 and 2023-03-10. The awk command uses the >= and <= operators to compare the value of the 4th field to the two dates; string comparison works here because the dates are in YYYY-MM-DD format.
- Getting the list of all requests made by a specific user:
$ awk '$3 == "johndoe"' logfile.txt
This command lists all requests made by the user johndoe, assuming the username is in the third field, as in the Common Log Format.
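To tie several of the templates above together, here is a self-contained run of the status-code count on a tiny, made-up access log in Common Log Format, where the status code is the ninth field:

```shell
# Three invented access-log lines in Common Log Format
cat > logfile.txt <<'EOF'
10.0.0.1 - alice [08/Mar/2023:10:00:00 +0000] "GET / HTTP/1.0" 200 512
10.0.0.2 - bob [08/Mar/2023:10:00:01 +0000] "GET /x HTTP/1.0" 404 128
10.0.0.1 - alice [08/Mar/2023:10:00:02 +0000] "GET / HTTP/1.0" 200 512
EOF

# Requests per status code, most frequent first
awk '{print $9}' logfile.txt | sort | uniq -c | sort -nr
# status 200 appears twice, 404 once
```

The pipeline shape — awk to project a field, sort | uniq -c to count, sort -nr to rank — is reusable for any field: swap $9 for $1 to rank client IPs, or for the URL field to rank pages.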
Thank you for reading my blog post! 🙏
If you enjoyed it and would like to stay updated on my latest content and plans for next week, be sure to subscribe to my newsletter on Substack. 👇
Once a week, I’ll be sharing the latest weekly updates on my published articles, along with other news, content and resources. Enter your email below to subscribe and join the conversation for Free! ✍️
I am also writing on Medium. You can follow me here.