Hi there!!! 👋

It’s the 20th day of the #100daysofSRE challenge, and today I will discuss the Linux awk command for log extraction and analysis in SRE.

Log files are a critical source of information for monitoring and troubleshooting production systems. However, analyzing and extracting useful information from them can be a challenging and time-consuming task. This is where “awk” comes in handy. With its powerful text-processing capabilities, “awk” can help you filter, extract, and manipulate log data to identify patterns, troubleshoot issues, and optimize system performance.
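
Every awk program is a series of pattern { action } pairs applied to each input line, with the line automatically split into fields $1, $2, … and $NF holding the last field. Here is a minimal sketch, using a hypothetical messages.log:

    $ cat messages.log
    2023-03-08 10:01:22 INFO service started
    2023-03-08 10:01:45 ERROR connection refused

    $ awk '/ERROR/ {print $1, $2, $NF}' messages.log
    2023-03-08 10:01:45 refused

When the pattern is omitted, the action runs on every line; when the action is omitted, matching lines are printed as-is. Most of the one-liners below rely on these two shortcuts.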

So, I have planned the content for the next 100 days, and I will be posting one blog post every day under the hashtag #100daysofSRE. ✌️

I hope you tag along and share valuable feedback as I grow my knowledge and share my findings. 🙌

Basic Usage

  1. Print a specific column of a file: Use awk to print a specific column of a file. For example, to print the second column of a space-separated file, use the following command
    $ awk '{print $2}' file.txt
    
  2. Print lines that match a specific pattern: Use awk to print only the lines that match a specific pattern. For example, to print only the lines that contain the word “error”
    $ awk '/error/ {print}' file.txt
    
  3. Print lines that do not match a specific pattern: Use awk to print only the lines that do not match a specific pattern. For example, to print only the lines that do not contain the word “error”
    $ awk '!/error/ {print}' file.txt
    
  4. Sum a specific file column: Use awk to sum a specific file column. For example, to sum the values in the third column of a space-separated file
    $ awk '{sum += $3} END {print sum}' file.txt
    
  5. Compute the average of a specific file column: Use awk to compute the average of a specific file column. For example, to compute the average of the values in the third column of a space-separated file
    $ awk '{sum += $3; n++} END {print sum / n}' file.txt
    
  6. Count the number of lines in a file: Use awk to count the number of lines in a file. For example, to count the number of lines in a file
    $ awk 'END {print NR}' file.txt
    
  7. Extract a specific range of lines from a file: Use awk to extract a specific range of lines. For example, to extract lines 10 to 20 from a file
    $ awk 'NR>=10 && NR<=20 {print}' file.txt
    
  8. Print the longest line in a file: Use awk to print the longest line in a file. For example, to print the longest line in a file
    $ awk '{ if (length > max) {max = length; longest = $0}} END {print longest}' file.txt
    
  9. Merge two files based on a common field: Use awk to merge two files based on a common field. For example, to merge two space-separated files based on the common field in the first column
    $ awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' file1.txt file2.txt
    
  10. Replace a specific field in a file: Use awk to replace a specific field in a file. For example, to replace the third field in a space-separated file with a new value
       $ awk '{$3="new_value"} {print}' file.txt
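
To sanity-check a few of the one-liners above, here is a tiny space-separated sample file (nums.txt is a hypothetical name) together with the output the column, sum, and average commands produce:

    $ cat nums.txt
    alpha x 10
    beta y 20
    gamma z 30

    $ awk '{print $2}' nums.txt
    x
    y
    z

    $ awk '{sum += $3} END {print sum}' nums.txt
    60

    $ awk '{sum += $3; n++} END {print sum / n}' nums.txt
    20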
    

Log Extraction and Analysis

  1. Extracting a specific field from a log file:

    $ awk '{print $1}' /var/log/syslog
    

    This command prints the first field of each line in the syslog file.

  2. Filtering out specific lines from a log file:

    $ awk '!/error/' /var/log/syslog
    

    This command prints all lines from the syslog file except those that contain the word “error”.

  3. Counting the number of occurrences of a specific pattern in a log file:

    $ awk '/error/{count++} END{print count}' /var/log/syslog
    

    This command counts the number of lines in the syslog file that contain the word “error”.

  4. Summing a specific field in a log file:

    $ awk '{sum+=$1} END{print sum}' /var/log/syslog
    

    This command sums up the first field of each line in the syslog file and prints the total.

  5. Filtering based on a range of values:

    $ awk '$1 >= 100 && $1 <= 200 {print}' /var/log/syslog
    

    This command prints all lines from the syslog file where the first field is between 100 and 200.

  6. Formatting output:

    $ awk '{printf "IP Address: %s, Port: %s\n", $1, $2}' /var/log/apache/access.log
    

    This command uses printf to print the first two fields of each line of the Apache access log with custom labels; adjust the field numbers to match where those values sit in your log format.

  7. Calculating the average of a field:

    $ awk '{sum+=$1} END{print sum/NR}' /var/log/syslog
    

    This command calculates the average of the first field in the syslog file.

  8. Sorting based on a specific field:

    $ awk '{print $2, $1}' /var/log/auth.log | sort
    

    This command swaps the first two fields of each line in the auth log and pipes the result to sort, so the output is sorted alphabetically by the original second field.

  9. Joining log files based on a common field:

    $ awk 'NR==FNR{a[$1]=$2 FS $3; next}{print $0, a[$1]}' file1 file2
    

    This command joins two log files based on a common field (the first field in this example).

  10. Extracting unique values from a specific field:

       $ awk '{print $1}' /var/log/syslog | sort | uniq
    

    This command prints the first field of each line in the syslog file, sorts the values, and outputs only the unique ones.
  11. Count the number of requests by status code

       $ awk '{print $9}' logfile.txt | sort | uniq -c
    

    This command will count the number of requests by status code. The awk command first extracts the status code from each line in the log file. The sort command sorts the status codes in ascending order, and the uniq -c command counts the number of times each status code appears.

  12. Get the top 10 most popular URLs

    $ awk '{print $5}' logfile.txt | sort | uniq -c | sort -nr | head -10
    

    This command will get the top 10 most popular URLs. The awk command first extracts the URL from each line in the log file. The sort command sorts the URLs in ascending order, and the uniq -c command counts the number of times each URL appears. The sort -nr command sorts the URLs in descending order by the number of requests, and the head -10 command prints the first 10 lines.

  13. Get the list of all errors

    $ awk '$9 ~ /ERROR/' logfile.txt
    

    This command will get the list of all errors. The awk command uses the ~ operator to match lines whose ninth field contains the string ERROR.

  14. Get the list of all warnings

    $ awk '$9 ~ /WARNING/' logfile.txt
    

    This command is similar to the previous command, but it gets the list of all warnings.

  15. Get the list of all successful requests

    $ awk '$9 == 200' logfile.txt
    

    This command will get the list of all successful requests. The awk command uses the == operator to compare the 9th field to the HTTP status code 200.

  16. Get the list of all requests that took longer than 1 second

    $ awk '$8 > 1' logfile.txt
    

    This command will get the list of all requests that took longer than 1 second. The awk command uses the > operator to compare the value of the 8th field to 1.

  17. Get the list of all requests from a specific IP address

    $ awk '$1 == "192.168.1.1"' logfile.txt
    

    This command will get the list of all requests from the IP address 192.168.1.1. The awk command uses the == operator to compare the value of the 1st field to 192.168.1.1.

  18. Get the list of all requests that were made on a specific date

    $ awk '$4 == "2023-03-08"' logfile.txt
    

    This command will get the list of all requests that were made on the date 2023-03-08. The awk command uses the == operator to compare the value of the 4th field to 2023-03-08.

  19. Get the list of all requests that were made between two dates

    $ awk '$4 >= "2023-03-08" && $4 <= "2023-03-10"' logfile.txt
    

    This command will get the list of all requests that were made between the dates 2023-03-08 and 2023-03-10. The awk command uses the >= and <= operators to compare the value of the 4th field to the two dates; since the dates are in ISO format, plain string comparison sorts them chronologically.

  20. Get the list of all requests that were made by a specific user

       $ awk '$3 == "johndoe"' logfile.txt
    

    This command will get the list of all requests where the third field matches the username johndoe.
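
Most of these one-liners make one pass over the log per question. When you need several answers at once, awk’s associative arrays let you collect them in a single pass. Here is a sketch, assuming an Apache-style log where $1 is the client IP and $9 is the status code (field positions will vary with your log format):

    $ awk '{
          status[$9]++;   # tally requests per status code
          client[$1]++    # tally requests per client IP
      } END {
          for (s in status) print "status", s, status[s];
          for (c in client) print "client", c, client[c]
      }' logfile.txt

Because the file is read only once and the tallies live in memory, this is usually faster than running a separate sort | uniq -c pipeline for each question.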

Thank you for reading my blog post! 🙏

If you enjoyed it and would like to stay updated on my latest content and plans for next week, be sure to subscribe to my newsletter on Substack. 👇

Once a week, I’ll be sharing the latest weekly updates on my published articles, along with other news, content and resources. Enter your email below to subscribe and join the conversation for Free! ✍️

I am also writing on Medium. You can follow me here.
