#100daysofSRE (Day 19): Simplifying Log Analysis with the Linux sed Command: Basics and Templates
Hi there!!! 👋
It’s the 19th day of the #100dayschallenge, and today I will discuss the use of the popular Linux command sed for log extraction and analysis in SRE.
Log files are critical to system and application monitoring, allowing admins to quickly identify and troubleshoot issues. However, these logs can be overwhelming and challenging to analyze, especially when dealing with large-scale environments.
So, I have planned the content for the next 100 days, and I will be posting one blog post every day under the hashtag #100daysofSRE. ✌️
I hope you tag along and share valuable feedback as I grow my knowledge and share my findings. 🙌
Basic Usage
Text substitution
sed can be used to substitute text in a file or stream. For example, to replace all occurrences of “apple” with “orange” in a file, you can use the following command:
$ sed 's/apple/orange/g' file.txt
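To see the substitution on concrete input, here is a small demo; the fruit lines are made-up sample data:
$ printf 'apple pie\nI like apple juice\n' | sed 's/apple/orange/g'
orange pie
I like orange juice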
Printing lines
sed can be used to print specific lines from a file or stream. The -n flag suppresses sed’s default behavior of printing every input line, so only the lines selected by the p command are printed. For example, to print the first 10 lines of a file, you can use the following command:
$ sed -n '1,10p' file.txt
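A quick way to sanity-check the range is to generate numbered input with seq:
$ seq 20 | sed -n '1,10p'
This prints the numbers 1 through 10, one per line.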
Deleting lines
sed can be used to delete specific lines from a file or stream. For example, to delete all lines containing the word “apple” in a file, you can use the following command:
$ sed '/apple/d' file.txt
Search and replace
sed can be used to search for a pattern and replace it with another string. For example, to replace all occurrences of “apple” with “orange” only on lines containing the word “fruit”, you can use the following command:
$ sed '/fruit/s/apple/orange/g' file.txt
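A small made-up example shows the effect of the address: only the line matching “fruit” is edited:
$ printf 'fruit: apple\nsnack: apple chips\n' | sed '/fruit/s/apple/orange/g'
fruit: orange
snack: apple chips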
Multiple commands
sed can be used to execute multiple commands in a single invocation by passing each one with -e. For example, to replace all occurrences of “apple” with “orange” and delete all lines containing the word “banana”, you can use the following command:
$ sed -e 's/apple/orange/g' -e '/banana/d' file.txt
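When the list of commands grows, it can be cleaner to keep them in a script file and pass it with the -f option. A minimal sketch, assuming a hypothetical script file named commands.sed containing the same two commands as above:
$ printf 's/apple/orange/g\n/banana/d\n' > commands.sed
$ sed -f commands.sed file.txt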
Utilizing sed to extract information from log files
- Extracting specific lines from a log file:
$ sed -n '5,10p' log_file
This command will print lines 5 through 10 from the log_file.
- Removing blank lines from a log file:
$ sed '/^$/d' log_file
This command will remove all blank lines from the log_file.
- Extracting lines that match a specific pattern:
$ sed -n '/pattern/p' log_file
This command will print all lines from log_file that contain the specified pattern.
- Extracting lines that don’t match a specific pattern:
$ sed -n '/pattern/!p' log_file
This command will print all lines from log_file that do not contain the specified pattern.
- Replacing a specific string in a log file:
$ sed 's/old_string/new_string/g' log_file
This command will replace all occurrences of old_string with new_string in the log_file.
- Extracting the last 10 lines of a log file:
$ sed -e ':a' -e '$q;N;11,$D;ba' log_file
Standard sed addresses cannot count backward from the end of the file, so this classic one-liner keeps a rolling window of the last 10 lines and prints it when the end of the log_file is reached. In practice, tail -n 10 log_file is the simpler way to do the same thing.
- Extracting the first 10 lines of a log file:
$ sed -n '1,10p' log_file
This command will print the first 10 lines of the log_file.
- Extracting lines that match multiple patterns:
$ sed -n '/pattern1/p; /pattern2/p' log_file
This command will print all lines from log_file that contain either pattern1 or pattern2.
- Counting the number of lines that match a pattern:
$ sed -n '/pattern/=' log_file | wc -l
This command will count the number of lines in log_file that contain pattern.
- Extracting lines between two patterns:
$ sed -n '/start_pattern/,/end_pattern/p' log_file
This command will print all lines from log_file that are between start_pattern and end_pattern.
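These building blocks compose well. As a rough sketch, assuming a log whose entries carry timestamps like 2023-03-01 10:00 and an ERROR severity field (both hypothetical; adjust to your log format), the following prints only the error lines that fall inside a time window:
$ sed -n '/2023-03-01 10:00/,/2023-03-01 11:00/p' log_file | sed -n '/ERROR/p'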
Useful Templates
- Using awk and sed to extract the most frequent HTTP response codes:
$ cat access.log | awk '{print $9}' | sed 's/\.[0-9]*//g' | sort | uniq -c | sort -nr | head
This command first extracts the HTTP response code from the log file using awk, then removes any decimal places using sed. It then sorts the codes, counts the number of occurrences of each code, and sorts them in descending order to show the most frequent ones using sort, uniq, and head.
- Using grep, sed, and awk to extract the top 10 IP addresses with the most requests:
$ cat access.log | grep -v "127.0.0.1" | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 10 | sed 's/^ *//'
This command first removes requests from localhost using grep, then extracts the IP address from the log file using awk. It then sorts the IP addresses, counts the number of requests from each IP, and sorts them in descending order to show the top 10 using sort, uniq, and head. Finally, sed is used to remove any leading spaces.
- Using sed and cut to extract the top 5 most frequently requested URLs:
$ cat access.log | cut -d'"' -f2 | cut -d' ' -f2 | sort | uniq -c | sort -rn | head -n 5 | sed 's/^ *//'
This command first extracts the URLs from the log file using cut, then sorts them, counts the number of occurrences of each URL, and sorts them in descending order to show the top 5 using sort, uniq, and head. Finally, sed is used to remove any leading spaces.
- Using sed and grep to extract the number of requests by hour:
$ cat access.log | grep -v 'spider' | sed -e 's/\[//g' -e 's/\]//g' | cut -d: -f2 | sort | uniq -c | awk '{print $2,$1}' | sort -n
This command first removes requests from spiders using grep, then removes the square brackets around the timestamp using sed, and extracts the hour from the timestamp using cut. It then sorts the hours, counts the number of requests in each hour, and sorts them in ascending order to show the number of requests by hour using sort, uniq, awk, and sort.
- Using sed and awk to extract the top 5 most frequent user agents:
$ cat access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -n 5 | sed 's/^ *//'
This command first extracts the user agents from the log file using awk, then sorts them, counts the number of occurrences of each user agent, and sorts them in descending order to show the top 5 using sort, uniq, and head. Finally, sed is used to remove any leading spaces.
- Using sed, awk, and cut to extract the number of requests by day:
$ cat access.log | awk '{print $4}' | sed 's/\[//' | cut -d: -f1 | sort | uniq -c
This command extracts the timestamp field using awk, strips the leading square bracket using sed, and keeps only the date portion before the first colon using cut. It then counts the number of requests on each day using sort and uniq.
- Extracting IP Addresses from Log Files
$ sed -nE 's/^([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*/\1/p' access.log
This command extracts the client IP address from the beginning of each line in an Apache access log using a regular expression pattern. Anchoring the pattern at the start of the line keeps the greedy .* from matching a later dotted number, such as a browser version in the user-agent string.
- Replacing a Specific String in a File
$ sed -i 's/old_string/new_string/g' file.txt
This command replaces all occurrences of ‘old_string’ with ‘new_string’ in the ‘file.txt’ file.
- Removing Blank Lines from a File
$ sed '/^$/d' file.txt
This command removes all blank lines from a file.
- Extracting Specific Lines from a File
$ sed -n '5,10p' file.txt
This command prints lines 5-10 from the ‘file.txt’ file.
- Reversing the Order of Lines in a File
$ sed '1!G;h;$!d' file.txt
This command reverses the order of lines in the ‘file.txt’ file.
- Removing Lines That Match a Pattern
$ sed '/pattern/d' file.txt
This command removes all lines that match the specified pattern from the ‘file.txt’ file.
- Searching for a Pattern in Multiple Files
$ grep 'pattern' *.txt | sed 's/:.*//g' | sort | uniq
This command searches for a pattern in all .txt files in the current directory, uses sed to strip everything after the filename in each match, then sorts the output and removes duplicates, leaving a list of the files that contain the pattern.
- Count the number of lines in a log file
$ sed -n '$=' logfile.txt
This command will print the number of lines in the log file logfile.txt.
- Get the date and time of the first and last log entry:
$ sed -n '1p;$p' logfile.txt
This command will print the first and last lines of the log file, which contain the date and time of the corresponding log entries.
- Get the IP address of the client that made the most requests
$ awk '{print $1}' logfile.txt | sort | uniq -c | sort -nr | head -1
This command will print the IP address of the client that made the most requests. awk extracts the first field of each log entry, which in the common log format is the client IP address. The sort command groups identical addresses together, the uniq -c command counts the number of times each IP address appears, the sort -nr command sorts the addresses in descending order by the number of requests, and the head -1 command prints the first line, which is the IP address of the client that made the most requests.
- Get the top 10 most popular URLs
$ awk '{print $7}' logfile.txt | sort | uniq -c | sort -nr | head -10
This command is similar to the previous command, but it prints the top 10 most popular URLs. The difference is that awk now extracts the seventh field, which in the common log format contains the requested URL.
- Get the list of all errors
$ sed -n '/ERROR/p' logfile.txt
This command will print all lines in the log file that contain the word “ERROR”.
- Get the list of all warnings
$ sed -n '/WARNING/p' logfile.txt
This command is similar to the previous command, but it prints all lines in the log file that contain the word “WARNING”.
- Get the list of all successful requests
$ sed -n '/200 OK/p' logfile.txt
This command will print all lines in the log file that contain the status code “200 OK”, which indicates a successful request.
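To avoid retyping these one-liners, a few of them can be bundled into a small report script. This is only a sketch, assuming a combined-format access log; the log path and field positions are placeholders you may need to adjust for your environment:
#!/bin/sh
# Usage: ./log_report.sh /path/to/access.log
# Default to access.log in the current directory if no argument is given.
LOG="${1:-access.log}"

echo "Total log lines:"
sed -n '$=' "$LOG"

echo "Top 5 client IPs:"
awk '{print $1}' "$LOG" | sort | uniq -c | sort -nr | head -5

echo "Top 5 requested URLs:"
awk '{print $7}' "$LOG" | sort | uniq -c | sort -nr | head -5

echo "Number of 5xx responses:"
awk '$9 ~ /^5/' "$LOG" | wc -l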
Concluding Remarks
In this blog post, we’ve explored the basics of the sed command and how it can extract meaningful information from log files. We’ve also looked at some templates that can simplify log analysis for system, application, and performance monitoring.
Thank you for reading my blog post! 🙏
If you enjoyed it and would like to stay updated on my latest content and plans for next week, be sure to subscribe to my newsletter on Substack. 👇
Once a week, I’ll be sharing the latest weekly updates on my published articles, along with other news, content and resources. Enter your email below to subscribe and join the conversation for Free! ✍️
I am also writing on Medium. You can follow me here.