#100daysofSRE (Day 22): Essential /var/log Files for SREs and How to Analyze Them
Introduction
For an SRE, logs are often the first line of defense when diagnosing issues. The /var/log/ directory contains numerous log files that capture system and application activity. Knowing which log files to check and how to extract relevant information efficiently can save hours of troubleshooting time.
In this post, we’ll cover essential log files and provide useful grep, sed, and awk commands to analyze them effectively.
/var/log/syslog (or /var/log/messages)
This is the most comprehensive system log, capturing general system events, startup logs, kernel messages, and more.
- Find all logs related to a specific service (e.g., nginx)
grep 'nginx' /var/log/syslog
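- Find logs from the current and previous clock hour, which covers at least the last 60 minutes (requires gawk for strftime; assumes the traditional syslog timestamp with a space-padded day)
gawk 'BEGIN { now = systime(); prev = strftime("%b %e %H:", now - 3600); cur = strftime("%b %e %H:", now) } index($0, prev) == 1 || index($0, cur) == 1' /var/log/syslog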
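Before grepping for a single service, it can help to see which programs are writing the most lines. The following is a minimal sketch, assuming the traditional syslog layout (Mon DD HH:MM:SS host program[pid]: message), where the program tag is field 5. The function name syslog_top_programs is hypothetical, and the field position would need adjusting for other timestamp formats.

```shell
# Count syslog lines per program, busiest first.
# Assumption: traditional layout "Mon DD HH:MM:SS host program[pid]: message",
# so the program tag is field 5.
syslog_top_programs() {
    awk '{
        tag = $5
        sub(/\[[0-9]+\]:?$/, "", tag)   # strip a trailing "[pid]:"
        sub(/:$/, "", tag)              # strip a trailing ":" when no pid is present
        count[tag]++
    }
    END { for (t in count) print count[t], t }' "$1" | sort -k1,1nr
}

# Example: syslog_top_programs /var/log/syslog | head -n 10
```

Each output line is a count followed by the program name, so the noisiest service appears first.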
/var/log/auth.log (or /var/log/secure in RHEL-based distros)
This log tracks authentication-related events, including SSH login attempts and sudo usage.
- Find all failed SSH login attempts
grep 'Failed password' /var/log/auth.log
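- List all unique users who have opened a session (in pam_unix lines the username follows "for user")
awk '/session opened for user/ { for (i = 1; i < NF; i++) if ($i == "user") { sub(/\(.*/, "", $(i + 1)); print $(i + 1); break } }' /var/log/auth.log | sort -u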
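A list of failed logins is easier to act on when it is grouped by source address. The following is a rough sketch, assuming OpenSSH's usual message shape ("Failed password for [invalid user] NAME from IP port N ssh2"); the function name failed_ssh_by_ip is hypothetical.

```shell
# Count failed SSH password attempts per source IP, most active first.
# Assumption: sshd lines look like
#   "... sshd[pid]: Failed password for NAME from IP port N ssh2"
failed_ssh_by_ip() {
    awk '/Failed password/ {
        # Scan from the right so a username that happens to be "from" is skipped.
        for (i = NF - 1; i >= 1; i--)
            if ($i == "from") { count[$(i + 1)]++; break }
    }
    END { for (ip in count) print count[ip], ip }' "$1" | sort -k1,1nr
}

# Example: failed_ssh_by_ip /var/log/auth.log | head -n 10
```

An address with a high count near the top of the output is a natural candidate for closer inspection or rate limiting.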
/var/log/kern.log
This log captures kernel-related events, which can be useful when diagnosing hardware or kernel-related issues.
- Find kernel-related errors
grep -i 'error' /var/log/kern.log
- Extract all timestamps of kernel panics
grep 'Kernel panic' /var/log/kern.log | awk '{print $1, $2, $3}'
/var/log/dmesg
The dmesg log provides system boot logs and hardware-related messages.
- Check for disk-related issues
dmesg | grep -i 'disk'
- Estimate when the system last booted (the earliest messages in the ring buffer; -T prints wall-clock timestamps, which can drift after suspend/resume)
dmesg -T | head -n 5
/var/log/httpd/access.log (or /var/log/nginx/access.log)
This log captures all HTTP requests for Apache or Nginx web servers.
- Find the most requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head -10
- Find all requests from a specific IP (dots escaped and the pattern anchored so it matches only the client address field)
grep '^192\.168\.1\.100 ' /var/log/nginx/access.log
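The same field-based approach gives a quick health summary of the server. This is a small sketch, assuming the default combined log format, in which the HTTP status code is field 9; the function name http_status_counts is hypothetical, and a custom log format would need a different field number.

```shell
# Summarize an access log by HTTP status code, most frequent first.
# Assumption: default "combined" format, where field 9 is the status code.
http_status_counts() {
    awk '{ count[$9]++ }
         END { for (s in count) print count[s], s }' "$1" | sort -k1,1nr
}

# Example: http_status_counts /var/log/nginx/access.log
```

A sudden rise in 5xx counts relative to 2xx is usually the cue to look at the error log next.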
/var/log/httpd/error.log (or /var/log/nginx/error.log)
This log records web server errors and can be helpful for debugging.
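- Find all requests that returned 500 Internal Server Error (the status code is field 9 of the access log in the default combined format; the error log does not normally record status codes)
awk '$9 == 500' /var/log/nginx/access.log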
- Extract timestamps of the last 10 errors
grep -i 'error' /var/log/nginx/error.log | tail -n 10 | awk '{print $1, $2}'
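Grouping error log entries by severity shows at a glance whether the log is mostly warnings or something more serious. This is a sketch that assumes the Nginx error log layout (YYYY/MM/DD HH:MM:SS [level] pid#tid: message), where the bracketed level is field 3. The function name nginx_error_levels is hypothetical, and Apache's error log uses a different layout that this does not handle.

```shell
# Count Nginx error log entries per severity level, most frequent first.
# Assumption: lines look like "2024/01/12 10:15:01 [error] 1234#1234: message",
# so the bracketed level is field 3. Lines without a level there are skipped.
nginx_error_levels() {
    awk '$3 ~ /^\[[a-z]+\]$/ { count[$3]++ }
         END { for (l in count) print count[l], l }' "$1" | sort -k1,1nr
}

# Example: nginx_error_levels /var/log/nginx/error.log
```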
/var/log/maillog (or /var/log/mail.log)
This log captures email-related activities, useful when troubleshooting mail servers.
- Find all emails sent to a specific recipient
grep 'to=<user@example.com>' /var/log/maillog
- Check for email delivery failures
grep -i 'failed' /var/log/maillog
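Delivery outcomes can also be summarized in one pass. The sketch below assumes the mail server is Postfix, which records the result of each delivery attempt as status=sent, status=deferred, or status=bounced; other MTAs log differently and would need another pattern. The function name mail_status_counts is hypothetical.

```shell
# Count delivery outcomes in a mail log, most frequent first.
# Assumption: Postfix-style lines containing "status=sent", "status=deferred",
# "status=bounced", and so on.
mail_status_counts() {
    grep -o 'status=[a-z]*' "$1" | sort | uniq -c | sort -nr |
        awk '{ print $1, $2 }'   # normalize the padding that uniq -c adds
}

# Example: mail_status_counts /var/log/maillog
```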
Conclusion
The /var/log/ directory is a good source of information that can help SREs quickly diagnose and resolve issues.
Third-party tools and agents running on the host typically write their own log files as well.
By using tools like grep, sed, and awk, we can efficiently parse log files and extract valuable insights.