awk scripts for web server reporting and troubleshooting

Here are some scripts that will help with the daily analysis of our servers.

A few things we care about, and that the business will generally ask for, are how the sites and servers are performing. In the absence of tools like Splunk, AWStats or similar, these scripts can come in handy.

This awk script expects the Apache log format and uses the 9th field to capture the status codes for all hits

awk '{count[$9]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total status codes"}' /logfile |sort -n
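
For reference, here's a made-up line in the combined log format these scripts assume. Counting the default whitespace-separated fields, $1 is the client IP, $4 starts the timestamp, $7 is the requested URL, $9 is the status code and $11 is the referrer:

127.0.0.1 - - [05/Dec/2013:03:14:15 -0500] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start.html" "Mozilla/5.0"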

This awk script uses the 7th field and provides the top requested URLs

awk '{count[$7]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total urls"}' /logfile |sort -n
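
Since sort -n puts the biggest counts at the bottom, piping through tail keeps the output manageable on busy sites (an optional tweak):

awk '{count[$7]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total urls"}' /logfile |sort -n |tail -20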

This awk script uses the 11th field and provides the top referring URLs

awk '{count[$11]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total urls"}' /logfile |sort -n

This awk script uses a different field separator (the double quote) to capture the whole user agent and provides a count of the top agents

awk -F \" '{count[$6]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total agents"}' /logfile |sort -n

This awk script uses the 4th field (the full timestamp) to provide hits by second

awk '{count[$4]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total hits"}' /logfile |sort -n

This awk script uses : as the field separator to isolate the hour field and provides a count of hits by hour

awk -F \: '{count[$2]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total hits"}' /logfile |sort -n
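
This works because the first colon on a combined-format line is the one inside the timestamp, so with : as the separator $2 is the two-digit hour. A quick check against the made-up line from earlier prints 03:

echo '127.0.0.1 - - [05/Dec/2013:03:14:15 -0500] "GET /index.html HTTP/1.1" 200 2326' | awk -F \: '{print $2}'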

Now, reporting is nice, but being able to troubleshoot is better.

Let’s say you ran the status codes script, noticed more 500 status codes than you’d like, and want to see when they’re occurring.
This command combines two awk scripts: the first part looks for any 50x error (500, 501, 503 and so on), and that output is piped to our count-by-hour script to produce a breakdown of when most of the errors are occurring.

awk '($9 ~ /50./)' /logfile | awk -F \: '{count[$2]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total 500s"}'|sort -n
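
The same breakdown can also be done in a single pass by splitting the timestamp field inside one awk program (an equivalent variant, not a different result):

awk '($9 ~ /50./){split($4,t,":"); count[t[2]]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total 500s"}' /logfile |sort -n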

Now let’s say you’ve found a high concentration of errors at 3 AM and want to see which URLs are causing them.
This awk script uses field 9 to check for any 500-type error AND field 4 for the 3 AM hour, then counts and produces the top erroring URLs during that time.

awk '($9 ~ /50./) && ($4 ~ /\[05\/Dec\/2013\:03/){count[$7]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total errors"}' /logfile |sort -n

What if I wanted to see which URLs didn’t fail but were successful, redirected or missing? We simply negate the status match in awk; reusing the previous script makes the difference easy to spot.

awk '!($9 ~ /50./) && ($4 ~ /\[05\/Dec\/2013\:03/){count[$7]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total non-errors"}' /logfile |sort -n
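
awk also has a dedicated "does not match" operator, !~, which reads a little more naturally than wrapping the match in !( ), and anchoring the status pattern (^50[0-9]$) avoids accidentally matching a value that merely contains 50 (a small refinement, same approach):

awk '($9 !~ /^50[0-9]$/) && ($4 ~ /\[05\/Dec\/2013\:03/){count[$7]++}END{for (j in count) {print count[j], j; total+=count[j]} print total" Total non-errors"}' /logfile |sort -n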

These scripts should provide a good basis for getting the information the business needs and giving you insight into how your sites and systems are performing. All scripts are based on the Apache log format and use the default whitespace delimiter unless otherwise noted.

Happy scripting

 

awk – Hits by hour for Apache and IIS log formats

I’ve done this many different ways, but this has to be the most efficient way I’ve found to get hits by hour, and I get to use the power of awk.

The script sets the delimiter to :, counts the occurrences of the 2nd field (the hour field), adds a little wording for the date of the log file I’m working with, and sorts the output.

Apache log format

awk -F":" '{count[$2]++}END{for (j in count) {print "11-25-13 hour "j, count[j]" hits"; total+=count[j]} print "Total hits="total}' /log/file |sort

In about an eighth of the time it used to take me mixing several commands, I now have this output:

11-25-13 hour 00 34278 hits
11-25-13 hour 01 28582 hits
11-25-13 hour 02 27139 hits
11-25-13 hour 03 219542 hits
11-25-13 hour 04 33612 hits
11-25-13 hour 05 29900 hits
11-25-13 hour 06 36313 hits
11-25-13 hour 07 48721 hits
11-25-13 hour 08 60941 hits
11-25-13 hour 09 77082 hits
11-25-13 hour 10 99376 hits
11-25-13 hour 11 142141 hits
11-25-13 hour 12 191163 hits
11-25-13 hour 13 218150 hits
11-25-13 hour 14 238086 hits
11-25-13 hour 15 236122 hits
11-25-13 hour 16 268599 hits
11-25-13 hour 17 224519 hits
11-25-13 hour 18 220107 hits
11-25-13 hour 19 223004 hits
11-25-13 hour 20 182992 hits
11-25-13 hour 21 180524 hits
11-25-13 hour 22 182396 hits
11-25-13 hour 23 173174 hits
Total hits=3376463
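
If you’d rather not hardcode the date label, a variant that pulls the date straight out of the timestamp field works too (a sketch against the same combined format, so the label comes out as 25/Nov/2013 rather than 11-25-13):

awk '{split($4,t,":"); count[substr(t[1],2)" hour "t[2]]++}END{for (j in count) {print j, count[j]" hits"; total+=count[j]} print "Total hits="total}' /log/file |sort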

IIS is slightly different due to its log format. Again I set the delimiter to :, but this time I count the 1st field (which, with that separator, captures the date and the hour together), add different wording, and sort on the 2nd column.
IIS Log format

sudo awk -F":" '{count[$1]++}END{for (j in count) {print j" hour", count[j]" hits"; total+=count[j]} print "Total hits="total}' /log/file |sort -k 2

Produces this output

2013-11-25 00 hour 59259 hits
2013-11-25 01 hour 55567 hits
2013-11-25 02 hour 51763 hits
2013-11-25 03 hour 44487 hits
2013-11-25 04 hour 43262 hits
2013-11-25 05 hour 46869 hits
2013-11-25 06 hour 33495 hits
2013-11-25 07 hour 24929 hits
2013-11-25 08 hour 20310 hits
2013-11-25 09 hour 18445 hits
2013-11-25 10 hour 17211 hits
2013-11-25 11 hour 22776 hits
2013-11-25 12 hour 26052 hits
2013-11-25 13 hour 36508 hits
2013-11-25 14 hour 38920 hits
2013-11-25 15 hour 43963 hits
2013-11-25 16 hour 45905 hits
2013-11-25 17 hour 41755 hits
2013-11-25 18 hour 42276 hits
2013-11-25 19 hour 41531 hits
2013-11-25 20 hour 41101 hits
2013-11-25 21 hour 44790 hits
2013-11-25 22 hour 45541 hits
2013-11-24 23 hour 18 hits
2013-11-25 23 hour 49687 hits
Total hits=936444
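
As with the Apache version, $1 captures the date and the hour together because the first colon on an IIS W3C line is the one inside the time field. A quick check against a made-up line prints "2013-11-25 00":

echo '2013-11-25 00:00:01 GET /default.htm 200' | awk -F \: '{print $1}'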

Good luck and happy reporting