Web Access Log Analysis

Log Records

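In a web server log, each entry usually represents a single request made by a user. The following lines, for example, come from an nginx access log: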

14.23.95.98 - - [17/Mar/2015:22:26:54 -0400] "GET /pmd/phpmyadmin.css.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&js_frame=left&nocache=2705868602 HTTP/1.1" 200 3970 "http://104.131.67.100/pmd/navigation.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/mootools.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"

14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/phpmyadmin.css.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&js_frame=right&nocache=2705868602 HTTP/1.1" 200 21799 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"

14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/tooltip.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"

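Each entry can be broken down into the following 8 variables (the layout matches nginx's predefined "combined" log format; the lone "-" after the IP address is fixed text, not a variable):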

  • remote_addr

    The client's IP address, e.g. 14.23.95.98

  • remote_user

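    The user name supplied through HTTP basic authentication; nginx logs "-" when there is none, as in every sample line above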

  • time_local

    The local time of the request and its time zone offset, e.g. [17/Mar/2015:22:26:55 -0400]

  • request

    The request line: HTTP method, requested URL, and protocol version, e.g. "GET /pmd/js/tooltip.js HTTP/1.1"

  • status

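    The HTTP response status code; 200 means success. The 304 entries in the sample mean Not Modified: the browser's cached copy was still valid, so no body was sent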

  • body_bytes_sent

    The number of bytes in the response body sent to the client (response headers not included), e.g. 21799

  • http_referer

    The page the request was linked from (the Referer request header), e.g. "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl"

  • http_user_agent

    Information about the client's browser (the User-Agent request header), e.g. "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
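To make the breakdown above concrete, here is a minimal bash sketch that splits one log line into these eight variables with a regular expression. It assumes the line follows the combined layout shown above and that no quoted field contains an embedded double quote; the function name parse_log_line is hypothetical, not something provided by nginx. It needs bash rather than plain sh, because it relies on [[ =~ ]] and BASH_REMATCH.

```shell
#!/usr/bin/env bash
# Sketch: split one nginx access-log line into the 8 variables listed above.
# Assumes the combined layout:
#   $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
# and that no quoted field contains an embedded double quote.

parse_log_line() {   # hypothetical helper name
  local line=$1
  # Keep the regex in a variable so bash applies it as an ERE without extra quoting.
  local re='^([^ ]+) - ([^ ]+) \[([^]]+)] "([^"]*)" ([0-9]{3}) ([0-9]+) "([^"]*)" "([^"]*)"$'
  local names=(remote_addr remote_user time_local request status body_bytes_sent http_referer http_user_agent)
  local i

  if [[ $line =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; the capture groups start at index 1.
    for i in "${!names[@]}"; do
      printf '%s=%s\n' "${names[i]}" "${BASH_REMATCH[i+1]}"
    done
  else
    echo "line does not match the combined format" >&2
    return 1
  fi
}

# Example: the last sample line from the log excerpt above.
line='14.23.95.98 - - [17/Mar/2015:22:26:55 -0400] "GET /pmd/js/tooltip.js HTTP/1.1" 304 0 "http://104.131.67.100/pmd/db_structure.php?token=1013c8e1ea31d0f0340af8de3cf4a0cb&db=bl" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"'
parse_log_line "$line"
```

For that line it prints one variable per line, for example remote_addr=14.23.95.98, remote_user=-, status=304 and body_bytes_sent=0. A regex is used instead of plain whitespace splitting because time_local, request and http_user_agent all contain spaces.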

Log Analysis

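With this information recorded, we can start analyzing the logs.
For example, to find the 10 IP addresses that made the most requests in an nginx access log: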

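[root@biby nginx]# awk '{a[$1]++} END {for (b in a) print b "\t" a[b]}' access.log | sort -k2,2nr | head -n 10

14.23.95.98     121
211.97.10.56    102
14.157.210.181  56
112.64.235.245  3

The n in sort -k2,2nr matters: it makes sort compare the counts as numbers. Without it, sort compares them as text, so "56" and "3" would be ranked above "121" and "102".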
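The same awk-and-sort pattern works for other columns, not just the client IP. The sketch below wraps it in a hypothetical bash function, top_field, that takes an awk field number, a row count and a log file. The field numbers assume awk's default whitespace splitting of the combined layout shown earlier ($1 is remote_addr, $7 the URL, $9 the status); they shift if a field such as remote_user ever contains a space.

```shell
#!/usr/bin/env bash
# Sketch: generalize the top-10-IP pipeline to any whitespace-separated field.
# top_field is a hypothetical helper, not an nginx or awk feature.
#   arg 1 = awk field number ($1 remote_addr, $7 URL, $9 status in the combined layout)
#   arg 2 = number of rows to print
#   arg 3 = log file
top_field() {
  local field=$1 n=$2 file=$3
  # awk counts each distinct value of the chosen field and prints "value<TAB>count";
  # sort orders by count (numeric, descending), breaking ties by value so output is stable;
  # head keeps the first n rows.
  awk -v f="$field" '{ count[$f]++ } END { for (v in count) print v "\t" count[v] }' "$file" |
    sort -t $'\t' -k2,2nr -k1,1 |
    head -n "$n"
}

# Usage against a real log (access.log is assumed to exist):
#   top_field 1 10 access.log   # the 10 busiest client IPs
#   top_field 9 10 access.log   # how often each status code occurs
#   top_field 7 10 access.log   # the 10 most requested URLs
```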

Original article: https://www.cnblogs.com/biby/p/15217697.html