Website Log Analysis with Python

I wrote a small script to analyze my own website's logs:

import time

# Each rule is [label, URL substring, hit count]. Rules are tried in order,
# and a request is credited to the first rule whose substring matches.
li = [
    ['robots', 'robots.txt', 0],
    ['pd_1', '-Catalog/', 0],
    ['pd_2', '/catalog/', 0],
    ['qp1_1', '/hot-china-products/', 0],
    ['qp1_2', '/find-china-products/', 0],
    ['qp2', '-ns/', 0],
    ['qd_sp', '/manufacturers-directory/', 0],
    ['qp_sp', '/manufacturers-search/', 0],
    ['inquiry', '/sendInquiry/', 0],
    ['fr_pd', '/product-detail', 0],
    ['fr_comp', '/companyinfo/', 0],
    ['fr_pl', '/product-list', 0],
    ['fr_pind', '/products/index.html', 0],
    ['fr_ol', '/offer-list/', 0],
    ['fr_off', '/offer-detail', 0],
]
total = 0  # all requests seen (avoids shadowing the built-in all())
home = 0   # requests for the home page
startTime = time.perf_counter()
with open('gbwww.txt') as f:
    for ln in f:
        fields = ln.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        url = fields[2]  # the URL is the third whitespace-separated field
        total += 1
        for n in li:
            if n[1] in url:
                n[2] += 1
                break
        if url == '/':
            home += 1
endTime = time.perf_counter()

print('all - ', total)
print('home - ', home)
for n in li:
    print(n[0], '-', n[2])

print('all time is', (endTime - startTime))

This is only a prototype and is not yet polished; treat it as a reference only.

Update, January 30:

1. Added timing, so you can measure how long a run takes.

2. Updated the rule list, so more page types can be counted.
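
Because the rules live in a plain list, counting another page type only takes one more entry. Here is a minimal sketch of the same first-match-wins counting wrapped in a function. The name `count_hits` and the sample URLs are illustrative assumptions, not part of the original script:

```python
def count_hits(urls, rules):
    """Credit each URL to the first rule whose substring it contains."""
    counts = {name: 0 for name, _ in rules}
    for url in urls:
        for name, fragment in rules:
            if fragment in url:
                counts[name] += 1
                break  # first match wins, so rule order matters
    return counts


# Hypothetical sample data, for illustration only.
rules = [
    ('robots', 'robots.txt'),
    ('inquiry', '/sendInquiry/'),
    ('fr_pd', '/product-detail'),
]
urls = ['/robots.txt', '/sendInquiry/123', '/product-detail/abc.html', '/']
counts = count_hits(urls, rules)
print(counts)
```

Since the inner loop stops at the first match, a more specific rule should be listed before a more general one that shares part of its substring.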

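Open issue:

Two items still cannot be counted, mainly because they need regular expressions. I plan to add them once I get to the chapter on regular expressions.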
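
One way the regex-based rules might be folded in is to store a compiled pattern in each rule and call `search` on it in place of the substring test. This is only a sketch: the post does not say which two items are missing, so the `list_page` and `detail_id` rules and their patterns are made-up placeholders, as are `count_regex_hits` and the sample URLs.

```python
import re

# Hypothetical rules: [label, compiled pattern, hit count].
# Replace the patterns with the ones actually needed.
regex_rules = [
    ['list_page', re.compile(r'/product-list/\d+\.html$'), 0],
    ['detail_id', re.compile(r'/product-detail/[A-Za-z0-9]+$'), 0],
]


def count_regex_hits(urls, rules):
    """Credit each URL to the first rule whose pattern matches anywhere in it."""
    for url in urls:
        for rule in rules:
            if rule[1].search(url):
                rule[2] += 1
                break
    return rules


# Made-up URLs, for illustration only.
sample_urls = ['/product-list/2.html', '/product-detail/abc123',
               '/product-detail/', '/']
count_regex_hits(sample_urls, regex_rules)
for name, _, hits in regex_rules:
    print(name, '-', hits)
```

Compiling each pattern once up front avoids re-parsing it for every log line, which matters when the log is large.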