Python文件练习_查找log中的IP并统计

需求:监控日志,如果有攻击,就把ip加入黑名单

分析:

1、打开日志文件

2、把ip地址拿出来

3、判断每一个ip出现的次数,如果大于50次的话,加入黑名单

4、每分钟读一次

log样式:

178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /wp-includes/logo_img.php HTTP/1.0" 302 161 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:13 +0800] "GET /blog HTTP/1.0" 301 233 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
178.210.90.90 - - [04/Jun/2017:03:44:15 +0800] "GET /blog/ HTTP/1.0" 200 38278 "http://nnzhp.cn/wp-includes/logo_img.php" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4" "10.3.152.221"
66.249.75.29 - - [04/Jun/2017:03:45:55 +0800] "GET /bbs/forum.php?mod=forumdisplay&fid=574&filter=hot HTTP/1.1" 200 17482 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
37.9.169.20 - - [04/Jun/2017:03:47:59 +0800] "GET /wp-admin/security.php HTTP/1.1" 302 161 "http://nnzhp.cn/wp-admin/s

实现:

import time
point = 0 #每次记录文件指针的位置
while True:#持续读取实时更新的log
    all_IP = []
    f=open('access.log',encoding='utf-8')
    #不能用read来直接读文件,文件从磁盘打开载入内存,进入cpu分析,若文件过大,内存会被占满,电脑回卡死
    f.seek(point)  # 移动文件指针,已统计过的IP不再额外统计
    for line in f:#直接循环一个文件对象的话,每次循环的是文件的每一行
        IP = line.split('-')[0].strip()#取出IP
        all_IP.append(IP)#将IP放入列表
    point = f.tell()  # 记录了指针的位置
    all_IP_set = set(all_IP)#集合天生去重
    for i in all_IP_set:#循环集合比循环列表效率高,已去重
        if all_IP.count(i) > 50:
            print('加入黑名单的IP是%s,一分钟内出现了%s次'%(i,all_IP.count(i)))
    f.close()
    time.sleep(60)#每分钟读一次
原文地址:https://www.cnblogs.com/dongrui624/p/8804264.html