day_6: CAPTCHA recognition

I. Ordinary image CAPTCHAs

1. Installing the required libraries (macOS)

brew install imagemagick
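brew install tesseract --with-all-languages
# Note: Homebrew releases that no longer accept formula options reject --with-all-languages; there, run brew install tesseract and brew install tesseract-lang instead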
pip3 install tesserocr pillow

 Error when importing tesserocr, and how to fix it

# Importing tesserocr directly fails with the error shown below
import tesserocr
# !strcmp(locale, "C"):Error:Assert failed:in file baseapi.cpp, line 203

# Fix: set the locale to 'C' before importing tesserocr
import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr
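
To confirm that the installation and the locale workaround took effect, a quick check along the following lines can help. This is only a sketch: the helper name check_tesserocr is hypothetical, and it assumes the installed tesserocr exposes tesseract_version() and get_languages().

```python
import locale

# Apply the locale workaround before tesserocr is imported.
locale.setlocale(locale.LC_ALL, 'C')


def check_tesserocr():
    """Return (version, languages) if tesserocr imports, otherwise None."""
    try:
        import tesserocr
    except ImportError:
        return None
    # get_languages() returns a (tessdata_path, language_list) tuple.
    _path, languages = tesserocr.get_languages()
    return tesserocr.tesseract_version(), languages


info = check_tesserocr()
if info is None:
    print('tesserocr is not installed or failed to import')
else:
    version, languages = info
    print(version)
    print(languages)
```

An empty language list would suggest that the tessdata files were not installed alongside tesseract.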

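 Example test (method 1 gave more accurate results than method 2)

# Method 1: open the image with PIL and pass the Image object to tesserocr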
import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr
from PIL import Image

image = Image.open('/Users/huangjunyi/Desktop/code.jpg')
result = tesserocr.image_to_text(image)
print(result)

# Method 2: let tesserocr read the image file directly
import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr

print(tesserocr.file_to_text('/Users/huangjunyi/Desktop/code.jpg'))
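
The string returned by either method may carry trailing whitespace or stray punctuation, so it can be worth normalizing before comparing it with the expected code. The sketch below is an illustration only: the helper clean_ocr_text is hypothetical, and it assumes the CAPTCHA consists solely of ASCII letters and digits.

```python
import string

# Assumption: the CAPTCHA contains only ASCII letters and digits.
ALLOWED = set(string.ascii_letters + string.digits)


def clean_ocr_text(raw):
    """Drop whitespace and any character outside the assumed alphabet."""
    return ''.join(ch for ch in raw if ch in ALLOWED)


# The sample string is made up for illustration, not real OCR output.
print(clean_ocr_text(' Pj7 R.\n\n'))
```

If the CAPTCHA alphabet is different (for example digits only), the ALLOWED set would need to be adjusted accordingly.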

If the image cannot be recognized, convert it to grayscale first and then binarize it.

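# Convert to grayscale (image is the PIL Image opened in method 1)
image = image.convert('L')
image.show()

# Full example: grayscale conversion followed by binarization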
import locale
locale.setlocale(locale.LC_ALL, 'C')
import tesserocr
from PIL import Image

threshold = 140  # binarization threshold
table = []
image = Image.open('/Users/huangjunyi/Desktop/code.jpg')
image = image.convert('L')  # convert to grayscale

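# Build a 256-entry lookup table: gray levels below the threshold map to 0 (black), the rest to 1 (white)
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# Apply the table and convert the image to 1-bit mode
image = image.point(table, '1')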

image.show()

result = tesserocr.image_to_text(image)
print(result)
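
The threshold of 140 is a tuning value, and a different CAPTCHA style may need a different one. To see how the lookup table behaves, the following sketch builds it for several thresholds and applies it to a few sample gray levels without needing an image. The helper build_binarize_table and the sample values are hypothetical, chosen only for illustration.

```python
def build_binarize_table(threshold):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


# A made-up row of grayscale pixel values (0 = black, 255 = white).
row = [12, 139, 140, 200, 255]

for threshold in (100, 140, 180):
    table = build_binarize_table(threshold)
    # Each pixel value is used as an index into the table.
    print(threshold, [table[pixel] for pixel in row])
```

A table built this way can be passed to image.point(table, '1') in place of the loop above, which makes it easier to try several thresholds on the same image.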

(Figures: the CAPTCHA image before processing and after processing.)

II. GeeTest slider CAPTCHAs (Selenium, ChromeDriver, Chrome)

III. Touch-click CAPTCHAs

IV. Weibo grid CAPTCHAs

V. 12306 CAPTCHAs

Original article: https://www.cnblogs.com/jp-mao/p/10046809.html