Recognizing image captchas with Python

1. Install the third-party modules

# Send HTTP requests
pip install requests
# Text recognition (OCR)
pip install pytesseract
# Image processing
pip install Pillow
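
Before moving on, it can help to confirm that all three modules are importable. The sketch below is a hypothetical helper (missing_packages is not part of any library) built on the standard library's importlib.util.find_spec. Note that Pillow is imported as PIL, so checking for a module named pillow would always report it as missing.

```python
import importlib.util

# Map each pip package name to the module name used in `import` statements.
# Pillow is installed as "Pillow" but imported as "PIL".
REQUIRED = {
    "requests": "requests",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose top-level module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing, install with: pip install " + " ".join(missing))
    else:
        print("All required modules are installed.")
```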

2. (1) The Image module in Pillow

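# Note: a `from __future__` import must be the first statement in the module
# (only comments and a docstring may precede it), so it has to come before
# `from PIL import Image`. It is only needed on Python 2.
from __future__ import print_function
from PIL import Image
"""
Basic usage of Image from the Pillow module
"""

# 1. Open the image
im = Image.open("../wordsDistinguish/test1.jpg")
print(im)

# 2. Inspect the image file's properties
print("Image file format: " + str(im.format))
print("Image size: " + str(im.size))
print("Image mode: " + im.mode)

# 3. Display the current image object (opens it in the system image viewer)
im.show()

# 4. Shrink the image in place so it fits within 50x50 (the aspect ratio
#    is kept), then save it in PNG format with a matching file extension
size = (50, 50)
im.thumbnail(size)
im.save("1.png", "PNG")

# 5. Convert the image mode and save it; "L" means grayscale, "RGB" means color
im = im.convert("L")
im.save("test1.jpg")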

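
Captcha images are usually noisy, so a common step after grayscale conversion (step 5 above) is thresholding: turning every pixel pure black or pure white before OCR. This is a minimal sketch using Image.point with a 256-entry lookup table. The helper names threshold_table and binarize and the default threshold of 140 are illustrative assumptions; the threshold usually has to be tuned for each captcha style.

```python
def threshold_table(threshold=140):
    """Build a 256-entry lookup table for an "L" (grayscale) image:
    values above `threshold` become white (255), all others black (0)."""
    return [255 if value > threshold else 0 for value in range(256)]


def binarize(im, threshold=140):
    """Return a black-and-white copy of a Pillow image.

    `im` is any PIL.Image.Image. It is first converted to grayscale ("L"),
    then Image.point() maps every pixel through the lookup table.
    """
    gray = im.convert("L")
    return gray.point(threshold_table(threshold))
```

For example, after from PIL import Image, the call binarize(Image.open("test1.jpg")).save("test1_bw.png") writes a black-and-white copy. The helpers do not import Pillow themselves; they only call methods on the image object they receive.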

(2) pytesseract, built on Tesseract-OCR

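Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and "read" the text embedded in images.
Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone invocation script to tesseract, as it can read all image types supported by the Pillow and Leptonica imaging libraries, including jpeg, png, gif, bmp, tiff, and others. Additionally, if used as a script, Python-tesseract will print the recognized text instead of writing it to a file.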
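
Because pytesseract is a wrapper, the actual recognition is done by the separate tesseract program, which can also be run directly. The sketch below calls the command line through subprocess to show roughly what that looks like; it is a simplified illustration, not how pytesseract is implemented internally. It assumes tesseract is on Path, passes the special output name stdout so the text is printed instead of written to a file, and the helper names are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, psm=None, tesseract_cmd="tesseract"):
    """Build the argument list for `tesseract <image> stdout [--psm N]`."""
    command = [tesseract_cmd, image_path, "stdout"]
    if psm is not None:
        # Page segmentation mode, e.g. 7 = treat the image as a single text line.
        command += ["--psm", str(psm)]
    return command


def ocr_with_cli(image_path, psm=None):
    """Run the tesseract CLI and return the text it prints on stdout."""
    result = subprocess.run(
        build_tesseract_command(image_path, psm),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```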

On Windows, simply download the installer, then add the Tesseract installation directory to the system environment variable Path.
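
To check whether the Path change took effect, you can ask Python whether it can find the executable. A minimal sketch with hypothetical helper names: shutil.which searches the Path directories for the program, and path_entries_mentioning lists the Path directories that look like a Tesseract install.

```python
import os
import shutil


def tesseract_on_path():
    """Return the full path of the tesseract executable found via Path, or None."""
    # On Windows, shutil.which also tries the PATHEXT extensions such as .exe.
    return shutil.which("tesseract")


def path_entries_mentioning(name="tesseract", path_value=None):
    """List Path directories whose name contains `name` (case-insensitive).

    `path_value` defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [
        entry
        for entry in path_value.split(os.pathsep)
        if name.lower() in entry.lower()
    ]
```

Environment variable changes only reach programs started afterwards, so reopen the terminal or IDE after editing Path before running the check.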

# Import the image-processing module Image from Pillow
from PIL import Image
# Import pytesseract, the Tesseract-based text recognition module
import pytesseract
"""
@pytesseract: https://github.com/madmaze/pytesseract
"""

# Open the image
im = Image.open("../wordsDistinguish/Resources/1.jpg")
# Recognize the text in the image
text = pytesseract.image_to_string(im)
print(text)

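
For captchas, the raw string from image_to_string usually needs a little post-processing. The sketch below is one possible approach under stated assumptions: the captcha is a single line of ASCII letters and digits (hence --psm 7, Tesseract's "single text line" page segmentation mode, and the character filter). recognize_captcha and clean_captcha_text are hypothetical helpers.

```python
import re


def clean_captcha_text(raw):
    """Keep only ASCII letters and digits from Tesseract's raw output.

    Assumption: the captcha contains only [A-Za-z0-9]. The raw output often
    ends with a newline or form feed, and noisy images can add stray symbols.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def recognize_captcha(path, psm=7):
    """OCR a single-line captcha image and return the cleaned text."""
    # Imported here so clean_captcha_text() works even without Pillow/pytesseract.
    from PIL import Image
    import pytesseract

    with Image.open(path) as im:
        raw = pytesseract.image_to_string(im.convert("L"), config=f"--psm {psm}")
    return clean_captcha_text(raw)
```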

3. Install Tesseract

https://github.com/UB-Mannheim/tesseract/wiki

4. The following exception is raised

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

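Fix: pytesseract could not find the tesseract executable. Add the Tesseract installation directory to Path as described above, or set pytesseract.pytesseract.tesseract_cmd to the full path of the executable.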
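
Setting the executable explicitly can be sketched as follows. The install location is an assumption (the usual default of the UB Mannheim Windows installer); replace it with the actual path on your machine. point_pytesseract_at is a hypothetical helper; pytesseract.pytesseract.tesseract_cmd is the attribute pytesseract reads to decide which program to run.

```python
import os

# Assumption: default location of the UB Mannheim Windows installer;
# change it to wherever tesseract.exe was actually installed.
TESSERACT_EXE = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def point_pytesseract_at(exe_path=TESSERACT_EXE):
    """Tell pytesseract exactly which tesseract executable to run."""
    if not os.path.isfile(exe_path):
        raise FileNotFoundError(f"tesseract executable not found: {exe_path}")
    # Imported after the check, so a bad path is reported even if
    # pytesseract itself is not installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = exe_path
    return exe_path
```

Call it once, before the first image_to_string call.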

5. Then run the code again

Related link: https://blog.csdn.net/weixin_42232219/article/details/100048086