Fixing the tesseract-ocr "tesseract.exe is not installed or it's not in your path" error

I. Solution:

1. Download the package from http://www.ddooo.com/softdown/94968.htm, open the downloaded archive, find "tesseract-ocr-setup-3.02.02.exe", and double-click it to run the installer.

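2. The Python error message contains a link to pytesseract.py. Click it to open the file, then edit pytesseract.py so that the tesseract path points to the installed tesseract.exe, as in the screenshot of the original post.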

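Note: put an r in front of the path string (a raw string), so that backslashes in the Windows path are not read as escape sequences.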
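The two steps above can be checked from Python. The sketch below first asks whether `tesseract` is already on PATH and only then falls back to an explicit path, and it shows what goes wrong without the r prefix. The install folder in `ASSUMED_EXE` and the helper name `find_tesseract` are assumptions for illustration, not values from the original post. Setting `pytesseract.pytesseract.tesseract_cmd` from your own script is an alternative to editing pytesseract.py by hand.

```python
import os
import shutil

# Hypothetical install location; replace it with the folder the installer used.
ASSUMED_EXE = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=ASSUMED_EXE):
    """Return a usable path to the tesseract executable, or None."""
    on_path = shutil.which("tesseract")  # searches the directories listed in PATH
    if on_path:
        return on_path
    # Not on PATH: use the explicit location only if the file really exists.
    return fallback if os.path.isfile(fallback) else None


# Why the r prefix matters: "\t" in a normal string literal is a TAB character.
plain = "C:\tesseract-ocr\tesseract.exe"   # both "\t" sequences become tabs
raw = r"C:\tesseract-ocr\tesseract.exe"    # backslashes are kept literally

if __name__ == "__main__":
    print(repr(plain))
    print(repr(raw))
    exe = find_tesseract()
    print("tesseract found at:", exe)
    if exe:
        try:
            import pytesseract
            # Point pytesseract at the executable without editing its source.
            pytesseract.pytesseract.tesseract_cmd = exe
        except ImportError:
            print("pytesseract is not installed")
```

If `find_tesseract()` prints `None`, the installer either did not run or used a different folder, which is exactly the situation the error message describes.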

II. This OCR engine ships with some pre-trained data, and you can also fine-tune it yourself.

Usage and training:

https://www.cnblogs.com/Leo_wl/p/5556620.html

http://www.cnblogs.com/cnlian/p/5765871.html

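III. The accuracy never got high enough, and training on self-labeled data was not realistic because there was not enough time. I switched to Tencent Cloud.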

Tencent OCR is free for 1000 calls per day, which is enough to work with, and its accuracy is naturally higher!

Key page: https://console.cloud.tencent.com/cam/overview

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# import docx
import requests
import hmac
import hashlib
import base64
import time
import random
import re

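appid = "1257122374"  # replace with your own Tencent Cloud account number
bucket = "your-bucket"  # optional; can be left out
secret_id = "XXXXXXXXXXXXXXXXXX"  # fill in the value from the key page of your own account
secret_key = "EXXXXXXXXXXXXXXX"  # same as above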
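expired = int(time.time()) + 2592000  # signature valid for 30 days (2592000 s)
onceExpired = 0
current = int(time.time())  # integer Unix timestamp in seconds
rdm = ''.join(random.choice("0123456789") for i in range(10))
userid = "0"
fileid = "tencentyunSignTest"

info = ("a=" + appid + "&b=" + bucket + "&k=" + secret_id + "&e=" + str(expired)
        + "&t=" + str(current) + "&r=" + rdm + "&u=0&f=")  # the bucket can be dropped

# HMAC-SHA1 signing; hmac needs bytes, so encode the key and the message first
signindex = hmac.new(secret_key.encode("utf-8"), info.encode("utf-8"), hashlib.sha1).digest()
# Base64-encode the digest followed by the original string
sign = base64.b64encode(signindex + info.encode("utf-8")).decode("ascii")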

url = "http://recognition.image.myqcloud.com/ocr/general"
headers = {'Host': 'recognition.image.myqcloud.com',
           "Authorization": sign,
           }
files = {'appid': (None, appid),
         'bucket': (None, bucket),
         # raw string: without the r, "\360" and "\15" are parsed as octal escapes
         'image': ('15.jpg', open(r'G:\360Downloads\15.jpg', 'rb'), 'image/jpeg')
         }

r = requests.post(url, files=files, headers=headers)
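responseinfo = r.text  # decoded text, so the str regex below can search it
# Create an in-memory Word document object
# file = docx.Document()
# r_index = r'itemstring":"(.*?)"'  # regex that matches any recognized string
r_index = r'itemstring":"(\w+)"'  # mine only matches digits and letters
result = re.findall(r_index, responseinfo)
for i in result:
    # file.add_paragraph(i)
    print(i)
# file.save(r"D:\writeResult.docx")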
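The script above does two things that are easy to get wrong: building the signature and pulling the recognized text out of the response. A minimal sketch of both as reusable functions follows. The function names `make_sign` and `collect_itemstrings` are hypothetical, and the sample response is made up for illustration. The only thing taken from the script is that recognized text sits under an `itemstring` key, so the parser walks the whole JSON document rather than assuming a particular layout.

```python
import base64
import hashlib
import hmac
import json


def make_sign(appid, bucket, secret_id, secret_key, expired, current, rdm):
    """Build the Authorization value, mirroring the signing steps of the script."""
    info = "a={}&b={}&k={}&e={}&t={}&r={}&u=0&f=".format(
        appid, bucket, secret_id, expired, current, rdm)
    digest = hmac.new(secret_key.encode("utf-8"), info.encode("utf-8"),
                      hashlib.sha1).digest()
    # Base64 of (20-byte HMAC-SHA1 digest + the original string)
    return base64.b64encode(digest + info.encode("utf-8")).decode("ascii")


def collect_itemstrings(node):
    """Recursively gather every string stored under an "itemstring" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "itemstring" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_itemstrings(value))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_itemstrings(child))
    return found


if __name__ == "__main__":
    # Made-up response body for illustration only; the real layout may differ.
    sample = '{"data": {"items": [{"itemstring": "ABC123"}, {"itemstring": "42"}]}}'
    print(collect_itemstrings(json.loads(sample)))
    print(make_sign("appid", "bucket", "id", "key", 1, 0, "0123456789"))
```

Parsing with `json` instead of a regular expression avoids missing strings that contain escaped quotes or characters outside `\w`, at the cost of failing loudly if the body is not valid JSON.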

 

Original post: https://www.cnblogs.com/huangfuyuan/p/9316486.html