Building Your Own tesseract Docker Environment Image (Hands-On)

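To publish an OCR text-recognition service on Linux, the tesseract environment has to be installed. Information online is scattered, and Dockerfiles built on different Linux distributions behave so inconsistently that it is hard to tell what actually works. Below is what I arrived at after a day or two of experimenting. It deploys the environment reliably, and I hope it is useful to others. The process has three stages: 1. build a base image with the tesseract environment installed; 2. upload the tessdata language pack to the server, for tesseract to use during recognition; 3. build the application image, mount the tessdata directory at /usr/local/share/tessdata, and set the container's TESSDATA_PREFIX environment variable.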

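1. Prepare the Dockerfile for the base image. Two resource files are required, tesseract-4.1.1.tar.gz and leptonica-1.80.0.tar.gz, placed in the same directory as the Dockerfile: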

https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1

http://www.leptonica.org/source/leptonica-1.80.0.tar.gz
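Before building, it can help to confirm that both archives really sit next to the Dockerfile, because ADD can only pick up files from the build context. The following is a minimal sketch, not part of the original setup; the function name check_build_context is hypothetical.

```shell
# Hypothetical helper: verify that the build context holds everything
# the base-image Dockerfile expects. Returns non-zero if a file is missing.
check_build_context() {
  dir="${1:-.}"
  missing=0
  for f in Dockerfile tesseract-4.1.1.tar.gz leptonica-1.80.0.tar.gz; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Called as check_build_context . from the build directory, it lists any missing file on stderr, so a failed check can stop the build before docker is invoked.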

FROM mamohr/centos-java
LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01" \
      RESOURCES="https://github.com/tesseract-ocr/tesseract http://www.leptonica.org/index.html https://github.com/tesseract-ocr/tessdata" \
      DESCRIPTION="This image integrates the runtime environment of tesseract-4.1.1 and leptonica-1.80.0 \
      on a CentOS base. On top of this base image you can run your own tess4j jar application."

# Environment variables (tesseract)
ENV LD_LIBRARY_PATH="/usr/local/lib" \
    LIBLEPT_HEADERSDIR="/usr/local/include" \
    PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"
# Install the tesseract environment (ADD unpacks local tar archives automatically)
ADD   tesseract-4.1.1.tar.gz /
ADD   leptonica-1.80.0.tar.gz /

# pango-devel and cairo-devel are the CentOS names for the Debian-style libpango1.0-dev and libcairo-dev
RUN   yum -y install file automake pango-devel cairo-devel libicu-devel libjpeg-devel libpng-devel libtiff-devel zlib-devel libtool gcc-c++ make \
      && cd /leptonica-1.80.0 && ./configure && make && make install \
      && cd /tesseract-4.1.1 && ./autogen.sh && ./configure && make && make install \
      && rm -rf /leptonica-1.80.0 /tesseract-4.1.1
# Time zone settings
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo 'Asia/Shanghai' >/etc/timezone

2. Build the base image

docker build -t tess/centos-java:v1.0 . 

3. Install the tessdata package

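Download the language pack and upload it to the server, for example to /usr/tessdata, the host directory that is mounted into the container in the last step.

Link: https://pan.baidu.com/s/1XAvPkTdUXuFq-q2InDREhQ Extraction code: 6vjp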
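After uploading, a quick check that the expected .traineddata files are in place can prevent confusing errors at recognition time. This is a minimal sketch under assumptions: the function name check_tessdata is hypothetical, and the language codes in the usage note (eng, chi_sim) are only examples, since the original does not say which languages the pack contains.

```shell
# Hypothetical helper: confirm that <dir>/<lang>.traineddata exists
# for every language code passed after the directory argument.
check_tessdata() {
  dir="$1"
  shift
  for lang in "$@"; do
    if [ ! -f "$dir/$lang.traineddata" ]; then
      echo "missing: $dir/$lang.traineddata" >&2
      return 1
    fi
  done
  return 0
}
```

For example, check_tessdata /usr/tessdata eng chi_sim succeeds only if both files are present in the host directory.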

4. Build your own springboot-ocr service image and set the TESSDATA_PREFIX environment variable

FROM tess/centos-java:v1.0
LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01"
VOLUME /tmp
ADD simm-framework-test-1.0.jar app.jar
EXPOSE 8080
ENV  TESSDATA_PREFIX="/usr/local/share/tessdata"
# Entry point
ENTRYPOINT ["java","-jar","/app.jar"]

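5. Start the container and mount the tessdata directory

The application Dockerfile from the previous step must first be built and tagged as ocr-api:v1.0 (for example, with docker build -t ocr-api:v1.0 . in its directory). Then start the container: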

docker run -it -v /usr/tessdata:/usr/local/share/tessdata -p 8080:8080 --name="ocr-api" ocr-api:v1.0
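Once the container is running, it is worth confirming from a second terminal that TESSDATA_PREFIX is set as intended and that tesseract can see the mounted language files. The sketch below is an assumption-laden illustration rather than part of the original procedure: the function name verify_ocr_container and the DOCKER override variable are hypothetical, while docker exec, printenv, and tesseract --list-langs are standard commands.

```shell
# Command used to talk to Docker; overridable for testing (hypothetical convention).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: check TESSDATA_PREFIX inside the running container,
# then print the languages tesseract finds in the mounted tessdata directory.
verify_ocr_container() {
  name="${1:-ocr-api}"
  expected="/usr/local/share/tessdata"
  prefix=$("$DOCKER" exec "$name" printenv TESSDATA_PREFIX) || return 1
  if [ "$prefix" != "$expected" ]; then
    echo "TESSDATA_PREFIX is '$prefix', expected '$expected'" >&2
    return 1
  fi
  "$DOCKER" exec "$name" tesseract --list-langs
}
```

Running verify_ocr_container ocr-api should list the language codes found under the mount. If the list does not include the languages you uploaded, the host path given to -v is the first thing to check.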
Original article: https://www.cnblogs.com/MrSi/p/13601294.html