Mask R-CNN Reproduction Notes

For transferring files between the Docker container and the host, see that Docker blog post, plus https://zhuanlan.zhihu.com/p/55516749

Install directly with Docker by modifying the Dockerfile from facebookresearch/maskrcnn-benchmark. Points to note: change CUDA to 10, and for apex keep an eye on this line in the Dockerfile: pip uninstall apex; git clone https://github.com/NVIDIA/apex.git; cd apex; python setup.py install --cuda_ext --cpp_ext

During installation, also refer to: https://ihaoming.top/archives/623a7632.html (gcc version < 5.4)
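A quick way to check the compiler against that note is to compare version strings before building. The following is only a sketch: the 5.4 threshold is copied from the note above and has not been verified here, `version_lt` is a hypothetical helper name, and `sort -V` assumes GNU coreutils (as found in the Ubuntu base image).

```shell
# Succeed (exit status 0) when version $1 is strictly lower than version $2.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

required_below="5.4"   # threshold taken from the note above, not verified

# Only report when gcc is actually installed.
if command -v gcc >/dev/null 2>&1; then
  gcc_version="$(gcc -dumpversion)"
  if version_lt "$gcc_version" "$required_below"; then
    echo "gcc $gcc_version is below $required_below"
  else
    echo "gcc $gcc_version is not below $required_below"
  fi
fi
```

Run it inside the container, since that is where maskrcnn-benchmark gets compiled.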

Modify it along the lines of archdyn's Dockerfile, quoted below.

The only way to train and prevent the Runtime Error is to modify the Dockerfile and build it like:

ARG CUDA="9.0"
ARG CUDNN="7"

FROM nvidia/cuda:${CUDA}-cudnn${CUDNN}-devel-ubuntu16.04

RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# install basics
RUN apt-get update -y \
 && apt-get install -y apt-utils git curl ca-certificates bzip2 cmake tree htop bmon iotop g++

# Install Miniconda
RUN curl -so /miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
 && chmod +x /miniconda.sh \
 && /miniconda.sh -b -p /miniconda \
 && rm /miniconda.sh

ENV PATH=/miniconda/bin:$PATH

# Create a Python 3.6 environment
RUN /miniconda/bin/conda install -y conda-build \
 && /miniconda/bin/conda create -y --name py36 python=3.6.7 \
 && /miniconda/bin/conda clean -ya

ENV CONDA_DEFAULT_ENV=py36
ENV CONDA_PREFIX=/miniconda/envs/$CONDA_DEFAULT_ENV
ENV PATH=$CONDA_PREFIX/bin:$PATH
ENV CONDA_AUTO_UPDATE_CONDA=false

RUN conda install -y ipython
RUN pip install ninja yacs cython matplotlib

# Install PyTorch 1.0 Nightly
RUN conda install -y pytorch-nightly -c pytorch && conda clean -ya

# Install TorchVision master
RUN git clone https://github.com/pytorch/vision.git \
 && cd vision \
 && python setup.py install

# install pycocotools
RUN git clone https://github.com/cocodataset/cocoapi.git \
 && cd cocoapi/PythonAPI \
 && python setup.py build_ext install

# install PyTorch Detection
RUN git clone https://github.com/facebookresearch/maskrcnn-benchmark.git 

WORKDIR /maskrcnn-benchmark

With this saved as the Dockerfile under docker/, build the image:

nvidia-docker build -t maskrcnn-benchmark docker/

Then after the build I have to go inside the docker container:

nvidia-docker run --rm -it maskrcnn-benchmark bash

And inside the docker container I build maskrcnn-benchmark without problems:

python setup.py build develop

I then have to commit this modified docker container so that I have a Docker Image that can always be started:

docker commit [Container ID] maskrcnn-benchmark:working
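Because the container above was started with `--rm`, Docker deletes it as soon as the shell exits, so the commit has to be issued from a second terminal on the host while the container is still running. A minimal sketch of that step follows; the function names are hypothetical, and it assumes only one container from the image is running (it takes the first ID that `docker ps` lists).

```shell
# Print the ID of a running container created from the given image.
# $1: image name, e.g. maskrcnn-benchmark
running_container_id() {
  docker ps -q --filter "ancestor=$1" | head -n 1
}

# Commit that running container under a new image tag.
# $1: image the container was started from, $2: tag for the committed image
commit_running_container() {
  image=$1
  new_tag=$2
  container_id="$(running_container_id "$image")"
  if [ -z "$container_id" ]; then
    echo "no running container found for image $image" >&2
    return 1
  fi
  docker commit "$container_id" "$new_tag"
}

# Usage, from a second terminal while the container shell is still open:
#   commit_running_container maskrcnn-benchmark maskrcnn-benchmark:working
```

Looking the ID up this way avoids copying it by hand from the `docker ps` output.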

After all these steps I can train without problems with:

nvidia-docker run --shm-size=8gb -v /home/archdyn/Datasets/coco:/maskrcnn-benchmark/datasets/coco maskrcnn-benchmark:working python /maskrcnn-benchmark/tools/train_net.py --config-file "/maskrcnn-benchmark/configs/e2e_mask_rcnn_R_50_FPN_1x.yaml" SOLVER.IMS_PER_BATCH 1 SOLVER.BASE_LR 0.0025 SOLVER.MAX_ITER 720000 SOLVER.STEPS "(480000, 640000)" TEST.IMS_PER_BATCH 1
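The solver overrides in that command look like a multi-GPU schedule scaled down for a single GPU: the learning rate is divided by a factor and the iteration counts are multiplied by the same factor. The sketch below reproduces the arithmetic under an assumption that is not stated in these notes, namely a reference schedule of batch size 16, learning rate 0.02, 90000 iterations and steps at 60000 and 80000. `scale_schedule` is a hypothetical helper, not part of maskrcnn-benchmark.

```shell
# Scale a reference training schedule down by an integer factor.
# Args: ref_batch ref_lr ref_max_iter ref_step1 ref_step2 factor
# Batch size and learning rate are divided by the factor;
# iteration counts are multiplied by it.
scale_schedule() {
  awk -v b="$1" -v lr="$2" -v it="$3" -v s1="$4" -v s2="$5" -v k="$6" 'BEGIN {
    printf "SOLVER.IMS_PER_BATCH %d SOLVER.BASE_LR %g SOLVER.MAX_ITER %d SOLVER.STEPS \"(%d, %d)\"\n",
           b / k, lr / k, it * k, s1 * k, s2 * k
  }'
}

# Assumed reference schedule (batch 16, LR 0.02, 90000 iterations), scaled by 8.
scale_schedule 16 0.02 90000 60000 80000 8
```

With a factor of 8 this yields the same BASE_LR, MAX_ITER and STEPS as the command above, but a batch size of 2 rather than 1. Whether the learning rate should be lowered further for `SOLVER.IMS_PER_BATCH 1` is not addressed in these notes.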

For the parameter changes needed during the actual reproduction, see:

https://zhuanlan.zhihu.com/p/57603975

https://zhuanlan.zhihu.com/p/67121644