从另一个角度看ORB-SLAM3——第1帧及单目初始化

源码：https://github.com/UZ-SLAMLab/ORB_SLAM3

处理完第0帧后，类似地，第1帧和0到1帧之间所有时刻的IMU测量值都会被系统读取，在TrackMonocular()函数中进行处理，从而完成系统的单目初始化。我们来看看这一过程都经过了哪些函数。

1. GrabImageMonocular()

前面的部分不再重复，直接进入函数GrabImageMonocular()。首先将第1帧转化成灰度图，并创建当前帧。这其中包括为每一帧提取ORB特征点和描述子，具体实现可以看代码，入口在Frame的构造函数里：

ExtractORB(0,imGray,0,1000);

然后调用Track()函数估计第1帧的位姿Tcw。

2. Track()

记住此时系统状态是NOT_INITIALIZED，因此会执行和读取第0帧时不同的条件语句主体。

if(mState!=NO_IMAGES_YET)
{
    if(mLastFrame.mTimeStamp>mCurrentFrame.mTimeStamp)
    {
        cerr << "ERROR: Frame with a timestamp older than previous frame detected!" << endl;
        unique_lock<mutex> lock(mMutexImuQueue);
        mlQueueImuData.clear();
        CreateMapInAtlas();
        return;
    }
    else if(mCurrentFrame.mTimeStamp>mLastFrame.mTimeStamp+1.0)
    {
        cout << "id last: " << mLastFrame.mnId << "    id curr: " << mCurrentFrame.mnId << endl;
        if(mpAtlas->isInertial())
        {

            if(mpAtlas->isImuInitialized())
            {
                cout << "Timestamp jump detected. State set to LOST. Reseting IMU integration..." << endl;
                if(!pCurrentMap->GetIniertialBA2())
                {
                    mpSystem->ResetActiveMap();
                }
                else
                {
                    CreateMapInAtlas();
                }
            }
            else
            {
                cout << "Timestamp jump detected, before IMU initialization. Reseting..." << endl;
                mpSystem->ResetActiveMap();
            }
        }

        return;
    }
}

首先检查时间戳是否有问题。这里分为2种情况：（1）当前帧的时间戳小于上一帧mLastFrame，显而易见这是不合理的，表示时间戳出错，调用CreateMapInAtlas()函数新建地图、重置所有状态并返回到上一层；（2）当前帧和上一帧之间时间差超过1秒，表示时间戳出现跳跃。此时若IMU已经初始化过了且完成了第2次Iniertial BA，就重置Active Map。若还没有做BA，或者IMU还没有初始化，就调用CreateMapInAtlas()新建地图。经过这一过程，系统状态将会变成NO_IMAGES_YET。

如果一切顺利，系统下一步会调用PreintegratedIMU()计算第0帧和第1帧之间的预积分。预积分的推导也是个很复杂的部分，有时间再整理，这里可以参考ORB-SLAM3的论文和Christian Forster的《IMU Preintegration on Manifold for Efficient Visual-Inertial Maximum-a-Posteriori Estimation》及补充材料。简要看下这个函数都计算了什么。

3. PreintegrateIMU()

while(true)
{
    bool bSleep = false;
    {
        unique_lock<mutex> lock(mMutexImuQueue);
        if(!mlQueueImuData.empty())
        {
            IMU::Point* m = &mlQueueImuData.front();
            cout.precision(17);
            if(m->t<mCurrentFrame.mpPrevFrame->mTimeStamp-0.001l)
            {
                mlQueueImuData.pop_front();
            }
            else if(m->t<mCurrentFrame.mTimeStamp-0.001l)
            {
                mvImuFromLastFrame.push_back(*m);
                mlQueueImuData.pop_front();
            }
            else
            {
                mvImuFromLastFrame.push_back(*m);
                break;
            }
        }
        else
        {
            break;
            bSleep = true;
        }
    }
    if(bSleep)
        usleep(500);
}

while()循环的主体就干了一件事，逐个检查mlQueueImuData中的IMU测量值的时间戳，将位于第0帧和第1帧之间的数据存储到mvImuFromLastFrame中。

由于采用中值积分的方式计算预积分，需要取不同时刻间的平均值作为新的加速度、角速度测量值，实现部分如下：

if((i==0) && (i<(n-1)))
{
    float tab = mvImuFromLastFrame[i+1].t-mvImuFromLastFrame[i].t;
    float tini = mvImuFromLastFrame[i].t-mCurrentFrame.mpPrevFrame->mTimeStamp;
    acc = (mvImuFromLastFrame[i].a+mvImuFromLastFrame[i+1].a-
            (mvImuFromLastFrame[i+1].a-mvImuFromLastFrame[i].a)*(tini/tab))*0.5f;
    angVel = (mvImuFromLastFrame[i].w+mvImuFromLastFrame[i+1].w-
            (mvImuFromLastFrame[i+1].w-mvImuFromLastFrame[i].w)*(tini/tab))*0.5f;
    tstep = mvImuFromLastFrame[i+1].t-mCurrentFrame.mpPrevFrame->mTimeStamp;
}
else if(i<(n-1))
{
    acc = (mvImuFromLastFrame[i].a+mvImuFromLastFrame[i+1].a)*0.5f;
    angVel = (mvImuFromLastFrame[i].w+mvImuFromLastFrame[i+1].w)*0.5f;
    tstep = mvImuFromLastFrame[i+1].t-mvImuFromLastFrame[i].t;
}
else if((i>0) && (i==(n-1)))
{
    float tab = mvImuFromLastFrame[i+1].t-mvImuFromLastFrame[i].t;
    float tend = mvImuFromLastFrame[i+1].t-mCurrentFrame.mTimeStamp;
    acc = (mvImuFromLastFrame[i].a+mvImuFromLastFrame[i+1].a-
            (mvImuFromLastFrame[i+1].a-mvImuFromLastFrame[i].a)*(tend/tab))*0.5f;
    angVel = (mvImuFromLastFrame[i].w+mvImuFromLastFrame[i+1].w-
            (mvImuFromLastFrame[i+1].w-mvImuFromLastFrame[i].w)*(tend/tab))*0.5f;
    tstep = mCurrentFrame.mTimeStamp-mvImuFromLastFrame[i].t;
}
else if((i==0) && (i==(n-1)))
{
    acc = mvImuFromLastFrame[i].a;
    angVel = mvImuFromLastFrame[i].w;
    tstep = mCurrentFrame.mTimeStamp-mCurrentFrame.mpPrevFrame->mTimeStamp;
}

要累积的预积分有两种：（1）上一帧到到当前帧pImuPreintegratedFromLastFrame和（2）上一关键帧到当前帧mpImuPreintegratedFromLastKF。对第0帧和第1帧来说，这两种预积分的结果相同，因为第0帧既是第1帧的上一帧，也是它的上一关键帧。调用Preintegrated类的IntegrateNewMeasurement方法计算预积分，预积分的递推公式和Forster的论文一致。

mpImuPreintegratedFromLastKF->IntegrateNewMeasurement(acc,angVel,tstep);
pImuPreintegratedFromLastFrame->IntegrateNewMeasurement(acc,angVel,tstep);

4. MonocularInitialization()

计算完预积分，再次进入函数MonocularInitialization()，单目初始化系统并估计当前帧的位姿。mpInitializer此时不再是空指针，else的主体被执行。

else
{
    if (((int)mCurrentFrame.mvKeys.size()<100||((mSensor == System::IMU_MONOCULAR)&&(mLastFrame.mTimeStamp-mInitialFrame.mTimeStamp>1.0)))
    {
        delete mpInitializer;
        mpInitializer = static_cast<Initializer*>(NULL);
        fill(mvIniMatches.begin(),mvIniMatches.end(),-1);

        return;
    }

    // Find correspondences
    ORBmatcher matcher(0.9,true);
    int nmatches = matcher.SearchForInitialization(mInitialFrame,mCurrentFrame,mvbPrevMatched,mvIniMatches,100);

    // Check if there are enough correspondences
    if(nmatches<100)
    {
        delete mpInitializer;
        mpInitializer = static_cast<Initializer*>(NULL);
        fill(mvIniMatches.begin(),mvIniMatches.end(),-1);
        return;
    }

    cv::Mat Rcw; // Current Camera Rotation
    cv::Mat tcw; // Current Camera Translation
    vector<bool> vbTriangulated; // Triangulated Correspondences (mvIniMatches)

    if(mpCamera->ReconstructWithTwoViews(mInitialFrame.mvKeysUn,mCurrentFrame.mvKeysUn,mvIniMatches,Rcw,tcw,mvIniP3D,vbTriangulated))
    {
        for(size_t i=0, iend=mvIniMatches.size(); i<iend;i++)
        {
            if(mvIniMatches[i]>=0 && !vbTriangulated[i])
            {
                mvIniMatches[i]=-1;
                nmatches--;
            }
        }

        // Set Frame Poses
        mInitialFrame.SetPose(cv::Mat::eye(4,4,CV_32F));
        cv::Mat Tcw = cv::Mat::eye(4,4,CV_32F);
        Rcw.copyTo(Tcw.rowRange(0,3).colRange(0,3));
        tcw.copyTo(Tcw.rowRange(0,3).col(3));
        mCurrentFrame.SetPose(Tcw);

        CreateInitialMapMonocular();

    }
}

如果当前帧特征点数少于100或者上一帧和初始帧时间差大于1秒，就重置初始化步骤。这意味着如果第1帧不满足上述条件，我们就要接着读取第2帧，并把第2帧设置为mInitialFrame，从头开始做初始化。当然，这里我们假设第1帧是满足条件的。

现在，系统需要找到第0帧和第1帧有多少特征点是匹配的，这个工作由SearchForInitialization()完成。类似的处理，如果匹配数小于100，就重置初始化步骤。

第1帧的位姿包括旋转Rcw和平移twc，这里用到的算法很常规，调用ReconstructWithTwoViews()做对极约束来求解基础矩阵或单应矩阵，再用三角测量估计特征点的深度。我们先看看这个函数是如何实现的。

5. ReconstructWithTwoViews()

5.1

ReconstructWithTwoViews()是个虚函数，会根据所使用的相机模型调用不同的实现，这里就以针孔模型Pinhole为例。函数有4个参数：

初始帧，即第0帧，经过去畸变的特征点mInitialFrame.mvKeysUn；
当前帧，即第1帧，经过去畸变的特征点mCurrentFrame.mvKeysUn；
第0帧到第1帧的匹配mvIniMatches，每个元素表示第0帧特征点在第1帧中的index，若匹配不成功则为-1;
第1帧位姿Rcw和tcw;
第0帧特征点的三维坐标mvIniP3D；
存储三角测量匹配结果的vbTriangulated，和mvIniMatches对应。

内部调用Reconstruct()执行初始化，传递上述的几个参数。

if(!tvr){
    cv::Mat K = this->toK();
    tvr = new TwoViewReconstruction(K);
}

return tvr->Reconstruct(vKeys1,vKeys2,vMatches12,R21,t21,vP3D,vbTriangulated);

5.2 Reconstruct()

mvMatches12存储第0帧到第1帧的匹配结果，若匹配成功，则对应mvbMatched1元素为true，否则为false。

for(size_t i=0, iend=vMatches12.size();i<iend; i++)
{
    if(vMatches12[i]>=0)
    {
        mvMatches12.push_back(make_pair(i,vMatches12[i]));
        mvbMatched1[i]=true;
    }
    else
        mvbMatched1[i]=false;
}

应用RANSAC算法估算单应矩阵和基础矩阵，
从mvMatches12中随机挑选8组匹配为一个set，一共mMaxIterations（默认200）个set。

for(int it=0; it<mMaxIterations; it++)
{
    vAvailableIndices = vAllIndices;
    // Select a minimum set
    for(size_t j=0; j<8; j++)
    {
        int randi = DUtils::Random::RandomInt(0,vAvailableIndices.size()-1);
        int idx = vAvailableIndices[randi];

        mvSets[it][j] = idx;

        vAvailableIndices[randi] = vAvailableIndices.back();
        vAvailableIndices.pop_back();
    }
}

开启两个线程同时计算单应矩阵H和基础矩阵F模型，并为两种模型打分，分数存储在SH和SF中。

thread threadH(&TwoViewReconstruction::FindHomography,this,ref(vbMatchesInliersH), ref(SH), ref(H));
thread threadF(&TwoViewReconstruction::FindFundamental,this,ref(vbMatchesInliersF), ref(SF), ref(F));

选择分数较好的那组作为求解模型，解出当前帧位姿，并应用三角测量估算特征点深度。

if(SH+SF == 0.f) return false;
float RH = SH/(SH+SF);
float minParallax = 1.0;
if(RH>0.50)
{
    cout << "Initialization from Homography" << endl;
    return ReconstructH(vbMatchesInliersH,H, mK,R21,t21,vP3D,vbTriangulated,minParallax,50);
}
else
{
    cout << "Initialization from Fundamental" << endl;
    return ReconstructF(vbMatchesInliersF,F,mK,R21,t21,vP3D,vbTriangulated,minParallax,50);
}

求解结束后，返回到MonocularInitialization()，设置第0帧位姿为单位矩阵，及初始化成功后的当前帧位姿Tcw。

mInitialFrame.SetPose(cv::Mat::eye(4,4,CV_32F));
cv::Mat Tcw = cv::Mat::eye(4,4,CV_32F);
Rcw.copyTo(Tcw.rowRange(0,3).colRange(0,3));
tcw.copyTo(Tcw.rowRange(0,3).col(3));
mCurrentFrame.SetPose(Tcw);

最后调用CreateInitialMapMonocular()新建初始化成功后的地图。

6. CreateInitialMapMonocular()

用第0帧和第1帧创建关键帧并插入到地图。

KeyFrame* pKFini = new KeyFrame(mInitialFrame,mpAtlas->GetCurrentMap(),mpKeyFrameDB);
KeyFrame* pKFcur = new KeyFrame(mCurrentFrame,mpAtlas->GetCurrentMap(),mpKeyFrameDB);

mpAtlas->AddKeyFrame(pKFini);
mpAtlas->AddKeyFrame(pKFcur);

遍历所有合格的三维点，创建地图点，并添加点与关键帧之间的可见信息。然后为每个地图点从能观察到它的关键帧中选出最佳描述子mDescriptor，并计算平均viewing direction，这两项工作由ComputeDistinctiveDescriptors()和UpdateNormalAndDepth()完成，解释起来又是啰里啰嗦，源码写得很有条理，也易读。最后添加地图点到地图中。

for(size_t i=0; i<mvIniMatches.size();i++)
{
    if(mvIniMatches[i]<0)
        continue;

    //Create MapPoint.
    cv::Mat worldPos(mvIniP3D[i]);
    MapPoint* pMP = new MapPoint(worldPos,pKFcur,mpAtlas->GetCurrentMap());

    pKFini->AddMapPoint(pMP,i);
    pKFcur->AddMapPoint(pMP,mvIniMatches[i]);

    pMP->AddObservation(pKFini,i);
    pMP->AddObservation(pKFcur,mvIniMatches[i]);

    pMP->ComputeDistinctiveDescriptors();
    pMP->UpdateNormalAndDepth();

    //Fill Current Frame structure
    mCurrentFrame.mvpMapPoints[mvIniMatches[i]] = pMP;
    mCurrentFrame.mvbOutlier[mvIniMatches[i]] = false;

    //Add to Map
    mpAtlas->AddMapPoint(pMP);
}

更新共视图covisibility graph的边和权重。

pKFini->UpdateConnections();
pKFcur->UpdateConnections();

Bundle ajustment图优化当前地图中所有关键帧的位姿和地图点的坐标，使得投影误差最小。

Optimizer::GlobalBundleAdjustemnt(mpAtlas->GetCurrentMap(),20);

得到优化后位姿和地图点坐标后，需要对整个场景的尺度归一化，常用的方法是令第0帧和第1帧之间的平移距离为单位1。这里采用令第0帧场景深度中位值为单位1的方法，来控制场景的规模。首先调用ComputeSceneMedianDepth()计算第0帧的深度中位值，确定尺度。

float medianDepth = pKFini->ComputeSceneMedianDepth(2);
float invMedianDepth;
if(mSensor == System::IMU_MONOCULAR)
    invMedianDepth = 4.0f/medianDepth; // 4.0f
else
    invMedianDepth = 1.0f/medianDepth;

用得到的尺度归一化关键帧位姿和地图点坐标。

// Scale initial baseline
cv::Mat Tc2w = pKFcur->GetPose();
Tc2w.col(3).rowRange(0,3) = Tc2w.col(3).rowRange(0,3)*invMedianDepth;
pKFcur->SetPose(Tc2w);

// Scale points
vector<MapPoint*> vpAllMapPoints = pKFini->GetMapPointMatches();
for(size_t iMP=0; iMP<vpAllMapPoints.size(); iMP++)
{
    if(vpAllMapPoints[iMP])
    {
        MapPoint* pMP = vpAllMapPoints[iMP];
        pMP->SetWorldPos(pMP->GetWorldPos()*invMedianDepth);
        pMP->UpdateNormalAndDepth();
    }
}

剩下一些常规设置，包括传递预积分指针、计算速度模型的初始速度、插入关键帧到建图线程并改变系统状态mState为OK表示初始化成功等。