Computer Vision Fundamentals (5)

Knowing and Thinking

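We can recover the relative rotation and translation by decomposing the essential matrix, which is defined as \( E = \hat{T}R \). In principle, the essential matrix is a \( 3 \times 3 \) matrix with 9 entries. However, the epipolar constraint is homogeneous, so \( E \) is only determined up to a scale factor. This leaves 8 unknowns, because the ninth degree of freedom is just the overall scale. (A genuine essential matrix has only 5 degrees of freedom, 3 for the rotation and 2 for the direction of the translation, but the linear method below ignores these extra constraints.) To estimate the essential matrix linearly, we therefore need at least 8 correspondences. Write \( E = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \) in terms of its columns and let \( (x_1, y_1, z_1)^T \) be the coordinates of the point \( x_1 \). From the epipolar constraint \( x_2^T E x_1 = 0 \), we can derive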

\( x_2^T \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} = x_2^T \left( h_1 x_1 + h_2 y_1 + h_3 z_1 \right) = x_1 x_2^T h_1 + y_1 x_2^T h_2 + z_1 x_2^T h_3 = \begin{bmatrix} x_1 x_2^T & y_1 x_2^T & z_1 x_2^T \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} = 0 \)
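The derivation can be checked numerically. The following is a minimal sketch assuming NumPy and the convention \( X_2 = R X_1 + T \) (the one consistent with \( E = \hat{T}R \) and \( x_2^T E x_1 = 0 \)); the pose and the 3D point are made-up values. It builds \( E \) with a hypothetical `hat` helper, checks the epipolar constraint, and confirms that the row \( \begin{bmatrix} x_1 x_2^T & y_1 x_2^T & z_1 x_2^T \end{bmatrix} \) is exactly `np.kron(x1, x2)` when the columns of \( E \) are stacked in column-major order.

```python
import numpy as np

def hat(t):
    """Skew-symmetric matrix with hat(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose, convention X2 = R @ X1 + T
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
T = np.array([1.0, 0.2, 0.1])
E = hat(T) @ R

# One 3D point seen by both cameras, in calibrated homogeneous coordinates
X1 = np.array([0.5, -0.4, 4.0])
X2 = R @ X1 + T
x1 = X1 / X1[2]          # (x_1, y_1, z_1) with z_1 = 1
x2 = X2 / X2[2]

# Epipolar constraint x2^T E x1 = 0, up to floating-point error
residual = x2 @ E @ x1

# Row of A from the derivation: [x_1 x2^T, y_1 x2^T, z_1 x2^T] == kron(x1, x2)
row = np.kron(x1, x2)
# Columns h_1, h_2, h_3 of E stacked into one 9-vector (column-major order)
e_stacked = E.flatten(order="F")
row_residual = row @ e_stacked   # same value as residual

print(residual, row_residual)
```

Using `order="F"` is what makes the stacked vector equal \( \begin{bmatrix} h_1^T & h_2^T & h_3^T \end{bmatrix}^T \); a row-major flatten would require the Kronecker factors in the opposite order.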

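This equation can be written compactly as \( A E^s = 0 \), where \( E^s = \begin{bmatrix} h_1^T & h_2^T & h_3^T \end{bmatrix}^T \in \mathbb{R}^9 \) is the essential matrix stacked column by column. Each correspondence contributes one row \( \begin{bmatrix} x_1 x_2^T & y_1 x_2^T & z_1 x_2^T \end{bmatrix} \) of \( A \), so \( A \) is an \( n \times 9 \) matrix. Ideally (noise-free points in general position) the rank of \( A \) is exactly 8, but with noisy measurements it is usually 9.

In the ideal case, \( E^s \) spans the null space of \( A \). Because noise usually makes the rank 9, we instead solve the least-squares problem \( \arg\min_{\Vert x \Vert = 1} \Vert A x \Vert^2 \); the unit-norm constraint excludes the trivial solution \( x = 0 \). The solution is the right singular vector of \( A \) associated with its smallest singular value, i.e. the last column of \( V \) in the SVD \( A = U \Sigma V^T \).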
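The linear step can be sketched as follows, assuming NumPy, calibrated homogeneous points, and the convention \( X_2 = R X_1 + T \); `estimate_e_linear` is a hypothetical name and the pose and points are synthetic, noise-free values. Each row of \( A \) is `np.kron(x1_j, x2_j)`, and the last row of `Vt` returned by `np.linalg.svd` is the right singular vector of the smallest singular value.

```python
import numpy as np

def hat(t):
    """Skew-symmetric matrix with hat(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def estimate_e_linear(x1, x2):
    """Linear part of the eight-point algorithm.

    x1, x2: (n, 3) arrays of corresponding calibrated homogeneous points, n >= 8.
    Returns the 3x3 matrix whose stacked columns minimise ||A e|| with ||e|| = 1.
    """
    if x1.shape != x2.shape or x1.shape[0] < 8:
        raise ValueError("need at least 8 correspondences of shape (n, 3)")
    # One row per correspondence: [x_1 x2^T, y_1 x2^T, z_1 x2^T] = kron(x1, x2)
    A = np.stack([np.kron(p, q) for p, q in zip(x1, x2)])
    # Right singular vector of the smallest singular value = last row of Vt
    _, _, Vt = np.linalg.svd(A)
    e_s = Vt[-1]
    # Undo the column-by-column stacking
    return e_s.reshape(3, 3, order="F")

# Synthetic, noise-free example (hypothetical pose), convention X2 = R X1 + T
rng = np.random.default_rng(1)
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])   # rotation about Y
T = np.array([1.0, 0.0, 0.2])
X1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))        # points in front of camera 1
X2 = X1 @ R.T + T
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

E_est = estimate_e_linear(x1, x2)
E_true = hat(T) @ R
E_true /= np.linalg.norm(E_true)
# E is only defined up to scale and sign, so compare against both signs
err = min(np.linalg.norm(E_est - E_true), np.linalg.norm(E_est + E_true))
print(err)
```

With noisy data `E_est` will generally not have the singular-value pattern of an essential matrix, which is what the next step corrects.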

As shown in the previous lectures, the singular values of an essential matrix have the form \( \Sigma = \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} \): the first two singular values are equal and the third is zero. The solution of the eight-point algorithm generally does not have this form; with noise it has three different singular values. We therefore enforce this constraint through the singular value decomposition:

\( \hat{E} = U \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{bmatrix} V^T \Rightarrow E = U \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T \)

where \( \sigma = \frac{\sigma_1 + \sigma_2}{2} \).
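A minimal sketch of this projection, assuming NumPy; `project_to_essential` is a hypothetical name and a random matrix stands in for a noisy eight-point estimate. Averaging the two largest singular values gives the closest matrix (in Frobenius norm) with singular values \( (\sigma, \sigma, 0) \); since \( E \) is only defined up to scale, many implementations simply set \( \sigma = 1 \) instead.

```python
import numpy as np

def project_to_essential(E_est):
    """Replace the singular values (s1, s2, s3) of E_est by (sigma, sigma, 0)."""
    U, S, Vt = np.linalg.svd(E_est)       # S is sorted in descending order
    sigma = (S[0] + S[1]) / 2.0
    return U @ np.diag([sigma, sigma, 0.0]) @ Vt

rng = np.random.default_rng(3)
E_noisy = rng.standard_normal((3, 3))      # stands in for an eight-point estimate
E_proj = project_to_essential(E_noisy)
print(np.linalg.svd(E_proj, compute_uv=False))   # two equal values and one close to 0
```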

Comment

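The eight-point algorithm is simple to solve, but it also has a disadvantage: the matrix \( A \) is badly conditioned when the coordinate components of the correspondences differ greatly in magnitude (for example, pixel coordinates in the hundreds next to the homogeneous coordinate 1), so a small amount of noise can change the solution considerably. A standard remedy is normalization, applied to each image separately in two steps: translate the points so that their centroid lies at the origin, then scale them so that their mean distance to the origin equals \( \sqrt{2} \), as in Figure 1. If \( N_1 \) and \( N_2 \) are the resulting normalizing transforms and \( \tilde{E} \) is estimated from the normalized points \( N_1 x_1 \) and \( N_2 x_2 \), the matrix for the original coordinates is recovered as \( E = N_2^T \tilde{E} N_1 \).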

Figure 1. After normalization, the points are moved to the center of the image.
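The normalization can be sketched as follows, assuming NumPy and (n, 2) inhomogeneous image points as input; `normalize_points` and `denormalize_essential` are hypothetical helper names, and the point coordinates are made-up values. If \( N_1, N_2 \) are the 3x3 normalizing transforms of the two images, a matrix \( \tilde{E} \) estimated from normalized points is mapped back with \( N_2^T \tilde{E} N_1 \), because \( (N_2 x_2)^T \tilde{E} (N_1 x_1) = x_2^T (N_2^T \tilde{E} N_1) x_1 \).

```python
import numpy as np

def normalize_points(pts):
    """Normalize (n, 2) image points: centroid at origin, mean distance sqrt(2).

    Returns the (n, 3) normalized homogeneous points and the 3x3 transform N
    such that normalized = N @ [x, y, 1]^T.
    """
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    N = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ N.T, N

def denormalize_essential(E_tilde, N1, N2):
    """Map a matrix estimated from normalized points back to original coordinates."""
    return N2.T @ E_tilde @ N1

pts1 = np.array([[100.0, 200.0], [400.0, 250.0], [320.0, 480.0], [50.0, 60.0],
                 [600.0, 500.0], [250.0, 300.0], [480.0, 120.0], [150.0, 420.0]])
pts2 = pts1 + np.array([30.0, -10.0])   # placeholder second view, only for the check below

p1n, N1 = normalize_points(pts1)
p2n, N2 = normalize_points(pts2)
print(p1n[:, :2].mean(axis=0))                      # approximately [0, 0]
print(np.mean(np.linalg.norm(p1n[:, :2], axis=1)))  # approximately sqrt(2)

# The bilinear form is the same in original and normalized coordinates
E_tilde = np.random.default_rng(0).standard_normal((3, 3))
E_orig = denormalize_essential(E_tilde, N1, N2)
x1h = np.append(pts1[0], 1.0)
x2h = np.append(pts2[0], 1.0)
print(np.isclose(x2h @ E_orig @ x1h, p2n[0] @ E_tilde @ p1n[0]))  # True
```

In practice the eight-point solve (and the singular-value projection) runs on `p1n`, `p2n`, and only the final matrix is denormalized.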

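The essential matrix is not the final result; the purpose is to find the relative rotation and translation of the camera. Even knowing that \( E = \hat{T}R \), we cannot read off the rotation and translation directly, because the SVD of \( E \) is not unique. For example, with \( \Sigma = \mathrm{diag}(\sigma, \sigma, 0) \), inserting \( \mathrm{diag}(1,1,-1)^2 = I \) and using \( \mathrm{diag}(1,1,-1)\,\Sigma = \Sigma \) gives

\( E = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T = \left( U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \right) \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T = U \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T \)

so the sign of the third column of \( U \) (and likewise of \( V \)) is arbitrary, and we can always choose \( U, V \in SO(3) \). The candidate poses are then built from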

\( R_Z\left(\pm\frac{\pi}{2}\right) = \begin{bmatrix} 0 & \mp 1 & 0 \\ \pm 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \)

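which defines a rotation by \( \pm\frac{\pi}{2} \) about the Z axis. With \( U, V \in SO(3) \) and \( \Sigma = \mathrm{diag}(\sigma, \sigma, 0) \), the rotation and translation are given by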

\( R = U R_Z^T\left(\pm\frac{\pi}{2}\right) V^T \)

\( \hat{T} = U R_Z\left(\pm\frac{\pi}{2}\right) \Sigma U^T \)
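The two formulas can be sketched as follows, assuming NumPy and the convention \( X_2 = R X_1 + T \); `decompose_essential`, `rot_z`, `vee` and the ground-truth pose are hypothetical. The sketch first forces \( U, V \) into \( SO(3) \) by flipping their third columns (allowed because the third singular value is zero), then evaluates the formulas for every combination of signs in \( \hat{T} \) and \( R \).

```python
import numpy as np

def hat(t):
    """Skew-symmetric matrix with hat(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def vee(M):
    """Inverse of hat: recover t from a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def decompose_essential(E):
    """Return (R, T) for all sign choices in T_hat = U Rz Sigma U^T, R = U Rz^T V^T."""
    U, S, Vt = np.linalg.svd(E)
    V = Vt.T
    # With sigma_3 = 0, flipping the third column of U or V leaves U Sigma V^T
    # unchanged, so both can be forced to have determinant +1
    if np.linalg.det(U) < 0:
        U[:, 2] *= -1
    if np.linalg.det(V) < 0:
        V[:, 2] *= -1
    sigma = (S[0] + S[1]) / 2.0
    Sigma = np.diag([sigma, sigma, 0.0])
    candidates = []
    for sign_t in (1, -1):
        for sign_r in (1, -1):
            T_hat = U @ rot_z(sign_t * np.pi / 2) @ Sigma @ U.T
            R = U @ rot_z(sign_r * np.pi / 2).T @ V.T
            candidates.append((R, vee(T_hat)))
    return candidates

# Hypothetical ground-truth pose, convention X2 = R X1 + T
c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
T_true = np.array([0.3, -1.0, 0.5])
E = hat(T_true) @ R_true

cands = decompose_essential(E)
for R, T in cands:
    # True when hat(T) @ R reproduces E, False when it gives -E
    print(np.round(T, 3), np.allclose(hat(T) @ R, E))
```

Candidates with matching signs reproduce \( E \); mixed signs reproduce \( -E \). Because \( \sigma = \Vert T \Vert \) for \( E = \hat{T}R \), the true translation is recovered here with its length, but for an \( E \) estimated from images that length is arbitrary.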

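Choosing the sign independently in \( \hat{T} \) and in \( R \) leads to a total of 4 different relative positions of the camera, as in Figure 2. Mixing the signs yields \( -E \) instead of \( E \), but \( -E \) satisfies the same epipolar constraint, so the data cannot distinguish the two. Only one of the four configurations places the observed points in front of both cameras.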

Figure 2. Four different relative positions of the cameras.

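(a) uses \( R_Z(\frac{\pi}{2}) \) for both the translation and the rotation; (b) uses \( R_Z(-\frac{\pi}{2}) \) for the translation and \( R_Z(\frac{\pi}{2}) \) for the rotation; (c) uses \( R_Z(\frac{\pi}{2}) \) for the translation and \( R_Z(-\frac{\pi}{2}) \) for the rotation; and (d) uses \( R_Z(-\frac{\pi}{2}) \) for both. Flipping the sign in the translation formula only negates \( T \), so (a) and (b) share the same rotation, as do (c) and (d).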
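A common way to pick the physically valid configuration is a positive-depth test. The following is a minimal sketch assuming NumPy and the convention \( X_2 = R X_1 + T \); `count_in_front` is a hypothetical helper, and the scene is synthetic. For a self-contained demo, the four candidates are built directly from a known pose as \( \pm T \) combined with the true rotation and with its "twisted pair" (the rotation composed with a rotation by \( \pi \) about the baseline), which is the standard description of the same four-fold ambiguity, rather than from an SVD. For each correspondence, the depths \( \lambda_1, \lambda_2 \) in \( \lambda_2 x_2 = \lambda_1 R x_1 + T \) are solved by least squares.

```python
import numpy as np

def count_in_front(R, T, x1, x2):
    """Count correspondences that get positive depth in both cameras.

    Solves lambda2 * x2 = lambda1 * R @ x1 + T for each pair in the
    least-squares sense and checks that both depths are positive.
    """
    count = 0
    for p, q in zip(x1, x2):
        M = np.column_stack([R @ p, -q])          # [R x1, -x2] @ [l1, l2] = -T
        (l1, l2), *_ = np.linalg.lstsq(M, -T, rcond=None)
        if l1 > 0 and l2 > 0:
            count += 1
    return count

# Synthetic scene with a hypothetical pose, convention X2 = R X1 + T
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T_true = np.array([1.0, 0.1, 0.2])
rng = np.random.default_rng(2)
X1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))
X2 = X1 @ R_true.T + T_true
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

# Four candidates: +/- T with the true rotation and with its twisted pair,
# where 2 t t^T - I is the rotation by pi about the unit baseline t
t_dir = T_true / np.linalg.norm(T_true)
R_twist = (2.0 * np.outer(t_dir, t_dir) - np.eye(3)) @ R_true
candidates = [(R_true, T_true), (R_true, -T_true), (R_twist, T_true), (R_twist, -T_true)]

counts = [count_in_front(R, T, x1, x2) for R, T in candidates]
best = int(np.argmax(counts))
print(counts, best)
```

Counting points instead of testing a single one makes the choice less sensitive to a few noisy correspondences.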

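Without prior knowledge, however, we cannot recover the original positions in 3D at their true scale. When a point is projected into the 2D image, its depth is eliminated, and the epipolar constraint only determines \( E \), and hence \( T \), up to a scale factor: scaling the whole scene and the translation by the same factor produces exactly the same images. To fix the scale, we need additional knowledge about the camera, such as the actual length of the translation (the baseline).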
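The scale ambiguity can be demonstrated directly. This minimal sketch assumes NumPy and the convention \( X_2 = R X_1 + T \); the pose and points are made-up values and `two_view_images` is a hypothetical helper. Scaling both the points and the translation by the same factor leaves both images unchanged.

```python
import numpy as np

def project(X):
    """Pinhole projection to normalized image coordinates (divide by depth)."""
    return X / X[:, 2:3]

def two_view_images(X1, R, T, scale):
    """Images of the scene when both the points and the translation are scaled."""
    X1_s = scale * X1
    X2_s = X1_s @ R.T + scale * T      # camera-2 coordinates: X2 = R X1 + T
    return project(X1_s), project(X2_s)

c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
T = np.array([0.5, 0.0, 0.0])
X1 = np.array([[0.2, 0.1, 3.0], [-0.4, 0.3, 5.0], [0.1, -0.2, 4.0]])

x1_a, x2_a = two_view_images(X1, R, T, scale=1.0)
x1_b, x2_b = two_view_images(X1, R, T, scale=2.5)
print(np.allclose(x1_a, x1_b), np.allclose(x2_a, x2_b))  # True True
```

Since the images are identical for every scale factor, no algorithm working only on these correspondences can recover the absolute length of \( T \).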