Distinctive Image Features from Scale-Invariant Keypoints (personal translation + notes): Introduction

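Distinctive Image Features from Scale-Invariant Keypoints is the classic paper on the SIFT algorithm in image recognition, and it was the first reading my advisor assigned me. I searched online for a Chinese translation and could not find one, so I decided to do the work myself: translate the paper, chew through it, and keep my notes here, hopefully to the benefit of later readers.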

--------------------------------------------------------------------------------------------------------

Distinctive Image Features from Scale-Invariant Keypoints

Abstract

This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial [considerable] range of affine [a class of geometric transformations] distortion [warping, deformation], change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
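In other words, the paper presents a way to extract distinctive, invariant features from images so that different views of the same object or scene can be matched reliably. The features are unaffected by image scaling and rotation, and they still match robustly under a considerable range of affine distortion, changes in 3D viewpoint, added noise, and changes in illumination. They are highly distinctive: a single feature can be matched correctly, with high probability, against a large database of features drawn from many images. The paper also describes how to use these features for object recognition. Individual features are first matched against a database of features from known objects using a fast nearest-neighbor algorithm; a Hough transform then identifies clusters of matches that belong to a single object; finally, each cluster is verified by a least-squares solution for consistent pose parameters. The approach can robustly identify objects amid clutter and occlusion while running in near real time.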

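[Note] SIFT matches images effectively even when the object is seen from a different viewpoint, under different lighting, or with added noise, and each feature is matched against features collected from a whole set of images. The paper also gives a recognition pipeline: match features with a fast nearest-neighbor search, cluster the matches with a Hough transform, then verify each cluster with a least-squares fit of the object pose.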

1. Introduction

Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
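Image matching underlies many problems in computer vision, including object and scene recognition, recovering 3D structure from multiple images, stereo correspondence, and motion tracking. The paper describes image features whose properties make them well suited to matching different images of the same object or scene. The features are invariant to image scaling and rotation, and partially invariant to changes in illumination and in 3D camera viewpoint. They are well localized in both the spatial and frequency domains, which lowers the chance that occlusion, clutter, or noise will disrupt them. Large numbers of features can be extracted from typical images with efficient algorithms. The features are also highly distinctive, so a single feature can be matched correctly, with high probability, against a large database of features, and this provides a basis for object and scene recognition.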

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
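Taking a cascade filtering approach keeps the cost of extracting these features to a minimum. (Here "cascade" means a chain of staged tests, not a convolution filter: the more expensive operations are applied only at locations that have passed an initial test.) The main stages of computation used to generate the set of image features are as follows: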

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
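1. Scale-space extrema detection: the first stage searches over all scales and image locations. A difference-of-Gaussian function makes this efficient; it identifies potential interest points that are invariant to scale and orientation.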

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
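2. Keypoint localization: at each candidate location, a detailed model is fitted to determine the location and scale. Keypoints are then selected according to measures of their stability.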

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
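3. Orientation assignment: each keypoint location is assigned one or more orientations based on the local image gradient directions. Every later operation works on image data that has been transformed relative to the feature's assigned orientation, scale, and location, which is what makes the result invariant to those transformations.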

4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
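4. Keypoint descriptor: the local image gradients are measured at the selected scale in the region around each keypoint, then transformed into a representation that tolerates significant local shape distortion and changes in illumination.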
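[Note] To make stage 1 concrete, here is a minimal sketch of difference-of-Gaussian extrema detection in plain Python. It is not the paper's implementation: there are no octaves or downsampling, no sub-pixel localization, and the function names (`gaussian_kernel`, `blur`, `dog_extrema`), the 3-sigma kernel radius, the sigma ladder, and the `contrast` cutoff are all assumptions made for illustration. A point is kept if its DoG value is strictly larger or strictly smaller than all 26 neighbours in its own and the two adjacent DoG levels.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian weights, truncated at 3 sigma and normalized to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a list-of-lists image (edges are replicated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(k * row[clamp(c + j - radius, w)] for j, k in enumerate(kernel))
             for c in range(w)] for row in image]
    return [[sum(k * rows[clamp(r + j - radius, h)][c] for j, k in enumerate(kernel))
             for c in range(w)] for r in range(h)]

def dog_extrema(image, sigmas, contrast=0.03):
    """Return (level, row, col) of strict 3x3x3 extrema in the DoG stack."""
    blurred = [blur(image, s) for s in sigmas]
    # Each DoG level is the difference of two neighbouring blur levels.
    dogs = [[[b - a for a, b in zip(row_lo, row_hi)]
             for row_lo, row_hi in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]
    h, w = len(image), len(image[0])
    found = []
    for s in range(1, len(dogs) - 1):          # needs a level above and below
        for r in range(1, h - 1):
            for c in range(1, w - 1):
                v = dogs[s][r][c]
                if abs(v) < contrast:          # skip weak responses (assumed cutoff)
                    continue
                neighbours = [dogs[s + ds][r + dr][c + dc]
                              for ds in (-1, 0, 1)
                              for dr in (-1, 0, 1)
                              for dc in (-1, 0, 1)
                              if (ds, dr, dc) != (0, 0, 0)]
                if v > max(neighbours) or v < min(neighbours):
                    found.append((s, r, c))
    return found

# Synthetic test image: one bright Gaussian blob centred at (20, 20).
blob = [[math.exp(-((r - 20) ** 2 + (c - 20) ** 2) / 18.0) for c in range(41)]
        for r in range(41)]
sigmas = [1.6 * 2 ** (i / 2) for i in range(5)]   # assumed sigma ladder
keypoints = dog_extrema(blob, sigmas)
print(keypoints)
```

On this synthetic blob the centre pixel shows up as a DoG minimum at one of the interior levels, which is the behaviour stage 1 relies on: the level at which the extremum occurs tells you the scale of the structure.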

This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
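The approach is called the Scale Invariant Feature Transform (SIFT) because it transforms image data into scale-invariant coordinates relative to local features.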

An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
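An important property of this approach is that it generates a large number of features that densely cover the image across the full range of scales and locations. A typical 500x500-pixel image yields about 2000 stable features (although the number depends on the image content and on the parameter choices). The quantity of features matters especially for object recognition: to detect small objects against a cluttered background, at least 3 features from each object must be matched correctly for the identification to be reliable.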

For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
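For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by comparing each of its features individually against that database and picking candidate matches by the Euclidean distance between feature vectors. The paper goes on to discuss fast nearest-neighbor algorithms that can do this quickly against large databases.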
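[Note] The matching step can be sketched with a brute-force search. This is only the baseline that a fast nearest-neighbor algorithm would speed up, not the fast algorithm the paper discusses; the function names and the tiny 2-D "descriptors" are hypothetical, chosen so the distances are easy to check by hand.

```python
import math

def nearest_feature(query, database):
    """Return (index, distance) of the database vector closest to `query`,
    using Euclidean distance. Linear scan: O(len(database)) per query."""
    best_index, best_dist = -1, math.inf
    for i, vec in enumerate(database):
        d = math.dist(query, vec)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

def match_features(new_features, database):
    """Match every feature of a new image against the stored database."""
    return [nearest_feature(q, database) for q in new_features]

# Toy 2-D vectors stand in for real descriptors (an assumption for readability).
database = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
print(match_features([(3.0, 5.0)], database))   # -> [(1, 1.0)]
```

Each candidate produced this way is only a hypothesis; the filtering described in the following paragraphs decides which candidates are kept.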

The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
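The keypoint descriptors are highly distinctive, so a single feature has a good probability of finding its correct match in a large database of features. In a cluttered image, however, many background features have no correct match in the database at all, which produces many false matches alongside the correct ones. The correct matches can be filtered out of the full set by finding subsets of keypoints that agree on the object and on its location, scale, and orientation in the new image. Several features agreeing on these parameters by chance is far less likely than any single feature match being wrong. These consistent clusters can be found quickly with an efficient hash-table implementation of the generalized Hough transform.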
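[Note] A rough sketch of the hash-table voting idea: every candidate match predicts an object and a pose (location, scale, orientation), the pose is quantized into a bin, and the bin tuple is used as a dictionary key. Bins that collect 3 or more votes become clusters. The bin widths (`loc_bin`, `angle_bin`, one scale bin per factor of 2), the function names, and the match tuples are assumptions for illustration, and this simplified version ignores matches that fall near bin boundaries.

```python
import math
from collections import defaultdict

def pose_bin(obj, x, y, scale, angle_deg, loc_bin=32.0, angle_bin=30.0):
    """Quantize a predicted pose into a hashable bin key."""
    return (obj,
            int(x // loc_bin),
            int(y // loc_bin),
            round(math.log2(scale)),               # one bin per factor of 2
            int((angle_deg % 360) // angle_bin))

def hough_clusters(matches, min_votes=3):
    """Group match indices by pose bin and keep bins with enough votes."""
    table = defaultdict(list)                      # the hash table of votes
    for i, match in enumerate(matches):
        table[pose_bin(*match)].append(i)
    return {key: idx for key, idx in table.items() if len(idx) >= min_votes}

# (object, x, y, scale, orientation in degrees) predicted by each match.
matches = [
    ("cup", 100.0, 50.0, 1.0, 10.0),    # three matches that agree on a pose
    ("cup", 105.0, 55.0, 1.1, 15.0),
    ("cup", 110.0, 60.0, 0.9, 20.0),
    ("cup", 300.0, 10.0, 4.0, 200.0),   # false match: inconsistent pose
    ("book", 20.0, 20.0, 1.0, 90.0),    # false match: lone vote
]
print(hough_clusters(matches))   # -> {('cup', 3, 1, 0, 0): [0, 1, 2]}
```

The two false matches each land in a bin of their own and are dropped, which is the filtering effect the paragraph above describes.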

Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squared estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
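Every cluster of 3 or more features that agree on an object and its pose then goes through further detailed verification. First, a least-squares estimate is made of an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation gives the probability that a particular set of features indicates the presence of an object, given the accuracy of the fit and the number of probable false matches. Object matches that pass all of these tests can be identified as correct with high confidence.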
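[Note] The least-squares step can be sketched as fitting an affine map u = m1*x + m2*y + tx, v = m3*x + m4*y + ty from model points (x, y) to image points (u, v). The parameterization, the function names, and the use of normal equations are assumptions made for this sketch, not details taken from this section of the paper. Because u and v share the same design matrix, the fit splits into two 3-unknown problems. At least 3 non-collinear point pairs are needed, otherwise the system is singular.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(model_pts, image_pts):
    """Least-squares affine fit; returns [[m1, m2, tx], [m3, m4, ty]]."""
    rows = [(x, y, 1.0) for x, y in model_pts]
    # Normal equations: (A^T A) p = A^T b, solved once for u and once for v.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in (0, 1):
        atb = [sum(r[i] * p[k] for r, p in zip(rows, image_pts)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params

def residuals(params, model_pts, image_pts):
    """Distance between each projected model point and its image point."""
    out = []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        pu = params[0][0] * x + params[0][1] * y + params[0][2]
        pv = params[1][0] * x + params[1][1] * y + params[1][2]
        out.append(math.hypot(pu - u, pv - v))
    return out

# Points generated by u = 2x + 5, v = 2y - 3 (a scale by 2 plus a shift).
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image = [(5.0, -3.0), (7.0, -3.0), (5.0, -1.0), (7.0, -1.0)]
pose = fit_affine(model, image)
print(pose)                           # approximately [[2, 0, 5], [0, 2, -3]]
print(residuals(pose, model, image))  # all close to zero for consistent matches
```

Matches whose residual is large relative to the rest would be the outliers to discard before refitting; the cutoff to use is a design choice this sketch leaves open.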