A Study of Image Similarity Comparison

https://www.quora.com/What-algorithms-can-detect-if-two-images-objects-are-similar-or-not

-----------------------------------------------

OpenCV supports several ways of comparing two images and measuring how "similar" they are. These methods include histogram comparison, template matching, and feature matching.

Histogram comparison, cv2.compareHist(), is probably the simplest and fastest method; however, it can sometimes be too simplistic and inaccurate. Another method is template matching, cv2.matchTemplate(), which slides a "template" search image over the target image. This works well for near-identical images of similar size and orientation but can be ineffective when the images are taken at different angles. Feature matching is probably the most effective of the three: features extracted from each image are matched against each other, and a high proportion of matches implies the same object, even when the target is rotated or scaled. This category can be further split into texture descriptors (HOG, LBP, Haar) and keypoint descriptors (SIFT/SURF).
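For reference, here is a minimal sketch of the first two methods in Python with OpenCV; the file names, histogram bins, and matching method are illustrative placeholders, not from the original answer.

```python
# Minimal sketch: histogram comparison and template matching with OpenCV.
# File names and parameters are placeholders.
import cv2

img1 = cv2.imread("image1.jpg")
img2 = cv2.imread("image2.jpg")

# Histogram comparison: build HSV color histograms and correlate them.
hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)
hist1 = cv2.calcHist([hsv1], [0, 1], None, [50, 60], [0, 180, 0, 256])
hist2 = cv2.calcHist([hsv2], [0, 1], None, [50, 60], [0, 180, 0, 256])
cv2.normalize(hist1, hist1)
cv2.normalize(hist2, hist2)
similarity = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)  # 1.0 = identical

# Template matching: slide a smaller template over the target image.
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)
target = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
result = cv2.matchTemplate(target, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)  # max_val near 1.0 = strong match

print(similarity, max_val, max_loc)
```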

For your application, I would look into template or feature matching since your images could vary by angle.

-----------------------------------------------------------

Don't try a direct Euclidean distance measure: it suffers from the curse of dimensionality for high-dimensional vectors, because images contain too many irrelevant features. So you need to craft the image-matching system carefully with that in mind.
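To be concrete, the naive pixel-space measure being warned against looks roughly like the sketch below (grayscale images and placeholder file names assumed); small shifts, lighting changes, or rotations make this distance blow up even for images of the same object.

```python
# Minimal sketch of the naive approach the answer warns against:
# flatten raw pixels and take the Euclidean distance between them.
import cv2
import numpy as np

a = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)
b = cv2.resize(b, (a.shape[1], a.shape[0]))  # force equal shape

distance = np.linalg.norm(a.ravel() - b.ravel())
print(distance)
```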

I will give two approaches but I don't know which of them will work best for you.

  1. Approach 1: Use classic computer vision (CV) techniques: keypoint detection and descriptor extraction, followed by model fitting and a probabilistic analysis.
  2. Approach 2: Learn the dissimilarity/distance metric with siamese convolutional neural networks (convNets).

Let's go a little deeper. In each case you wish to implement a function:

y = f(x1, x2)

where y ∈ [0, 1] and the x's are the images being compared;

y = 1 if the images match and 0 otherwise.

Approach 1:

This consists of:

  1. Keypoint detection: Using Harris or FAST corner detectors to find easily localizable points.
  2. Descriptor extraction: Extract a small patch around each keypoint. These patches need not be raw pixel values; they can be based on histograms of gradients or learnt. The idea is to preserve the most useful information while discarding unnecessary information such as illumination. These descriptors are then L2-normalized and indexed into a searchable data structure.
  3. Matching: The descriptors from x1 and x2 are matched using a brute-force strategy with O(n²) runtime in the number of descriptors per image. This can be sped up with hashing, such as locality-sensitive hashing (LSH), to roughly O(n), or with a k-d tree using the best-bin-first (BBF) search strategy, as in the scale-invariant feature transform (SIFT) algorithm. This solves the correspondence problem, but one final step remains.
  4. Model fitting: Fitting a homography matrix to the set of corresponding points helps identify inliers and outliers. The homography is estimated with the random sample consensus (RANSAC) algorithm, after which a probabilistic analysis of the inliers and outliers lets the algorithm decide whether the two images match.

Yes, the algorithms above are quite complex to understand, but they do a good job of matching images in applications like automatic panorama stitching. You can use OpenCV to help you implement approach 1:
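Below is a minimal sketch of that pipeline, assuming ORB as a patent-free stand-in for SIFT/SURF; the ratio-test and inlier-ratio thresholds are illustrative values, not from the original answer.

```python
# Minimal sketch of approach 1: keypoints -> descriptors -> matching -> RANSAC.
import cv2
import numpy as np

img1 = cv2.imread("image1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("image2.jpg", cv2.IMREAD_GRAYSCALE)

# Steps 1-2: keypoint detection and descriptor extraction.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Step 3: brute-force matching with Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Step 4: model fitting, estimate a homography with RANSAC and count inliers.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inlier_ratio = mask.sum() / len(good) if mask is not None else 0.0
    print("match" if inlier_ratio > 0.3 else "no match", inlier_ratio)
else:
    print("no match: too few good correspondences")
```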

The machine learning (ML) approach outlined below is probably what you should try first, as it is a bit easier to implement and understand.

Approach 2:

Using a siamese pair of convNets (two copies sharing the same weights), the idea is to learn a high-level distance metric between any two given images. This requires gathering a lot of training examples, though. The convNet should learn a new mapping:

z = g(x)

Such that

d = ||z1 − z2||

represents a high-level distance measure, which you can threshold in order to decide whether the two images match or not:

y = h(−(d − d_th))

where h is the Heaviside step function and d_th is the distance threshold.

During training, d is minimized across all similar training examples and maximized across all dissimilar ones. This way the convNet learns an image embedding in which similar images have similar vectors.
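As a rough sketch of this training setup in PyTorch (the architecture, embedding size, and contrastive-loss margin are all illustrative assumptions, not from the original answer):

```python
# Minimal sketch of approach 2: a shared embedding network g(x) trained with a
# contrastive loss so that d = ||z1 - z2|| is small for matching pairs and
# pushed beyond a margin for non-matching pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embedding_dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return F.normalize(self.fc(z), dim=1)  # unit-length embedding z = g(x)

def contrastive_loss(z1, z2, y, margin=1.0):
    # y = 1 for matching pairs, 0 for non-matching pairs.
    d = F.pairwise_distance(z1, z2)
    return (y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)).mean()

# One training step on a dummy batch of image pairs (x1, x2) with labels y.
net = EmbeddingNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
x1, x2 = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
y = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(net(x1), net(x2), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```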

However, applying an arbitrary threshold is a bad idea, so instead we can use a support vector machine (SVM) fed directly with the z vectors to learn the dividing hyperplane. In that case the SVM's output can be passed through the Heaviside step function to get the binary output.
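A minimal sketch of that final step with scikit-learn, assuming you already have embedding vectors z for labeled pairs and using |z1 − z2| as the pair feature (one common choice, not specified in the original answer):

```python
# Minimal sketch: learn the match / no-match decision from embeddings with an
# SVM instead of a hand-picked distance threshold. Dummy random data stands in
# for real embeddings so the example runs as-is.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
z1 = rng.normal(size=(200, 128))   # embeddings of the first images in each pair
z2 = rng.normal(size=(200, 128))   # embeddings of the second images
y = rng.integers(0, 2, size=200)   # 1 = same, 0 = different (dummy labels)

pair_features = np.abs(z1 - z2)
clf = SVC(kernel="linear").fit(pair_features, y)

# predict() already returns the binary label, playing the role of the
# Heaviside step applied to the signed distance from the hyperplane.
predictions = clf.predict(pair_features)
```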

Hope this helps.

Original article: https://www.cnblogs.com/welhzh/p/12048809.html