ScSPM

Introduction

Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite its popularity, these nonlinear SVMs have a complexity O(n²) ~ O(n³) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images.

Nonlinear SVMs are computationally very expensive.

In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes.


In recent years the bag-of-features (BoF) model has been extremely popular in image categorization. The method treats an image as a collection of unordered appearance descriptors extracted from local patches, quantizes them into discrete “visual words”, and then computes a compact histogram representation for semantic image classification

The method partitions an image into 2^l × 2^l segments in different scales l = 0, 1, 2, computes the BoF histogram within each of the 21 segments, and finally concatenates all the histograms to form a vector representation of the image. In the case where only the scale l = 0 is used, SPM reduces to BoF.

Replace VQ with sparse coding.

Furthermore, unlike the original SPM that performs spatial pooling by computing histograms, our approach, called ScSPM, uses max spatial pooling that is more robust to local spatial translations and more biologically plausible.

Use max pooling in place of the histogram-based spatial pooling.

After sparse coding, a simple linear classifier already achieves very good results.

Despite such popularity, SPM has to run together with nonlinear kernels, such as the intersection kernel and the Chi-square kernel, in order to achieve good performance, which requires intensive computation and large storage.

The intersection kernel and the Chi-square kernel.

Linear SPM Using SIFT Sparse Codes

VQ

min_V Σ_{m=1}^{M} min_{k=1,…,K} ‖x_m − v_k‖²

which can be re-formulated as a matrix factorization problem with cluster membership indicators U:

min_{U,V} Σ_{m=1}^{M} ‖x_m − u_m V‖²   s.t.  Card(u_m) = 1, |u_m| = 1, u_m ≥ 0

In the training phase we mainly learn the basis vectors V; in the testing phase we solve for the coefficients U with V fixed.
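The two VQ phases can be sketched in numpy as plain k-means: training alternates between the assignments U and the codebook V, and testing hard-assigns a descriptor to its nearest codeword (function and variable names here are my own, not from the paper):

```python
import numpy as np

def train_codebook_kmeans(X, K, n_iter=20, seed=0):
    """Training phase: alternate between assignments (U) and codebook (V)."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # assign each descriptor to its nearest codeword
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        # move each codeword to the mean of its members
        for k in range(K):
            members = X[assign == k]
            if len(members) > 0:
                V[k] = members.mean(axis=0)
    return V

def vq_encode(x, V):
    """Testing phase: V fixed; u is a hard 1-of-K indicator code."""
    u = np.zeros(len(V))
    u[((V - x) ** 2).sum(axis=1).argmin()] = 1.0
    return u
```

The 1-of-K code `u` is exactly the Card(u_m) = 1, |u_m| = 1, u_m ≥ 0 constraint in the VQ objective.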

min_{U,V} Σ_{m=1}^{M} ‖x_m − u_m V‖² + λ|u_m|

s.t.  ‖v_k‖ ≤ 1, ∀k = 1, …, K

Sparse coding adds a sparsity (ℓ1) penalty to the reconstruction loss.

As with VQ, the training phase learns the basis (overcomplete), and the testing phase produces the sparse codes.

Advantages: lower reconstruction error; the captured image features are more salient; image patches are said to be sparse signals in nature.
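The paper's encoder is the feature-sign search algorithm; as a simpler stand-in, the testing-phase lasso problem (V fixed) can be sketched with ISTA (iterative soft-thresholding). The names, the step size, and the default λ below are my own choices, not the paper's:

```python
import numpy as np

def soft_threshold(a, t):
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def sc_encode(x, V, lam=0.15, n_iter=300):
    """ISTA for min_u ||x - u V||^2 + lam * |u|_1, with V (K x d rows) fixed."""
    # step size from the Lipschitz constant of the reconstruction gradient
    L = 2.0 * np.linalg.norm(V @ V.T, 2)
    u = np.zeros(V.shape[0])
    for _ in range(n_iter):
        grad = 2.0 * (u @ V - x) @ V.T             # gradient of ||x - u V||^2
        u = soft_threshold(u - grad / L, lam / L)  # proximal (shrinkage) step
    return u
```

With an overcomplete V (K > d), most entries of `u` come out exactly zero, which is the sparsity the ℓ1 term buys.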


Note: this is local sparse coding.

Hence VQ's hard (winner-take-all) assignment causes large quantization error; even a nonlinear SVM cannot fully compensate, and the computational cost is high.

z_j = max{ |u_{1j}|, |u_{2j}|, …, |u_{Mj}| },  j = 1, …, K

where u_{mj} is the j-th element of the sparse code of the m-th local descriptor, and M is the number of descriptors in the pooling region.

In this work, we defined the pooling function F as a max pooling function on the absolute sparse codes.

This max pooling reportedly has biological justification, and it is also more robust.

Similar to the construction of histograms in SPM, we do max pooling Eq. on a spatial pyramid constructed for an image.
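A sketch of that multi-scale max pooling over the absolute sparse codes, assuming descriptor locations are normalized to [0, 1) (names are mine):

```python
import numpy as np

def spm_max_pool(U, xy, levels=(0, 1, 2)):
    """U: M x K sparse codes; xy: M x 2 descriptor locations in [0, 1).
    Max-pools |U| inside each cell of a 1x1, 2x2, 4x4 pyramid (21 cells)
    and concatenates the results into a single image vector."""
    M, K = U.shape
    A = np.abs(U)
    feats = []
    for l in levels:
        n = 2 ** l
        # which pyramid cell each descriptor falls into at this level
        cell = np.minimum((xy * n).astype(int), n - 1)
        for i in range(n):
            for j in range(n):
                mask = (cell[:, 0] == i) & (cell[:, 1] == j)
                feats.append(A[mask].max(axis=0) if mask.any() else np.zeros(K))
    return np.concatenate(feats)  # length 21 * K for levels (0, 1, 2)
```

At level 0 this is plain whole-image max pooling; the finer levels add the spatial layout, exactly as the SPM histograms did.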


Why it works:

This success is largely due to three factors: (1) SC has much less quantization errors than VQ; (2) It is well known that image patches are sparse in nature, and thus sparse coding is particularly suitable for image data; (3) The computed statistics by max pooling are more salient and robust to local translations.

Implementation

1. Sparse Coding

Solving the SC objective: it is convex in U with V fixed, and convex in V with U fixed, but not jointly convex. The traditional approach therefore alternates, fixing one and solving for the other; the recently proposed feature-sign search algorithm makes the coding step much faster.

With the basis V learned offline, the code of a descriptor can be computed in (near) real time.

2. Multi-class Linear SVM

min_{w_c} ‖w_c‖² + C Σ_{i=1}^{n} ℓ(w_c; y_i^c, z_i)   (one-vs-all, one binary SVM per class c)

with the differentiable quadratic hinge loss

ℓ(w_c; y_i^c, z_i) = [max(0, w_c⊤ z_i · y_i^c − 1)]²

Optimized with LBFGS (the author only needs to supply the loss function and its gradient).
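A sketch of that one-vs-all training with the quadratic hinge loss, using scipy's L-BFGS (helper names and the C value are my own assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def squared_hinge(w, Z, y, C):
    """J(w) = ||w||^2 + C * sum_i max(0, 1 - y_i * w.z_i)^2, plus its gradient."""
    active = np.maximum(0.0, 1.0 - y * (Z @ w))
    loss = w @ w + C * (active ** 2).sum()
    grad = 2.0 * w - 2.0 * C * (active * y) @ Z
    return loss, grad

def train_one_vs_all(Z, labels, n_classes, C=10.0):
    """One binary classifier per class; labels in {0, ..., n_classes-1}."""
    W = []
    for c in range(n_classes):
        y = np.where(labels == c, 1.0, -1.0)
        res = minimize(squared_hinge, np.zeros(Z.shape[1]),
                       args=(Z, y, C), jac=True, method="L-BFGS-B")
        W.append(res.x)
    return np.stack(W)  # predict with (Z @ W.T).argmax(axis=1)
```

Squaring the hinge makes the loss differentiable, which is what lets a quasi-Newton method like L-BFGS apply directly.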

Experiment Revisit

patch size

In our experiments, we didn’t observe any substantial improvements by pooling over multiple patch scales, probably because max pooling over sparse codes can capture the salient properties of local regions that are irrelevant to the scale of local patches.

The author's experiments use only a single patch scale: 16×16 pixels.

Experiments show that additional patch scales bring little performance gain; the analysis is that max pooling already captures salient features that are invariant to patch scale.

codebook size

Intuitively, if the codebook size is too small, the histogram feature loses discriminative power; if the codebook size is too large, the histograms from the same class of images will never match.

As the codebook size grows, ScSPM's performance keeps improving.

Sparse Coding Parameter

A sparsity penalty of 0.3–0.4 yields codes with around 10 nonzero coefficients on average.

Comparison of Pooling Methods

Max pooling performs best; it is more robust.

A recent work shows that sparse coding can be dramatically accelerated by using a feed-forward network.

Summary

1. Changed the quantization strategy: sparse coding replaces kmeans.

2. Thanks to the changed quantization strategy, a linear classifier suffices for better performance.

3. Sparse coding should be used together with max pooling.

Original post: https://www.cnblogs.com/sprint1989/p/4004767.html