Reading References (from MIT OpenCourseWare)

Reading references for 9.520-A Networks for Learning: Regression and Classification, Spring 2001

The Learning Problem in Perspective

Bertero, M., T. Poggio, and V. Torre. "Ill-posed Problems in Early Vision." Proc. of the IEEE 76 (1988): 869-889.

Although restricted to early vision, this paper contains a readable introduction to ill-posed problems and regularization methods.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.

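A thorough introduction to the relationship between learning theory and regularization theory. We will refer to this paper frequently in this lecture and the ones that follow.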

Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1995.

Chapter 1 is a readable, first-hand introduction to statistical learning theory.

Further reading:

Bertero, M. "Regularization Methods for Linear Inverse Problems." In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

A good survey.

Tikhonov, A. N., and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.

The first book on regularization theory.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

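If you want to look more deeply into the foundations of statistical learning theory, skim Chapter 1 of this book.

Regularization Solutions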

Kolmogorov, A. N., and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis. Dover, 1975.

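A classic. To keep up with the course, read Sections 5.1, 6.4, and 6.5 in Chapter 2 and Sections 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 in Chapter 4, paying particular attention to the parts dealing with function spaces.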

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains a very good explanation of the Lagrange multiplier technique.

Further reading:

Bertero, M. "Regularization Methods for Linear Inverse Problems." In Inverse Problems. Edited by G. Talenti. Lecture Notes in Mathematics. Vol. 1225. 1986, pp. 52-112.

A good survey.

Tikhonov, A. N., and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, 1977.

The first book on regularization theory.

Reproducing Kernel Hilbert Spaces

Kolmogorov, A. N., and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis. Dover, 1975.

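A classic. To keep up with the course, read Sections 5.1, 6.4, and 6.5 in Chapter 2 and Sections 13.1, 13.2, 13.3, 13.5, 13.6, and 15.1 in Chapter 4, paying particular attention to the parts dealing with function spaces.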

Strang, G. Introduction to Linear Algebra. Wellesley-Cambridge Press, 1993.

Chapter 6 contains the matrix algebra used in this course (and much more!).

Further reading:

Aronszajn, N. "Theory of Reproducing Kernels." Trans. Amer. Math. Soc. 68 (1950): 337-404.

On RKHS (difficult reading).

Girosi, F. "An Equivalence Between Sparse Approximation and Support Vector Machines." Neural Computation 10 (1998): 1455-1480.

Appendix A contains a simple introduction to RKHS.

Wahba, G. Spline Models for Observational Data. SIAM, 1990.

Chapter 1 contains an introduction to RKHS.

Classical Approximation Methods

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains a very good explanation of the Lagrange multiplier technique.

Nonparametric Techniques and Regularization Theory

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.

Parts of this paper will also be useful for this lecture.

Further reading:

Vapnik, V. N. Estimation of Dependences Based on Empirical Data. Springer, 1982.

Chapter 9 discusses Parzen windows within the framework of regularization theory.

Ridge Approximation Techniques

Bishop, C. M. Neural Networks for Pattern Recognition. Clarendon, 1995.

Chapters 3 and 4 discuss single-layer and multilayer perceptrons in detail.

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.

A source of information on how the different approximation techniques relate to one another.

Further reading:

Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.

A good book that looks at neural networks mainly from a physicist's point of view.

Regularization Networks and Related Topics

Girosi, F., M. Jones, and T. Poggio. "Regularization Theory and Neural Network Architectures." Neural Computation 7 (1995): 219-269.

This is the last time we refer to this paper, so be sure to read it carefully.

Further reading:

Hertz, J., A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Addison Wesley, 1991.

A good book that looks at neural networks mainly from a physicist's point of view.

Introduction to Statistical Learning Theory

Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1995.

Chapter 1 is a readable first-hand introduction to the subject.

Further reading:

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

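Chapter 1 is a readable, first-hand introduction to statistical learning theory.

Consistency of the Empirical Risk Minimization Principle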

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

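Chapter 3 covers everything in this lecture (and more). Parts of Chapter 2 offer some deep insight into the theory behind it, but if you want to appreciate the difference between stating results and proving them, you should look at Chapter 14 and the other chapters.

The VC Dimension and VC Bounds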

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapter 4 covers everything in this lecture (and more).

VC Theory for Regression and Structural Risk Minimization

Alon, N., et al. "Scale Sensitive Dimensions, Uniform Convergence, and Learnability." Symposium on Foundations of Computer Science (1993).

This paper gives necessary and sufficient conditions for distribution-free uniform convergence for real-valued functions.

Evgeniou, T., M. Pontil, and T. Poggio. "Regularization Networks and Support Vector Machines." Advances in Computational Mathematics 13 (2000): 1-50.

This paper covers most of the material in this lecture.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapters 5 and 6 contain most, but not all, of the results discussed in this lecture.

Support Vector Machines for Classification

Strang, G. Calculus. Wellesley-Cambridge Press, 1991.

Chapter 13 contains a very good explanation of the Lagrange multiplier technique.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

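This lecture covers part of Chapter 10; for a deeper understanding of SVMs and other techniques, also read Chapter 8.

Support Vector Machines for Regression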

Evgeniou, T., M. Pontil, and T. Poggio. "Regularization Networks and Support Vector Machines." Advances in Computational Mathematics 13 (2000): 1-50.

This paper gives a Bayesian interpretation of regularization networks and support vector machines.

Girosi, F. "An Equivalence Between Sparse Approximation and Support Vector Machines." Neural Computation 10 (1998): 1455-1480.

This paper studies the relationship between SVMs and basis pursuit denoising (BPD).

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

This lecture covers parts of Chapters 11 and 13.

Further reading:

Chen, S., D. Donoho, and M. Saunders. "Atomic Decomposition by Basis Pursuit." Tech Rep 479. Dept. of Statistics. Stanford University. 1995.

Daubechies, I. "Time-Frequency Localization Operators: A Geometric Phase Space Approach." IEEE Trans. on Information Theory 34 (1988): 605-612.

Mallat, S., and S. Zhang. "Matching Pursuits with Time-Frequency Dictionaries." IEEE Trans. on Signal Proc. 41 (1993): 3397-3415.

Pontil, M., S. Mukherjee, and F. Girosi. "On the Noise Model of Support Vector Machine Regression." CBCL Paper #168, AI Memo #1651, Massachusetts Institute of Technology, Cambridge, MA (1998).

Current Topics I: Kernel Methods

Cristianini, N., and J. Shawe-Taylor. Support Vector Machines and Other Kernel-based Learning Methods. Cambridge, 2000.

Chapter 3 of this book gives a detailed introduction to kernel methods.

Vapnik, V. Statistical Learning Theory. Wiley, 1998.

Chapters 10, 11, and 12 contain material on kernel methods and the ideas behind them.

Further reading:

Berg, C., J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer Verlag.

The title is intimidating, but Chapter 3 is easy to read and contains a clear introduction to positive definite functions.

Jaakkola, T., and D. Haussler. "Exploiting Generative Models in Discriminative Classifiers." NIPS (1998).

Niyogi, P., T. Poggio, and F. Girosi. "Incorporating Prior Information in Machine Learning by Creating Virtual Examples." IEEE Proceedings on Intelligent Signal Processing 86 (1998): 2196-2209.

Neuroscience II

Logothetis, N. K., J. Pauls, and T. Poggio. "Viewer-Centered Object Recognition in Monkeys." AI Memo 1472, CBCL Paper 95 (1994).

Logothetis, N. K., T. Vetter, A. Hurlbert, and T. Poggio. "View-Based Models of 3D Object Recognition and Class-Specific Invariances." AI Memo 1473, CBCL Paper 94 (1994).

Riesenhuber, M., and T. Poggio. "Hierarchical Models of Object Recognition in Cortex." Nature Neuroscience 2 (1999): 1019-1025.

Current Topics II: Approximation Error and Approximation Theory

Lorentz, G. G. Approximation of Functions. Chelsea Publishing Co., 1986.

Not easy reading, but compact and rigorous (especially Chapter 1, Chapter 5, and Chapters 8-10).

Niyogi, P., and F. Girosi. "On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions." Neural Computation 8 (1996): 819-842.

Contains a discussion of the various types of error.

Current Topics III: Theory and Implementation of Support Vector Machines

Bazaraa, Sherali, and Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, 1993.
A textbook on optimization theory.

Is the SVM solution unique?
Burges and Crisp. "Uniqueness of the SVM Solution." NIPS 12 (1999).

Decomposition methods for SVMs:
Osuna, Edgar. Support Vector Machines: Training And Applications. Ph.D. Thesis (1998).

Optimizing two variables at a time:
Platt, John C. "Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines." Microsoft Research MSR-TR-98-14 (1998).

Analysis of decomposition methods:
Chang, Chih-Chung, Chih-Wei Hsu, and Chih-Jen Lin. "The Analysis of Decomposition Methods For Support Vector Machines." Proceedings of IJCAI99, SVM workshop (1999).

Keerthi, S. S., and E. G. Gilbert. "Convergence of a Generalized SMO Algorithm for SVM Classifier Design." Tech Rep CD-00-01. Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore (2000).

Keerthi, S. S., S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. "Improvements to Platt's SMO Algorithm for SVM Classifier Design." Tech Rep CD-99-14. Control Division, Dept. of Mechanical and Production Engineering, National University of Singapore (1999).

Controlling sparsity:
Osuna, Freund, and Girosi. "Reducing Run-time Complexity in SVMs." Proceedings of the 14th Int'l Conference on Pattern Recognition.

Current Topics V: Bagging and Boosting

Breiman, L. "Bagging Predictors." Machine Learning 24 (1996): 123-140.

Schapire, R. E., Y. Freund, P. Bartlett, and W. S. Lee. "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods." The Annals of Statistics 26 (1998): 1651-1686.

Optional Lecture: Wavelets and Frames

Stollnitz, E. J., T. D. DeRose, and D. H. Salesin. "Wavelets for Computer Graphics: A Primer." Tech Rep 94-09-11. Department of Computer Science and Engineering, University of Washington (1994).
A readable introduction to wavelets.

Daubechies, I. Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA (1992).

Although parts of it cover advanced topics, it also contains a number of basic theoretical results.