Machine Learning in Action (code walkthrough)

Machine Learning in Action: http://www.cnblogs.com/qwertWZ/p/4582096.html

Machine Learning in Action notes: http://blog.csdn.net/Lu597203933/article/details/37969799

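import operator
from numpy import tile

# A first kNN classifier.
# inX: the sample to classify; dataSet: the training samples (one per row);
# labels: the label of each training sample; k: number of nearest neighbours that vote
def classify0(inX, dataSet, labels, k):
    # Compute the Euclidean distance from inX to every training sample
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    # Let the k nearest points vote for their labels
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort labels by vote count, highest first (items() replaces Python 2's iteritems())
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]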
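
To see the classifier in action, here is a small usage sketch. It assumes Python 3 and NumPy; the four-point toy data set and the labels 'A'/'B' are made-up values for illustration, and classify0 is restated in Python 3 form only so that the block runs on its own.

```python
import operator
import numpy as np

def classify0(inX, dataSet, labels, k):
    # Euclidean distance from inX to every row of dataSet
    dataSetSize = dataSet.shape[0]
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
    sortedDistIndicies = distances.argsort()
    # Count the labels of the k nearest samples
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]

# Hypothetical toy data: two points near (1, 1) labelled 'A',
# two points near (0, 0) labelled 'B'
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

result_near_origin = classify0([0.0, 0.0], group, labels, 3)
result_near_one = classify0([1.0, 1.2], group, labels, 3)
print(result_near_origin, result_near_one)  # B A
```

With k = 3, the query at the origin has two 'B' neighbours and one 'A' neighbour, so the majority vote returns 'B'; the query near (1, 1) gets two 'A' votes and returns 'A'.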

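Code walkthrough: (a) The tile function. tile(inX, i) repeats inX i times along its last axis, so each row becomes i times longer. tile(inX, (i, j)) repeats inX i times along the rows (the number of stacked copies) and j times along the columns (the row-length multiplier). For example: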

>>> from numpy import *
>>> inX= array([[0,0],[1,2]])
>>> tile(inX,2)
array([[0, 0, 0, 0],
       [1, 2, 1, 2]])
>>> tile(inX,(4,2))
array([[0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2],
       [0, 0, 0, 0],
       [1, 2, 1, 2]])
>>> tile(inX,3)
array([[0, 0, 0, 0, 0, 0],
       [1, 2, 1, 2, 1, 2]])
>>> tile(inX,1)
array([[0, 0],
       [1, 2]])
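
To connect this back to classify0: there inX is a single 1-D sample, so tile(inX, (dataSetSize, 1)) stacks it into one copy per training sample before subtracting. A minimal sketch follows, with made-up values for inX and dataSet. The comparison with NumPy broadcasting is an aside and not part of the original code.

```python
import numpy as np

inX = np.array([0, 0])                        # one query point (illustrative values)
dataSet = np.array([[1, 2], [3, 4], [5, 6]])  # three training samples (illustrative values)

# Stack inX into one row per training sample, as classify0 does
tiled = np.tile(inX, (dataSet.shape[0], 1))
diff_tile = tiled - dataSet

# NumPy broadcasting gives the same difference without an explicit tile
diff_broadcast = inX - dataSet

print(tiled.shape)                                # (3, 2)
print(np.array_equal(diff_tile, diff_broadcast))  # True
```

The explicit tile makes the shapes visible, which may be why the original code uses it. Broadcasting avoids materialising the repeated copies.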