Python问题汇总

1.dict is not callable

tree是一个字典类型。

tree("left") -> tree["left"]

2.list indices must be integers or slices, not tuple

dataset是原生的python数组，是list类型（python原生数组叫list类型）

errorMerge = sum(power(dataset[:, -1] - treeMean, 2))

尝试使用numpy里面的array索引方式来索引原生数组，将会爆此错误。

3.'NoneType' object is not iterable

代码返回值为None, value；直接处理返回值第一个值将会爆此错误；

4.shapes (1,1) and (4,1) not aligned: 1 (dim 1) != 4 (dim 0)

# 基于给出的dataset，（新）生成K个样本，用于做质点
def randCentoids(dataset, k):
    n = shape(dataset)[1]
    centoids = mat(zeros((k, n)))

    for j in range(n):
        minJ = min(dataset[:, j])
        maxJ = max(dataset[:, j])
        rangJ = maxJ - minJ
        centoids[:, j] = mat(minJ + rangJ * random.rand(k, 1))
        
    return centoids

这个错误的意思是作为矩阵相乘，行列数无法直接相乘，因为min（）和max返回的都是numpy.matrix类型；为什么会返回矩阵类型？因为dataset就是matrix类型，所以返回的虽然是单值，但是也会被认为是矩阵类型。

rangJ = float(maxJ - minJ)

强转为float之后，问题解决。

5.could not broadcast input array from shape (2) into shape (1,1)

sampleCenterRecord = mat(zeros((m, 1)))
...
dist = distCaculate(centroids[j, :], dataset[i, :])

sampleCenterRecord的维数定义有问题，改为(m, 2)问题解决。

6.IndexError: index 0 is out of bounds for axis 0 with size 0

os.chdir("D:\galaxy\aliyunsvn\code\MLInAction\dataset")
dataArr = loadDataSet("ex00.txt")
dataMat = mat(dataArr)
value = [[0.996757]]
feature = 0
dataMat[nonzero(dataMat[:, feature] > value)[0], :][0]
 这个是因为dataMat中满足这个条件的日志的数量为0，所以最后索引[0]回报数组越界异常。

 7.unhashable type: 'numpy.ndarray'

1 for splitVal in set(dataSet[:,featIndex].A):
2     ...
之前是异常是unhashable type: 'matrix'，后来添加A想要尝试转化为Array看看依然报错。
这异常的意思是set里面只支持python原生的数据类型，对于numpy的对象无法识别（处理）。所以unhashable，本质就是参数类型不匹配。

  7.only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

这个异常说明了索引类型有问题：
overLap = nonzero(logical_and(dataMat[:, item].A>0, dataMat[:, j].A>0))[0]
因为item是从参数过来，但是外部调用的时候这个参数误传为一个function，故报错。
8.data type must provide an itemsize  xTx = xMat.T * xMat  这个执行的时候爆的错，原因就是在loadDataset的时候没有进行发咯at转化，直接处理，导致字符串之间矩阵运算导致异常。需要转化为float，问题解决

def loadDataset(fileName):
   X = []
   y = []
   for line in open(fileName):
       values = line.split()
       lineArr = []
       lineArr.append(float(values[0]))
       lineArr.append(float(values[1]))
       X.append(lineArr)
       y.append(float(values[-1]))
   return X, y
 9. unhashable type: 'matrix'

for splitValue in set(dataset[:, featureIndex]):
   ... ...
　　这是因为在python里面set其实是对于其里面的元素取Hash值然后根据hashz值进行排序；但是如果是对于numpy.ndarry/ Matrix等被封装的类型则无法获取其hash值，set里面的元素只能是原生类型。作如下处理问题解决：

for splitValue in set(dataset[:, featureIndex]).A.flatten().tolist():
   ... ...
 10. ValueError: Unknown label type: 'continuous'
发生这个异常是因为我使用了RandomForestClassification，但是y值却使用了float，所以报错；作为分类器的y值必须是int，否则怎么分类啊。