Hyperparameters of the KNN Algorithm

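1: Definition

  A hyperparameter is a parameter whose value is set before the learning process begins, as opposed to a parameter whose value is learned from the data during training.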

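2: Common hyperparameters

  The common ones are k in k-nearest neighbors (n_neighbors), the voting weight scheme (weights), and the exponent p of the Minkowski distance. All three are arguments of the KNeighborsClassifier constructor.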
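
As a quick illustration, the three arguments can be read back with get_params(). This is a minimal sketch; the particular values passed in (3, "distance", 1) are arbitrary choices for the example, not recommendations.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are fixed when the estimator is constructed, before fit().
knn = KNeighborsClassifier(n_neighbors=3, weights="distance", p=1)
params = knn.get_params()
print(params["n_neighbors"], params["weights"], params["p"])

# Omitting the arguments falls back to the library defaults.
defaults = KNeighborsClassifier().get_params()
print(defaults["n_neighbors"], defaults["weights"], defaults["p"])
```

The second print shows what a search would be competing against if none of the three were tuned.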

3: Shared code

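import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Load the handwritten digits dataset bundled with sklearn.
digits = datasets.load_digits()

x = digits.data      # feature matrix
y = digits.target    # class labels (digits 0 to 9)

# Hold out 20% of the samples as a test set.
# No random_state is passed, so the split and the scores below vary between runs.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)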

4: Best value of k

best_score = 0.0
best_k = -1
# Try k = 1..10 and keep the value with the highest test accuracy.
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    t = knn.score(x_test, y_test)  # accuracy on the held-out test set
    if t > best_score:
        best_score = t
        best_k = k

print(best_k)
print(best_score)

5: Best value of weights

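  With weights='uniform', each of the k nearest neighbors gets one equal vote. For example, with k = 3, if the three nearest points belong to three different classes, the vote is tied and sklearn simply returns one of them. With weights='distance', each neighbor's vote is weighted, usually by the inverse of its distance: if the three neighbors lie at distances 1, 3, and 4, their weights are 1, 1/3, and 1/4, so the class of the point at distance 1 is returned as the prediction.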
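
The arithmetic in this example can be sketched in a few lines of plain Python. The helper weighted_vote and the class labels "A", "B", "C" are hypothetical names introduced for illustration; this is not sklearn's internal code, and it assumes every distance is greater than zero.

```python
from collections import defaultdict

def weighted_vote(distances, labels):
    """Return the label with the largest sum of inverse-distance weights."""
    totals = defaultdict(float)
    for d, label in zip(distances, labels):
        totals[label] += 1.0 / d  # assumption: d > 0
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# Three neighbors in three different classes: the closest one wins.
winner, totals = weighted_vote([1, 3, 4], ["A", "B", "C"])
print(winner, totals)

# Two farther neighbors sharing a class still lose here: 1/3 + 1/4 < 1.
winner2, totals2 = weighted_vote([1, 3, 4], ["A", "B", "B"])
print(winner2, totals2)
```

The second call shows why distance weighting can overrule a plain majority vote: under uniform weights "B" would win two votes to one.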

best_score = 0.0
best_k = -1
best_method = ''
# Search over both weighting schemes and k = 1..10.
for method in ['uniform', 'distance']:
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights=method)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)  # accuracy on the held-out test set
        if t > best_score:
            best_score = t
            best_k = k
            best_method = method

print(best_score)
print(best_k)
print(best_method)

6: Best value of p

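  p is the exponent of the Minkowski distance (p = 1 is the Manhattan distance, p = 2 the Euclidean distance). It takes effect whenever the metric is Minkowski, which is the default, so it is not restricted to weights='distance': it also changes which neighbors are selected under weights='uniform'. The search below fixes weights='distance'.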
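
For reference, the Minkowski distance between two points a and b is (sum over i of |a_i - b_i|^p)^(1/p). A minimal NumPy sketch follows; the function name minkowski_distance and the sample points are assumptions made for this example.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance: (sum_i |a_i - b_i|**p) ** (1/p)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

d1 = minkowski_distance([0, 0], [3, 4], p=1)  # Manhattan: 3 + 4
d2 = minkowski_distance([0, 0], [3, 4], p=2)  # Euclidean: sqrt(9 + 16)
print(d1, d2)
```

Because changing p changes the distances themselves, it can change which points count as the nearest neighbors as well as how heavily they are weighted.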

best_score = 0.0
best_k = -1
best_p = 1
# Search over p = 1..5 and k = 1..10 with distance-weighted voting.
for i in range(1, 6):
    for k in range(1, 11):
        knn = KNeighborsClassifier(n_neighbors=k, weights='distance', p=i)
        knn.fit(x_train, y_train)
        t = knn.score(x_test, y_test)  # accuracy on the held-out test set
        if t > best_score:
            best_k = k
            best_score = t
            best_p = i
print(best_p)
print(best_score)
print(best_k)
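
The loops above pick hyperparameters by their score on the test set, so the reported best score is likely to be somewhat optimistic. One common alternative, sketched here under assumptions, is sklearn's GridSearchCV, which selects by cross-validation on the training data only. The reduced grid, cv=3, and random_state=0 are choices made to keep this example quick and repeatable; they are not taken from the original post.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# A deliberately small grid (assumption) over the same three hyperparameters.
param_grid = {
    "n_neighbors": [1, 3, 5],
    "weights": ["uniform", "distance"],
    "p": [1, 2],
}

# 3-fold cross-validation on the training set chooses the combination.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(x_train, y_train)

print(search.best_params_)
print(search.best_score_)            # mean cross-validated accuracy
print(search.score(x_test, y_test))  # accuracy on the untouched test set
```

With this layout the test set is used only once, after the search has finished, to evaluate the chosen combination.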

   

Original article: https://www.cnblogs.com/lyr999736/p/10664951.html