K-means Algorithm

Implement the K-means algorithm by hand, cluster the iris petal-length data, and display the result as a scatter plot.

Code:

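from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
data = iris['data']          # full feature matrix, shape (150, 4)
n = len(data)                # number of samples
m = data.shape[1]            # number of features
k = 3                        # number of clusters
dist = np.zeros([n, k+1])    # columns 0..k-1: distance to each center; column k: assigned cluster
center = data[:k, :].copy()  # take the first k samples as the initial centers
centernew = np.zeros([k, m])

while True:
    # Assignment step: give every sample to its nearest center
    for i in range(n):
        for j in range(k):
            dist[i, j] = np.sqrt(np.sum((data[i, :] - center[j, :])**2))
        dist[i, k] = np.argmin(dist[i, :k])

    # Update step: move each center to the mean of its samples
    for i in range(k):
        index = dist[:, k] == i
        centernew[i, :] = data[index, :].mean(axis=0)

    # Stop once the centers no longer change
    if np.all(center == centernew):
        break
    else:
        # Copy, otherwise both names refer to the same array and the
        # comparison above is trivially true on the next pass
        center = centernew.copy()

print('Cluster assignments:', dist[:, k])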
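The two nested loops above can also be written with NumPy broadcasting. The following is only a minimal sketch under a few assumptions: the function name kmeans_numpy, the max_iter cap, and the tiny six-point dataset are hypothetical and not part of the original exercise. It keeps the same initialization (the first k rows) and leaves a center unchanged if its cluster becomes empty.

```python
import numpy as np

def kmeans_numpy(data, k, max_iter=100):
    """Hypothetical helper: plain k-means, first k rows as initial centers."""
    data = np.asarray(data, dtype=float)
    centers = data[:k].copy()
    for _ in range(max_iter):
        # Euclidean distance from every sample to every center, shape (n, k)
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # assignment step
        new_centers = centers.copy()
        for j in range(k):                     # update step
            members = data[labels == j]
            if len(members) > 0:               # keep the old center if the cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):  # centers stopped moving
            break
        centers = new_centers
    return labels, centers

# Made-up toy data: two well-separated groups of three points each
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans_numpy(points, k=2)
print(labels)
print(centers)
```

On the iris data the same function could be called as kmeans_numpy(load_iris().data, 3).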


Output:

Use sklearn.cluster.KMeans to cluster the iris petal-length data and display the result as a scatter plot.

Code:

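import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
data = iris.data[:, 2]     # petal length is column 2 (column 1 is sepal width)
x = data.reshape(-1, 1)    # KMeans expects a 2-D array of shape (n_samples, n_features)

y = KMeans(n_clusters=3)
y.fit(x)

y_pre = y.predict(x)       # cluster index of each sample

# Only one feature is used, so it is plotted against itself
plt.scatter(x[:, 0], x[:, 0], c=y_pre, s=50, cmap='rainbow')
plt.show()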
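Besides the plot, the fitted model can be inspected numerically. The sketch below is an illustration rather than part of the original exercise: it assumes petal length is column 2 of iris.data, and it fixes n_init and random_state only so that repeated runs give the same result. It prints the three cluster centers and the number of samples in each cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Petal length as a single-feature matrix, shape (150, 1)
x = load_iris().data[:, 2].reshape(-1, 1)

# n_init and random_state are fixed here only for reproducibility (an assumption)
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

centers = np.sort(model.cluster_centers_.ravel())  # cluster centers in cm, ascending
sizes = np.bincount(model.labels_, minlength=3)    # samples per cluster id (not sorted by center)
print('Cluster centers (cm):', centers)
print('Cluster sizes:', sizes)
```

The cluster ids themselves are arbitrary, so only the center values and the group sizes carry meaning.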

Output:

Cluster the full iris dataset and display the result as a scatter plot.

Code:

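from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

iris = load_iris()
data = iris['data']                  # all four features

model = KMeans(n_clusters=3).fit(data)
pre = model.predict(data)            # cluster index of each sample
center = model.cluster_centers_      # cluster centers, shape (3, 4)

# Clustering uses all four features; the plot shows only the first two
# (sepal length against sepal width)
plt.scatter(data[:, 0], data[:, 1], c=pre, s=50, cmap='rainbow', marker='p', alpha=0.5)
plt.show()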

Output:

What can the k-means algorithm be used for?

Answer:

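K-means is a clustering algorithm, so it is well suited to grouping unlabeled data such as images or text. It works from the features of the samples and puts similar ones into the same cluster, which gives a division into groups. Unlike a classifier it needs no labels, and the groups it finds are numbered clusters, not named classes.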
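To illustrate how such groups relate to real categories, the following sketch compares the k-means clusters on the full iris data with the known species. It is only an illustration under stated assumptions: the species labels in iris.target are used solely for checking afterwards and never for fitting, and n_init and random_state are fixed only for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

iris = load_iris()

# Cluster without looking at the species labels
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(iris.data)

# Contingency table: rows are true species, columns are cluster ids
table = np.zeros((3, 3), dtype=int)
for species, cluster in zip(iris.target, clusters):
    table[species, cluster] += 1
print(table)

# Agreement between the two groupings, independent of how clusters are numbered
ari = adjusted_rand_score(iris.target, clusters)
print('Adjusted Rand index:', ari)
```

Each row of the table shows how one species is spread over the clusters. A row concentrated in a single column means that species was recovered as its own group.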

Original article: https://www.cnblogs.com/CMean/p/12715751.html