The difference between k_means and KMeans in sklearn (yalan_Wei)

Contributed post, 02-07


1. Drawbacks of KMeans 2. sklearn.cluster.KMeans parameters 3. sklearn.cluster.KMeans attributes

KMeans

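Drawbacks:
1. The number of cluster centers k is hard to choose: it is difficult to know how many clusters are actually appropriate for the data.
2. The initial k centers have to be fixed before the algorithm starts, and a good choice is hard to make. Different initial centers lead to different clustering results.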
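Both drawbacks can be observed on a small data set. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy data, the range of k, and the variable names are made up for illustration. It records the inertia for several values of k, and then for several single random initializations with k fixed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three 2-D blobs around hand-picked centers
rng = np.random.RandomState(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in true_centers])

# Drawback 1: inertia generally keeps shrinking as k grows,
# so the lowest inertia alone does not tell us which k is right
inertia_by_k = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    inertia_by_k[k] = model.inertia_

# Drawback 2: with a single random initialization (n_init=1),
# the result depends on which initial centers were drawn
inertia_by_seed = []
for seed in range(10):
    model = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    inertia_by_seed.append(model.inertia_)

print(inertia_by_k)
print(inertia_by_seed)
```

Printing the two collections side by side makes it easy to see how much the inertia changes with k, and whether any of the single random starts ended up with a worse result than the others.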

sklearn.cluster.KMeans parameters:

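KMeans(n_clusters=8, *, init='k-means++', n_init=10, max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto')

Notes:

n_clusters: int, the number of clusters to form. Default is 8.

init: accepts three kinds of values: 'k-means++', 'random', or an ndarray. 1) 'k-means++' picks the initial centroids with a special scheme that speeds up convergence. 2) 'random' picks the initial centroids at random from the training data. 3) An ndarray must have shape (n_clusters, n_features) and gives the initial centroids directly. Default is 'k-means++'.

n_init: int, the number of times the algorithm is run with different initial centroids. The final result is the best of these runs in terms of inertia. Default is 10.

max_iter: int, the maximum number of iterations for a single run of k-means. Default is 300.

tol: float, default 1e-4. Relative tolerance used to declare convergence. Older scikit-learn releases defined it with respect to inertia; releases matching the signature above apply it to the change in cluster centers between two consecutive iterations.

n_jobs: int, the number of processes used for the computation; the n_init runs are carried out in parallel. (1) A value of -1 uses all CPUs. A value of 1 disables parallel computation, which is convenient for debugging. (2) For values below -1, (n_cpus + 1 + n_jobs) CPUs are used, so n_jobs=-2 uses all CPUs but one. Note that n_jobs does not appear in the signature above: it belongs to older scikit-learn releases and was later removed.

random_state: int or numpy.random.RandomState instance, optional. Controls the random number generation used for centroid initialization. An int fixes the seed. With the default None, numpy's global random number generator is used.

copy_x: bool, default True. If True, the original data is not modified.

algorithm: {"auto", "full", "elkan"}, default "auto". The K-means algorithm to use.

Attributes:

cluster_centers_: the coordinates of the cluster centers.

labels_: the cluster label of each sample.

inertia_: the sum of squared distances of samples to their closest cluster center, weighted by the sample weights if provided.

n_iter_: int, the number of iterations run.

n_features_in_: int, the number of features seen during fit.

feature_names_in_: the names of the features seen during fit. Defined only when X has feature names that are all strings.

k_means()

k_means(X, n_clusters, *, sample_weight=None, init='k-means++', n_init=10, max_iter=300, verbose=False, tol=0.0001, random_state=None, copy_x=True, algorithm='auto', return_n_iter=False)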

Return values:
1. centroid: ndarray of shape (n_clusters, n_features). Centroids found at the last iteration of k-means.
2. label: ndarray of shape (n_samples,). label[i] is the code or index of the centroid the i-th observation is closest to.
3. inertia: float. The final value of the inertia criterion (sum of squared distances to the closest centroid for all observations in the training set).
4. best_n_iter: int. Number of iterations corresponding to the best results. Returned only if return_n_iter is set to True.
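To put the class and the function side by side, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The six sample points are made up for illustration. n_init is passed explicitly and algorithm is left at its default, because the defaults of both have changed between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# KMeans is an estimator class: configure it, fit it,
# then read the results from its attributes
model = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
model.fit(X)
print(model.cluster_centers_)         # shape (n_clusters, n_features)
print(model.labels_)                  # shape (n_samples,)
print(model.inertia_, model.n_iter_)

# A fitted estimator can also assign new points to the nearest centroid
new_labels = model.predict(np.array([[0.0, 0.0], [12.0, 3.0]]))

# k_means is a plain function: one call, results returned as a tuple
centroid, label, inertia, best_n_iter = k_means(
    X, n_clusters=2, init="k-means++", n_init=10,
    random_state=0, return_n_iter=True,
)
print(centroid, label, inertia, best_n_iter)
```

In practice the class is the better fit when the fitted model is needed afterwards (for example to call predict, or inside a Pipeline), while the function is enough when only the centroids, labels and inertia of one clustering run are of interest.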