Selecting feature importances for k-nearest neighbors (well, or other hyperparameters) by a gradient-like descent

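While experimenting with the simplest machine learning task, I found it would be interesting to tune 18 hyperparameters at once over a fairly wide range. In my case everything turned out to be simple enough that the task could be solved with brute computing power.

When you are learning something, it can be great fun to reinvent the wheel. Sometimes it turns out that you really did come up with something new. Sometimes it turns out that everything was invented before me. But even when I merely repeat a path travelled long before me, as a reward I usually come away understanding the basic mechanisms of the algorithms, their capabilities and their internal limitations. I invite you to do the same.

Frankly, I am a beginner in Python and DS, and out of old programming habits I write by hand many things that can be done with a single command, and Python punishes this with slowdowns, not by a few times but by orders of magnitude. So I have uploaded all the code to a repository. If you know how to implement it more efficiently, don't be shy: edit it there or write in the comments. https://github.com/kraidiky/GDforHyperparameters

I believe that even those who are already seasoned data scientists and have tried everything in this life will find the visualization of the learning process interesting, and it is applicable not only to this task.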

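Problem statement

ODS.ai has a great DS course, and its third lecture is "Classification, decision trees and the method of nearest neighbors". There, on extremely simple and probably synthetic data, it is shown how the simplest decision tree reaches 94.5% accuracy, while the equally simple k-nearest-neighbors method gives 89% without any preprocessing.

Importing and loading the data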

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

df = pd.read_csv('data/telecom_churn.csv')
df['Voice mail plan'] = pd.factorize(df['Voice mail plan'])[0]
df['International plan'] = pd.factorize(df['International plan'])[0]
df['Churn'] = df['Churn'].astype('int32')
states = df['State']
y = df['Churn']
df.drop(['State','Churn'], axis = 1, inplace=True)
df.head()

Comparing the tree with kNN

%%time
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.metrics import accuracy_score

X_train, X_holdout, y_train, y_holdout = train_test_split(df.values, y, test_size=0.3,
random_state=17)

tree = DecisionTreeClassifier(random_state=17, max_depth=5)
knn = KNeighborsClassifier(n_neighbors=10)

tree_params = {'max_depth': range(1,11), 'max_features': range(4,19)}
tree_grid = GridSearchCV(tree, tree_params, cv=10, n_jobs=-1, verbose=False)
tree_grid.fit(X_train, y_train)
tree_grid.best_params_, tree_grid.best_score_, accuracy_score(y_holdout, tree_grid.predict(X_holdout))

({'max_depth': 6, 'max_features': 16}, 0.944706386626661, 0.945)

The same for kNN

%%time
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

knn_pipe = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier(n_jobs=-1))])
knn_params = {'knn__n_neighbors': range(1, 10)}
knn_grid = GridSearchCV(knn_pipe, knn_params, cv=10, n_jobs=-1, verbose=False)

knn_grid.fit(X_train, y_train)
knn_grid.best_params_, knn_grid.best_score_, accuracy_score(y_holdout, knn_grid.predict(X_holdout))

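({'knn__n_neighbors': 9}, 0.8868409772824689, 0.891)

At this point I felt sorry for kNN, which was clearly being treated unfairly, because we had not done anything about the metric. Without thinking too long, I took feature_importances_ from the tree and multiplied the standardized inputs by it. This way, the more important a feature is, the more it contributes to the distance between points.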
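To illustrate why multiplying standardized features by weights changes what kNN considers "near", here is a minimal sketch on made-up points. The points, the weights and the helper `nearest` are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# Three made-up, already standardized points with two features each.
query = np.array([0.0, 0.0])
a = np.array([1.0, 0.1])   # far from the query along feature 0, close along feature 1
b = np.array([0.2, 0.9])   # close along feature 0, far along feature 1

def nearest(weights):
    """Return 'a' or 'b', whichever is closer to the query after per-feature scaling."""
    da = np.linalg.norm((a - query) * weights)
    db = np.linalg.norm((b - query) * weights)
    return 'a' if da < db else 'b'

equal_weights = np.array([1.0, 1.0])
importance_weights = np.array([0.1, 0.9])  # hypothetical: feature 1 matters most

print(nearest(equal_weights))       # plain Euclidean distance
print(nearest(importance_weights))  # feature 1 dominates the distance
```

With equal weights the nearest point is `b`; once feature 1 is given most of the weight, `a` becomes the nearest one, so the neighbor set (and the prediction) can change without touching the classifier itself.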

Scaling the data by feature importance

%%time
feature_importances = pd.DataFrame({'features': df.columns, 'importance':tree_grid.best_estimator_.feature_importances_})
print(feature_importances.sort_values(by=['importance'], inplace=False, ascending=False))

scaler = StandardScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_train_transformed = X_train_transformed * np.array(feature_importances['importance'])

X_holdout_transformed = scaler.transform(X_holdout)
X_holdout_transformed = X_holdout_transformed * np.array(feature_importances['importance'])

knn_grid = GridSearchCV(KNeighborsClassifier(n_jobs=-1), {'n_neighbors': range(1, 11, 2)}, cv=5, n_jobs=-1, verbose=False)
knn_grid.fit(X_train_transformed, y_train)
print (knn_grid.best_params_, knn_grid.best_score_, accuracy_score(y_holdout, knn_grid.predict(X_holdout_transformed)))

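                  features  importance
5        Total day minutes    0.270386
17  Customer service calls    0.147185
8        Total eve minutes    0.135475
2       International plan    0.097249
16       Total intl charge    0.091671
15        Total intl calls    0.090008
4    Number vmail messages    0.050646
10        Total eve charge    0.038593
7         Total day charge    0.026422
3          Voice mail plan    0.017068
11     Total night minutes    0.014185
13      Total night charge    0.005742
12       Total night calls    0.005502
9          Total eve calls    0.003614
6          Total day calls    0.002246
14      Total intl minutes    0.002009
0           Account length    0.001998
1                Area code    0.000000

{'n_neighbors': 5} 0.909129875696528 0.913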

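The tree shared a bit of its knowledge with kNN, and now we see 91%, which is not that far from the 94.5% of the vanilla tree. And then an idea came to me. How, in fact, should we scale the inputs so that kNN shows the best result?

First, let's estimate in our heads how long a head-on brute-force search would take. We have 18 parameters; for each of them we take, say, 10 steps of the multiplier on a logarithmic scale. That gives 10^18 combinations. One combination, with all the odd numbers of neighbors below 10 and 10-fold cross-validation, takes about 1.5 seconds by my estimate. That comes out to roughly 42 billion years. The idea of leaving it to compute overnight will probably have to be abandoned. :) And somewhere around here I thought: "Hey! So I'll build a bicycle that can fly!"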
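The estimate above can be reproduced in a few lines. This is only a back-of-the-envelope sketch using the rough figures from the text (18 multipliers, 10 grid steps each, about 1.5 seconds per combination); with exactly 1.5 seconds it lands at several tens of billions of years, the same order of magnitude as the number quoted.

```python
# Back-of-the-envelope cost of the exhaustive search described above.
n_params = 18             # one multiplier per feature
steps_per_param = 10      # grid points per multiplier (logarithmic scale)
seconds_per_option = 1.5  # rough timing quoted in the text

n_options = steps_per_param ** n_params
total_seconds = n_options * seconds_per_option
seconds_per_year = 365.25 * 24 * 3600
years = total_seconds / seconds_per_year

print(f"{n_options:.0e} options, about {years / 1e9:.1f} billion years")
```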

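Gradient search

In fact, this problem most likely has only one accessible maximum. Well, of course it is not a single point but a whole region of good results, yet they are all about the same. So we can simply walk along the gradient and find the most suitable point. My first thought was to adapt a genetic algorithm, but the fitness landscape here does not look very rugged, so that would be overkill.

I'll try to do it by hand to begin with. To push the multipliers in as hyperparameters, I need to deal with the scalers. In the previous example, as in the lecture, I used StandardScaler, which centers the training sample on its mean and makes sigma = 1. To do the scaling nicely inside a pipeline, the hyperparameter has to be a bit trickier. I searched the transformers in sklearn.preprocessing for something suitable for my case and found nothing. So I tried inheriting from StandardScaler and hanging an additional set of multipliers on it.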
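For comparison, one possible way to get a tunable weight vector without subclassing is to chain a `FunctionTransformer` after `StandardScaler`. This is only a sketch under assumptions: it is not the approach taken in this article, the helper `scale_columns` and the step name `weights` are hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def scale_columns(X, weights):
    """Multiply every column of X by its weight (hypothetical helper)."""
    return X * np.asarray(weights)

# Synthetic stand-in for the real data: 200 rows, 4 features.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=17)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('weights', FunctionTransformer(scale_columns, kw_args={'weights': np.ones(4)})),
    ('knn', KNeighborsClassifier(n_neighbors=5)),
])

# Each candidate is a complete kw_args dict, so the whole weight vector
# is searched as a single hyperparameter.
candidates = [
    {'weights': np.ones(4)},
    {'weights': np.array([1.0, 1.0, 0.1, 0.1])},
    {'weights': np.array([0.1, 0.1, 1.0, 1.0])},
]
grid = GridSearchCV(pipe, {'weights__kw_args': candidates}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

The trade-off is that the weights live in a separate pipeline step rather than inside the scaler, which keeps `StandardScaler` untouched at the cost of one more step name in the parameter grid.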

A class that standardizes and then multiplies by a scale, somewhat compatible with the sklearn pipeline

from sklearn.base import TransformerMixin
class StandardAndPoorScaler(StandardScaler, TransformerMixin):
    #normalization = None
    def __init__(self, copy=True, with_mean=True, with_std=True, normalization = None):
        #print("new StandardAndPoorScaler(normalization=", normalization.shape if normalization is not None else normalization, ") // ", type(self))
        self.normalization = normalization
        super().__init__(copy=copy, with_mean=with_mean, with_std=with_std)
    def fit(self, X, y=None):
        #print(type(self),".fit(",X.shape, ",", y.shape if y is not None else "<null>",")")
        super().fit(X, y)
        return self
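    def partial_fit(self, X, y=None, *args, **kwargs):
        #print(type(self),".partial_fit(",X.shape, ",", y.shape if y is not None else "<null>)")
        # Extra arguments (e.g. sample_weight in newer scikit-learn) are passed through unchanged.
        super().partial_fit(X, y, *args, **kwargs)
        if self.normalization is None:
            self.normalization = np.ones((X.shape[1]))
        elif type(self.normalization) != np.ndarray:
            self.normalization = np.array(self.normalization)
        if X.shape[1] != self.normalization.shape[0]:
            # Raise a real exception; a bare string cannot be raised in Python 3.
            raise ValueError("X.shape[1]=" + str(X.shape[1]) + " is not equal to self.normalization.shape[0]=" + str(self.normalization.shape[0]))
        return self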
    def transform(self, X, copy=None):
        #print(type(self),".transform(",X.shape,",",copy,").self.normalization", self.normalization)
        Xresult = super().transform(X, copy)
        Xresult *= self.normalization
        return Xresult
    def _reset(self):
        #print(type(self),"._reset()")
        super()._reset()
    
scaler = StandardAndPoorScaler(normalization = feature_importances['importance'])
scaler.fit(X = X_train, y = None)
print(scaler.normalization)

Trying to apply this class

%%time
knn_pipe = Pipeline([('scaler', StandardAndPoorScaler()), ('knn', KNeighborsClassifier(n_jobs=-1))])

knn_params = {'knn__n_neighbors': range(1, 11, 4), 'scaler__normalization': [feature_importances['importance']]}
knn_grid = GridSearchCV(knn_pipe, knn_params, cv=5, n_jobs=-1, verbose=False)

knn_grid.fit(X_train, y_train)
knn_grid.best_params_, knn_grid.best_score_, accuracy_score(y_holdout, knn_grid.predict(X_holdout))

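({'knn__n_neighbors': 5, 'scaler__normalization': Name: importance, dtype: float64}, 0.909558508358337, 0.913)

The result differs slightly from what I expected. Well, that is, in principle everything works. Only, to figure this out, I had to spend three hours rebuilding this whole class from scratch before I realized that the prints were not printing not because sklearn is somehow made wrong, but because GridSearchCV creates the clones in the main process and then configures and trains them in other worker processes. Everything you print in those other processes vanishes into oblivion. But if you set n_jobs=1, all the calls to the overridden functions show up nicely. This knowledge came at a high price; now you have it too, and you paid for it by reading a tedious article.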
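The lesson about `n_jobs` can be checked on a toy setup. The sketch below is an illustration under assumptions: `ChattyScaler` is a hypothetical debugging helper and the data is random. With `n_jobs=1` everything runs in the current process, so the `print` calls inside `fit` can be captured.

```python
import io
from contextlib import redirect_stdout

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class ChattyScaler(StandardScaler):
    """StandardScaler that reports every fit call (hypothetical debugging helper)."""
    def fit(self, X, y=None, **kwargs):
        print("fit called on", X.shape)
        return super().fit(X, y, **kwargs)

rng = np.random.RandomState(17)
X = rng.normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([('scaler', ChattyScaler()), ('knn', KNeighborsClassifier())])
# n_jobs=1 keeps all fitting in the current process, so the prints stay visible.
grid = GridSearchCV(pipe, {'knn__n_neighbors': [1, 3]}, cv=3, n_jobs=1)

buffer = io.StringIO()
with redirect_stdout(buffer):
    grid.fit(X, y)
fit_messages = buffer.getvalue().splitlines()
print(len(fit_messages), "fit calls were logged")
```

Switching to `n_jobs=-1` moves the fits into worker processes, where, as described above, the output may never reach the notebook cell.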

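Well, moving on. Now I want to give each of the parameters some variation, then a little less variation around the best value found, and so on until I get a result that resembles the truth. This will be the first crude baseline for what should eventually become the algorithm of my dreams.

I'll generate several weighting variants that differ in a few parameters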

feature_base = feature_importances['importance']
searchArea = np.array([feature_base - .05, feature_base, feature_base + .05])
searchArea[searchArea < 0] = 0
searchArea[searchArea > 1] = 1
print(searchArea[2,:] - searchArea[0,:])

import itertools

affected_props = [2,3,4]
parametrs_ranges = np.concatenate([
    np.linspace(searchArea[0,affected_props], searchArea[1,affected_props], 2, endpoint=False),
    np.linspace(searchArea[1,affected_props], searchArea[2,affected_props], 3, endpoint=True)]).transpose()

print(parametrs_ranges) # 5 candidate values for each of the 3 properties, 125 combinations in total
recombinations = itertools.product(parametrs_ranges[0],parametrs_ranges[1],parametrs_ranges[2])

variances = []
for item in recombinations: # build one full weight vector per combination
    varince = feature_base.copy()
    varince[affected_props] = item
    variances.append(varince)
print(variances[0])
print(len(variances))

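Well, the dataset for the first experiment is ready. Now I'll try experimenting with the data, starting with an exhaustive search over the resulting 125 options.

Trying to select the parameters the same way as earlier in the article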

%%time
#scale = np.ones([18])
knn_pipe = Pipeline([('scaler', StandardAndPoorScaler()), ('knn', KNeighborsClassifier(n_neighbors = 7 , n_jobs=-1))])

knn_params = {'scaler__normalization': variances} # 'knn__n_neighbors': range(3, 9, 2), 
knn_grid = GridSearchCV(knn_pipe, knn_params, cv=10, n_jobs=-1, verbose=False)

knn_grid.fit(X_train, y_train)
knn_grid.best_params_, knn_grid.best_score_, accuracy_score(y_holdout, knn_grid.predict(X_holdout))

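Well, things look bad: a huge amount of time was spent, and the result is very unstable. The check on X_holdout shows it too: the result dances like a kaleidoscope under small changes in the input data. I'll try another approach. I'll change only one parameter at a time, but on a much finer grid.

Changing a single property, the 4th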

%%time
affected_property = 4
parametrs_range = np.concatenate([
    np.linspace(searchArea[0,affected_property], searchArea[1,affected_property], 29, endpoint=False),
    np.linspace(searchArea[1,affected_property], searchArea[2,affected_property], 30, endpoint=True)]).transpose()

print(searchArea[1,affected_property])
print(parametrs_range) # the candidate values for this single property


variances = []
for item in parametrs_range: # build one full weight vector per candidate value
    varince = feature_base.copy()
    varince[affected_property] = item
    variances.append(varince)
print(variances[0])
print(len(variances))

knn_pipe = Pipeline([('scaler', StandardAndPoorScaler()), ('knn', KNeighborsClassifier(n_neighbors = 7 , n_jobs=-1))])

knn_params = {'scaler__normalization': variances} # 'knn__n_neighbors': range(3, 9, 2), 
knn_grid = GridSearchCV(knn_pipe, knn_params, cv=10, n_jobs=-1, verbose=False)

knn_grid.fit(X_train, y_train)
knn_grid.best_params_, knn_grid.best_score_, accuracy_score(y_holdout, knn_grid.predict(X_holdout))

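({'scaler__normalization': 4    0.079957 Name: importance, dtype: float64}, 0.9099871410201458, 0.913)

So, what do we have in the end? Shifts of one to two tenths of a percent on cross-validation, and jumps of half a percent on X_holdout if you look at different values of affected_property. Apparently, you cannot improve things substantially and cheaply if you start from the weights that the tree gives us. But suppose we have no initial, known distribution of weights, and try to do the same thing from an arbitrary point, in a loop, with tiny steps. It will be interesting to see where we end up.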
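The "one parameter at a time, in a loop" idea is essentially a coordinate-wise search. Here is a minimal sketch on a toy objective: the function `score` is a made-up stand-in for cross-validated accuracy with a known optimum `target`, and the symmetric, steadily shrinking search interval is a simplifying assumption. It is not the article's actual update rule, which adapts the bounds asymmetrically and renormalizes the weights.

```python
import numpy as np

target = np.array([0.6, 0.1, 0.3])  # made-up optimum of the toy objective

def score(weights):
    """Toy stand-in for cross-validated accuracy: higher is better."""
    return -np.sum((weights - target) ** 2)

weights = np.full(3, 1 / 3)  # start from equal weights
width = 0.5                  # half-width of the search interval

for sweep in range(20):
    for j in range(len(weights)):
        # Try a small grid of values for coordinate j, keeping the others fixed.
        grid = np.clip(np.linspace(weights[j] - width, weights[j] + width, 7), 0, 1)
        trials = np.tile(weights, (len(grid), 1))
        trials[:, j] = grid
        best = max(range(len(grid)), key=lambda k: score(trials[k]))
        weights[j] = grid[best]
    width *= 0.7  # narrow the search around the current best point

print(np.round(weights, 3))
```

On a real model each call to `score` would be a full cross-validation run, which is why the number of grid points per coordinate matters so much for the running time.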

Initial fill

searchArea = np.array([np.zeros((18,)), np.ones((18,)) /18, np.ones((18,))])
print(searchArea[:,0])

history_parametrs = [searchArea[1,:].copy()]
scaler = StandardAndPoorScaler(normalization=searchArea[1,:])
scaler.fit(X_train)
knn = KNeighborsClassifier(n_neighbors = 7 , n_jobs=-1)
knn.fit(scaler.transform(X_train), y_train)
history_holdout_score = [accuracy_score(y_holdout, knn.predict(scaler.transform(X_holdout)))]

A function that slightly changes one parameter (with debug logs)

%%time
def changePropertyNormalization(affected_property, points_count = 15):
    test_range = np.concatenate([
        np.linspace(searchArea[0,affected_property], searchArea[1,affected_property], points_count//2, endpoint=False),
        np.linspace(searchArea[1,affected_property], searchArea[2,affected_property], points_count//2 + 1, endpoint=True)]).transpose()
    variances = [searchArea[1,:].copy() for i in range(test_range.shape[0])]
    for row in range(len(variances)):
        variances[row][affected_property] = test_range[row]
    
    knn_pipe = Pipeline([('scaler', StandardAndPoorScaler()), ('knn', KNeighborsClassifier(n_neighbors = 7 , n_jobs=-1))])
    knn_params = {'scaler__normalization': variances} # 'knn__n_neighbors': range(3, 9, 2), 
    knn_grid = GridSearchCV(knn_pipe, knn_params, cv=10, n_jobs=-1, verbose=False)

    knn_grid.fit(X_train, y_train)
    holdout_score = accuracy_score(y_holdout, knn_grid.predict(X_holdout))
    best_param = knn_grid.best_params_['scaler__normalization'][affected_property]
    print(affected_property,
          'property:', searchArea[1, affected_property], "=>", best_param,
          'holdout:', history_holdout_score[-1], "=>", holdout_score, '(', knn_grid.best_score_, ')')
    # Move or shrink the search area of this property around the best value found.
    before = searchArea[:, affected_property].copy()
    propertySearchArea = searchArea[:, affected_property].copy()
    if best_param == propertySearchArea[0]:
        print('|<<')
        searchArea[0, affected_property] = best_param/2 if best_param > 0.01 else 0
        searchArea[2, affected_property] = (best_param + searchArea[2, affected_property])/2
        searchArea[1, affected_property] = best_param
    elif best_param == propertySearchArea[2]:
        print('>>|')
        searchArea[2, affected_property] = (best_param + 1)/2 if best_param < 0.99 else 1
        searchArea[0, affected_property] = (best_param + searchArea[0, affected_property])/2
        searchArea[1, affected_property] = best_param
    elif best_param < (propertySearchArea[0] + propertySearchArea[1])/2:
        print('<<')
        searchArea[0, affected_property] = max(propertySearchArea[0]*1.1 - .1*propertySearchArea[1], 0)
        searchArea[2, affected_property] = (best_param + propertySearchArea[2])/2
        searchArea[1, affected_property] = best_param
    elif best_param > (propertySearchArea[1] + propertySearchArea[2])/2:
        print('>>')
        searchArea[0, affected_property] = (best_param + propertySearchArea[0])/2
        searchArea[2, affected_property] = min(propertySearchArea[2]*1.1 - .1*propertySearchArea[1], 1)
        searchArea[1, affected_property] = best_param
    elif best_param < propertySearchArea[1]:
        print('<')
        searchArea[2, affected_property] = searchArea[1, affected_property]*.25 + .75*searchArea[2, affected_property]
        searchArea[1, affected_property] = best_param
    elif best_param > propertySearchArea[1]:
        print('>')
        searchArea[0, affected_property] = searchArea[1, affected_property]*.25 + .75*searchArea[0, affected_property]
        searchArea[1, affected_property] = best_param
    else:
        print('=')
        searchArea[0, affected_property] = searchArea[1, affected_property]*.25 + .75*searchArea[0, affected_property]
        searchArea[2, affected_property] = searchArea[1, affected_property]*.25 + .75*searchArea[2, affected_property]
    normalization = searchArea[1,:].sum() # renormalize so that the central weights sum to 1
    searchArea[:,:] /= normalization
    print(before, "=>",searchArea[:, affected_property])
    history_parametrs.append(searchArea[1,:].copy())
    history_holdout_score.append(holdout_score)
    
changePropertyNormalization(1, 9)
changePropertyNormalization(1, 9)

I did not optimize anything anywhere, and as a result the next decisive step took almost half an hour to run:

40 cycles

%%time
# Reset the search area and the history
searchArea = np.array([np.zeros((18,)), np.ones((18,)) /18, np.ones((18,))])
print(searchArea[:,0])

history_parametrs = [searchArea[1,:].copy()]
scaler = StandardAndPoorScaler(normalization=searchArea[1,:])
scaler.fit(X_train)
knn = KNeighborsClassifier(n_neighbors = 7 , n_jobs=-1)
knn.fit(scaler.transform(X_train), y_train)
history_holdout_score = [accuracy_score(y_holdout, knn.predict(scaler.transform(X_holdout)))]

for tick in range(40):
    for p in range(searchArea.shape[1]):
        changePropertyNormalization(p, 7)
    
print(searchArea[1,:])
print(history_holdout_score)

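The final accuracy of kNN is 91.9%, which is better than when we borrowed the weights from the tree, and much better than the original version. Let's compare our feature importances with those of the decision tree:

Visualizing feature importance according to kNN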

feature_importances['knn_importance'] = history_parametrs[-1]
diagramma = feature_importances.copy()
indexes = diagramma.index
diagramma.index = diagramma['features']
diagramma.drop('features', axis=1, inplace = True)
diagramma.plot(kind='bar');
plt.savefig("images/pic1.png", format = 'png')
plt.show()
feature_importances

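Similar? Yes, similar. But far from identical. An interesting observation: the dataset contains several features that completely duplicate each other, for example "Total night minutes" and "Total night charge". Note that kNN by itself picked up on a significant share of such duplicated features.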
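A quick way to spot such duplicated features is to look at pairwise correlations. The sketch below uses synthetic columns rather than the real dataset: the column names and the tariff of 0.045 per minute are made up, and the only point is that a column which is a rescaled copy of another has a correlation of 1 with it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(17)
minutes = rng.uniform(50, 400, size=100)
frame = pd.DataFrame({
    'minutes': minutes,
    'charge': 0.045 * minutes,                 # made-up tariff: a rescaled copy of minutes
    'calls': rng.integers(30, 150, size=100),  # unrelated feature
})

corr = frame.corr()
print(corr.round(3))
```

After standardization two such columns become identical, so in a distance-based method they effectively count the same information twice unless one of them is down-weighted.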

Let's save the results to a file, otherwise coming back to this work later is a bit inconvenient...

parametrs_df = pd.DataFrame(history_parametrs)
parametrs_df['scores'] = history_holdout_score
parametrs_df.index.name = 'index'
parametrs_df.to_csv('parametrs_and_scores.csv')

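Conclusions

Well, the result of .919 is in itself not bad for kNN: roughly one and a half times fewer errors than the vanilla version (more precisely, about 1.35 times on the holdout numbers above), and 7% fewer than when we drove it with the tree's feature_importances_. But the most interesting thing is that we now have feature importances from the point of view of kNN itself. They are somewhat different from what the tree told us. For example, the tree and kNN have different opinions about which features do not matter to us at all.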
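The comparison of error rates can be checked directly from the holdout accuracies quoted in this article (0.891, 0.913 and 0.919). This is just the arithmetic, written out as a small sketch; the variable names are mine.

```python
# Holdout accuracies quoted in this article.
acc_vanilla = 0.891       # kNN with plain StandardScaler
acc_tree_weights = 0.913  # kNN scaled by the tree's feature_importances_
acc_tuned = 0.919         # kNN with the weights found by the search

err_vanilla = 1 - acc_vanilla
err_tree_weights = 1 - acc_tree_weights
err_tuned = 1 - acc_tuned

ratio_vs_vanilla = err_vanilla / err_tuned
reduction_vs_tree = 1 - err_tuned / err_tree_weights

print(f"errors vs vanilla: {ratio_vs_vanilla:.2f} times fewer")
print(f"errors vs tree-weighted: {reduction_vs_tree:.1%} fewer")
```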

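Well, and finally. We got something relatively new and unusual while armed only with the knowledge from three lectures of mlcourse.ai (ODS) and with Google for answering simple questions about Python. Not bad, I think.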

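And now, the slides

A by-product of the algorithm's work is the path it has travelled. The path, however, is 18-dimensional, which makes it a bit hard to grasp, so following in real time what the algorithm is doing there, learning or fooling around, is not very convenient. The error plot does not always show it either. The error may not change noticeably for a long time while the algorithm is very busy crawling along a long narrow valley in the fitness space. So, to begin with, I'll use the first, simplest, but quite informative approach: I randomly project the 18-dimensional space onto a two-dimensional one, so that the contribution of every parameter, regardless of its importance, has unit length. In fact, an 18-dimensional path is tiny. In my article "Peeking at a Neural Network" I admired, in the same way, the weight space of all the synapses the network had, and it looked nice and was informative.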
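The random projection described above fits in a few lines of NumPy. This is a sketch on a made-up trajectory (a random walk standing in for the saved weight history); the names `history`, `projection` and `points` are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(17)

n_steps, n_dims = 50, 18
# Made-up trajectory standing in for the saved weight history.
history = np.cumsum(rng.normal(scale=0.01, size=(n_steps, n_dims)), axis=0)

# One random unit vector in the plane per original dimension.
angles = rng.uniform(-np.pi, np.pi, size=n_dims)
projection = np.column_stack([np.sin(angles), np.cos(angles)])  # shape (18, 2)

# Each 18-dimensional point becomes a weighted sum of those unit vectors.
points = history @ projection  # shape (n_steps, 2)

print(points.shape)
print(np.linalg.norm(projection, axis=1))  # every row has length 1
```

Because every dimension gets a direction of the same length, a unit change in any weight moves the 2D point by the same distance, which is exactly the "all contributions are of unit size" property mentioned above.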

If the training stage is already done and I am resuming work, I read the data from the file

parametrs_df = pd.read_csv('parametrs_and_scores.csv', index_col = 'index')
history_holdout_score = np.array(parametrs_df['scores'])
history_parametrs = np.array(parametrs_df.drop('scores',axis=1))

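From a certain point on, the validation error stops changing. Here I could have bolted on an automatic stop for the training and used the resulting function for the rest of my life, but I am already a bit short on time. :(

Determining how long we were learning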

last = history_holdout_score[-1]
steps = np.arange(0, history_holdout_score.shape[0])[history_holdout_score != last].max()
print(steps/18)

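35.55555555555556

We changed one parameter at a time, so one optimization cycle is 18 steps. It turns out we had about 36 meaningful cycles. Now let's try to visualize the trajectory along which the method was learning.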

%%time
# Random projection of the weight space onto a plane:
import matplotlib.pyplot as plt
%matplotlib inline
import random
import math
random.seed(17)
property_projection = np.array([[math.sin(a), math.cos(a)] for a in [random.uniform(-math.pi, math.pi) for i in range(history_parametrs[0].shape[0])]]).transpose()
history = np.array(history_parametrs[::18]) # one full cycle is 18 steps
# Project every saved point onto the plane.
points = np.array([(history[i] * property_projection).sum(axis=1) for i in range(history.shape[0])])
plt.plot(points[:36,0],points[0:36,1]);
plt.savefig("images/pic2.png", format = 'png')
plt.show()

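It can be seen that a significant part of the journey was covered in the first four steps. Let's look at the rest of the path under magnification

Without the first 4 points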

plt.plot(points[4:36,0],points[4:36,1]);
plt.savefig("images/pic3.png", format = 'png')

Let's take a closer look at the final part of the path and see what the learner did after reaching its destination.

Closer and closer

plt.plot(points[14:36,0],points[14:36,1]);
plt.savefig("images/pic4.png", format = 'png')
plt.show()
plt.plot(points[24:36,0],points[24:36,1]);
plt.plot(points[35:,0],points[35:,1], color = 'red');
plt.savefig("images/pic5.png", format = 'png')
plt.show()

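It can be seen that the algorithm is learning intensively until it finds its destination. The specific point, of course, depends on the randomness in the cross-validation. But regardless of the specific point, the overall picture of what is happening is clear.

By the way, I have used plots like this before to demonstrate a learning process. There I showed not the whole trajectory but the last few steps, with sliding smoothing of the scale. An example can be found in my other article, "Peeking at a Neural Network". And yes, of course, everyone who encounters such a visualization immediately asks why all the factors get the same weight in the projection when their importance actually differs. Last time I tried to reweight the synapses by importance, and the result turned out less informative.

This time, armed with new knowledge, I'll try to use t-SNE to unfold the multidimensional space into a projection where everything might look better.

t-SNE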

%%time
import sklearn.manifold as manifold
tsne = manifold.TSNE(random_state=19)
tsne_representation = tsne.fit_transform(history)
plt.plot(tsne_representation[:, 0], tsne_representation[:, 1])
plt.savefig("images/pic6.png", format = 'png')
plt.show();

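t-SNE seems to have unfolded the space in such a way that it completely ate up the scale of the changes for those features that quickly stopped changing, which made the picture completely uninformative. Conclusion: do not try to shove algorithms into places they were not designed for.

You don't have to read any further

I also tried to inject code into t-SNE to visualize the intermediate optimization states, hoping that something beautiful would come out. It turned out to be not beauty but some kind of rubbish. If you are interested, see how it is done. The internet is full of examples of such injection code, but simply copying them does not work, because they replace the internal function _gradient_descent in sklearn.manifold.t_sne, and depending on the version it can differ a lot both in its signature and in how it handles internal variables. So just find the sources on your own machine, take your version of the function from there, and insert into it a single line that appends intermediate dumps to a variable of your own: positions.append(p.copy()) # We save the current position.

Then, for example, we can nicely visualize what we got:

Injection code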

from time import time
from scipy import linalg
# This list will contain the positions of the map points at every iteration.
positions = []
def _gradient_descent(objective, p0, it, n_iter,
                      n_iter_check=1, n_iter_without_progress=300,
                      momentum=0.8, learning_rate=200.0, min_gain=0.01,
                      min_grad_norm=1e-7, verbose=0, args=None, kwargs=None):
    # The documentation of this function can be found in scikit-learn's code.
    if args is None:
        args = []
    if kwargs is None:
        kwargs = {}

    p = p0.copy().ravel()
    update = np.zeros_like(p)
    gains = np.ones_like(p)
    error = np.finfo(float).max
    best_error = np.finfo(float).max
    best_iter = i = it

    tic = time()
    for i in range(it, n_iter):
        positions.append(p.copy()) # We save the current position.
        
        check_convergence = (i + 1) % n_iter_check == 0
        # only compute the error when needed
        kwargs['compute_error'] = check_convergence or i == n_iter - 1

        error, grad = objective(p, *args, **kwargs)
        grad_norm = linalg.norm(grad)

        inc = update * grad < 0.0
        dec = np.invert(inc)
        gains[inc] += 0.2
        gains[dec] *= 0.8
        np.clip(gains, min_gain, np.inf, out=gains)
        grad *= gains
        update = momentum * update - learning_rate * grad
        p += update

        if check_convergence:
            toc = time()
            duration = toc - tic
            tic = toc

            if verbose >= 2:
                print("[t-SNE] Iteration %d: error = %.7f,"
                      " gradient norm = %.7f"
                      " (%s iterations in %0.3fs)"
                      % (i + 1, error, grad_norm, n_iter_check, duration))

            if error < best_error:
                best_error = error
                best_iter = i
            elif i - best_iter > n_iter_without_progress:
                if verbose >= 2:
                    print("[t-SNE] Iteration %d: did not make any progress "
                          "during the last %d episodes. Finished."
                          % (i + 1, n_iter_without_progress))
                break
            if grad_norm <= min_grad_norm:
                if verbose >= 2:
                    print("[t-SNE] Iteration %d: gradient norm %f. Finished."
                          % (i + 1, grad_norm))
                break

    return p, error, i

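# Newer scikit-learn keeps this function in the private module manifold._t_sne; older versions used manifold.t_sne.
tsne_module = getattr(manifold, "_t_sne", None) or getattr(manifold, "t_sne")
tsne_module._gradient_descent = _gradient_descent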

Applying the "fixed" t-SNE

tsne_representation = manifold.TSNE(random_state=17).fit_transform(history)
X_iter = np.dstack([position.reshape(-1, 2) for position in positions])
position_reshape = [position.reshape(-1, 2) for position in positions]
print(position_reshape[0].shape)
print('[0] min', position_reshape[0][:,0].min(),'max', position_reshape[0][:,0].max())
print('[1] min', position_reshape[1][:,0].min(),'max', position_reshape[1][:,0].max())
print('[2] min', position_reshape[2][:,0].min(),'max', position_reshape[2][:,0].max())

(41, 2)
[0] min -0.00018188123 max 0.00027207955
[1] min -0.05136269 max 0.032607622
[2] min -4.392309 max 7.9074526

The values vary over a very wide range, so I will scale them before drawing. All of this is done in loops, which is slow. :(

Scaling

%%time
from sklearn.preprocessing import MinMaxScaler
minMaxScaler = MinMaxScaler()
minMaxScaler.fit_transform(position_reshape[0])
position_reshape = [minMaxScaler.fit_transform(frame) for frame in position_reshape]
position_reshape[0].min(), position_reshape[0].max()

Animating

%%time

from matplotlib.animation import FuncAnimation, PillowWriter
#plt.style.use('seaborn-pastel')

fig = plt.figure()

ax = plt.axes(xlim=(0, 1), ylim=(0, 1))
line, = ax.plot([], [], lw=3)

def init():
    line.set_data([], [])
    return line,
def animate(i):
    x = position_reshape[i][:,0]
    y = position_reshape[i][:,1]
    line.set_data(x, y)
    return line,

anim = FuncAnimation(fig, animate, init_func=init, frames=36, interval=20, blit=True, repeat_delay = 1000)
anim.save('images/animate_tsne_learning.gif', writer=PillowWriter(fps=5))

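As a technique it is instructive, but for this particular task it is absolutely useless and ugly.

With that I say goodbye. I hope the idea that even kNN can give you something new and interesting, together with the pieces of code, will help you have some fun with data at this intellectual feast in a time of plague.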
