t-SNE

Don't look back. Don't hesitate, just do it. 

The principle of t-SNE

from here

1. t-SNE is used strictly for visualization, and we can only see things in up to three dimensions.

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction technique used to represent a high-dimensional dataset in a low-dimensional space of two or three dimensions so that we can visualize it. In contrast to other dimensionality reduction algorithms such as PCA, which simply maximizes variance, t-SNE creates a reduced feature space in which similar samples are modeled by nearby points and dissimilar samples are modeled by distant points with high probability.

At a high level, t-SNE constructs a probability distribution for the high-dimensional samples in such a way that similar samples have a high likelihood of being picked while dissimilar points have an extremely small likelihood of being picked. Then, t-SNE defines a similar distribution for the points in the low-dimensional embedding. Finally, t-SNE minimizes the Kullback–Leibler divergence between the two distributions with respect to the locations of the points in the embedding.
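
To make this concrete, the following is a minimal NumPy sketch of the two distributions and of the KL objective. It is only an illustration of the idea, not the real algorithm: the fixed sigma and the random embedding Y are simplifying assumptions, whereas t-SNE chooses a per-point sigma from a user-set perplexity and optimizes Y by gradient descent.

import numpy as np

def squared_dists(Z):
    # squared Euclidean distances between all rows of Z
    sq = np.sum(Z ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * Z @ Z.T

def p_matrix(X, sigma=1.0):
    # Gaussian similarities in the high-dimensional space (one fixed sigma for simplicity)
    P = np.exp(-squared_dists(X) / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)          # a point is not its own neighbor
    return P / P.sum()

def q_matrix(Y):
    # Student-t (one degree of freedom) similarities in the low-dimensional embedding
    Q = 1.0 / (1.0 + squared_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q, eps=1e-12):
    # the objective t-SNE minimizes with respect to the embedding coordinates
    return np.sum(P * np.log((P + eps) / (Q + eps)))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))         # 20 samples in 10 dimensions
Y = rng.normal(size=(20, 2)) * 1e-2   # a random 2-D embedding; t-SNE would optimize this
print(kl_divergence(p_matrix(X), q_matrix(Y)))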

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)  # perplexity must be less than n_samples
>>> X_embedded.shape
(4, 2)

Returns:

X_new : ndarray of shape (n_samples, n_components)

Embedding of the training data in low-dimensional space.

Q: 

1. Why is a distance matrix high-dimensional? I originally thought it was just two-dimensional.

Perhaps each column of the matrix is treated as one dimension?


From SNE to t-SNE to LargeVis

tSNEJS

SNE

Assumption: data points that are similar in the high-dimensional space should also end up close to each other when mapped into the low-dimensional space.

The conventional approach expresses this similarity with Euclidean distance; SNE instead converts the distance relationship into a conditional probability. P_i denotes the conditional probability distribution, in the high-dimensional space, between x_i and all the other points. Likewise, there is a conditional probability distribution Q_i in the low-dimensional space, and it should agree with P_i. How do we measure the similarity between two distributions? With the classic KL divergence (Kullback-Leibler divergence), of course. SNE's ultimate goal is to minimize this KL divergence over all data points.

cost function: KL divergence. 

The purpose of minimizing the cost function is to make the values of p_{j|i} and q_{j|i} as close as possible, i.e. the similarity between points in the low-dimensional space should be consistent with the similarity between points in the high-dimensional space.
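
Written out (these are the standard formulas from the original SNE formulation, in LaTeX notation), the conditional probabilities and the cost being minimized are:

p_{j|i} = \frac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)},
\qquad
q_{j|i} = \frac{\exp(-\lVert y_i - y_j \rVert^2)}{\sum_{k \neq i} \exp(-\lVert y_i - y_k \rVert^2)}

C = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}

where \sigma_i is the bandwidth of the Gaussian centered at x_i, chosen so that the distribution P_i has a user-specified perplexity.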

t-SNE

In the original SNE, p_{i|j} and p_{j|i} are not equal, and in the low-dimensional space q_{i|j} and q_{j|i} are not equal either; the similarities are asymmetric.

Symmetric SNE fixes this asymmetry and yields a simpler gradient formula, but Maaten points out that symmetric SNE performs only slightly better than the original SNE and still does not solve the underlying problem.

The t-distribution solves the crowding problem that appears in the visualization. A heavy-tailed distribution like the t-distribution has a clear advantage when dealing with small samples and outliers.

To summarize, t-SNE adds two improvements on top of SNE: first, it turns SNE into symmetric SNE; second, it replaces the Gaussian with a t-distribution in the low-dimensional space, while the high-dimensional space is left unchanged.
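
In the same notation, the two changes correspond to the symmetric joint probabilities and the Student-t low-dimensional similarities from van der Maaten and Hinton's paper (n is the number of points):

p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n},
\qquad
q_{ij} = \frac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}}

which gives the simple gradient

\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \lVert y_i - y_j \rVert^2)^{-1}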

Supplementary knowledge:

1. What is manifold learning (流形学习)?

2. What does t-SNE do? Only dimensionality reduction, or does it also help identify clusters?

3. The output of sklearn.manifold.TSNE is a 2-dimensional NumPy array (not a DataFrame). How should the output matrix be handled? (See the sketch below.)
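
One common way to handle it, sketched here with assumed placeholder data and labels: treat the two columns of the returned array as x/y coordinates and scatter-plot them, colored by whatever labels are available.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(100, 20))   # stand-in high-dimensional data
labels = np.repeat([0, 1], 50)                        # assumed class labels, used only for coloring

X_embedded = TSNE(n_components=2).fit_transform(X)

# X_embedded has shape (100, 2): column 0 -> x coordinate, column 1 -> y coordinate
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=labels, s=10)
plt.show()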

4. How do we measure the similarity between two distributions? With the classic KL divergence (Kullback-Leibler divergence), of course.

The KL divergence is a measure of how different one probability distribution is from a second. The lower the value of the KL divergence, the closer the two distributions are to one another. A KL divergence of 0 implies that the two distributions in question are identical.
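
A tiny numerical example with two discrete distributions (natural log), just to make the definition concrete:

import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

kl_pq = np.sum(p * np.log(p / q))   # KL(P || Q) > 0 because p and q differ
kl_pp = np.sum(p * np.log(p / p))   # KL(P || P) = 0: identical distributions
print(kl_pq, kl_pp)                 # note: KL(P || Q) != KL(Q || P) in general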

5. t-distribution vs. Gaussian distribution

Because the Gaussian distribution has light tails, it is sensitive to outliers; in order to accommodate those outliers, the fitted Gaussian drifts away from where most of the samples lie and its variance becomes large. By contrast, the t-distribution has heavier tails and is insensitive to outliers, which makes it robust: its fit is more reasonable and captures the overall structure of the data better.
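
A quick way to see the heavier tails with scipy, comparing the standard normal density with a Student-t density with one degree of freedom (the same form t-SNE uses in the low-dimensional space):

from scipy.stats import norm, t

for x in (0, 2, 4, 6):
    # far from the mean, the t distribution keeps noticeably more probability mass
    print(x, norm.pdf(x), t.pdf(x, 1))  # second argument of t.pdf is the degrees of freedom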

6. arr.ravel() vs. arr.flatten() vs. arr.reshape(-1)

The current NumPy API is:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn't visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don't always get a contiguous array.
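
A small demonstration of the view-vs-copy behavior described above:

import numpy as np

a = np.arange(6).reshape(2, 3)

v = a.ravel()          # view (the array is contiguous here, so no copy is made)
c = a.flatten()        # always a copy
r = a.reshape(-1)      # view whenever the strides allow it

v[0] = 100             # writes through to `a`
c[1] = 200             # does NOT touch `a`
r[2] = 300             # writes through to `a`

print(a)                                                                        # [[100 1 300] [3 4 5]]
print(np.shares_memory(a, v), np.shares_memory(a, c), np.shares_memory(a, r))   # True False True
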
Original source: https://www.cnblogs.com/dulun/p/12219660.html