pca.inverse_transform in sklearn

When I perform inverse transformation by definition isn't it supposed to return to original data

No, you can only expect this if the number of components you specify is the same as the dimensionality of the input data. For any n_components less than this, you will get different numbers than the original dataset after applying the inverse PCA transformation: the following diagrams give an illustration in two dimensions.

enter image description here


It can not do that, since by reducing the dimensions with PCA, you've lost information (check pca.explained_variance_ratio_ for the % of information you still have). However, it tries its best to go back to the original space as well as it can, see the picture below Comparison of original points with transformed points where information is lost

(generated with

import numpy as np
from sklearn.decomposition import PCA
pca = PCA(1)
X_orig = np.random.rand(10, 2)
X_re_orig = pca.inverse_transform(pca.fit_transform(X_orig))

plt.scatter(X_orig[:, 0], X_orig[:, 1], label='Original points')
plt.scatter(X_re_orig[:, 0], X_re_orig[:, 1], label='InverseTransform')
[plt.plot([X_orig[i, 0], X_re_orig[i, 0]], [X_orig[i, 1], X_re_orig[i, 1]]) for i in range(10)]
plt.legend()
plt.show()

) If you had kept the n_dimensions the same (set pca = PCA(2), you do recover the original points (the new points are on top of the original ones):
New points on top of original points