Machine Learning: Unsupervised Backpropagation

What you might be asking about is unsupervised feature learning and deep learning.

Feature learning is the only unsupervised method I can think of with respect to neural networks and their recent variants. (One variant is a mixture of RBMs, analogous to a mixture of Gaussians, and you can build many models based on the two.) The two models I am familiar with are RBMs (restricted Boltzmann machines) and autoencoders.

Autoencoders are feedforward neural networks that tune their weights so that the output is a reconstruction of the input (optionally, sparse activations can be encouraged through the optimization objective). Multiple hidden layers can be used, with greedy layer-wise training to initialize the weights at a better starting point. So, to answer the question, the target is the input itself.
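As a minimal sketch of this idea (the toy data, sizes, and hyperparameters here are all made up for illustration), here is a single-hidden-layer autoencoder trained with plain backpropagation and gradient descent in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 10 dimensions that actually live on a
# 3-dimensional linear subspace, so a small code layer can reconstruct them.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))

n_in, n_hidden = X.shape[1], 3
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b2 = np.zeros(n_in)

lr = 0.01
for epoch in range(2000):
    # Forward pass: encode with a tanh hidden layer, decode linearly.
    H = np.tanh(X @ W1 + b1)          # hidden code
    X_hat = H @ W2 + b2               # reconstruction
    R = X_hat - X                     # residual; the loss is mean(R**2)

    # Backward pass (backpropagation): gradients of the mean squared
    # reconstruction error with respect to each weight and bias.
    grad_W2 = H.T @ R / len(X)
    grad_b2 = R.mean(axis=0)
    dH = (R @ W2.T) * (1.0 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ dH / len(X)
    grad_b1 = dH.mean(axis=0)

    # Gradient-descent update.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("final reconstruction MSE:", np.mean(R ** 2))
```

The only "target" the loss ever sees is X itself, which is what makes the training unsupervised.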

RBMs are stochastic networks, usually interpreted as graphical models, with restrictions on the connections. In this setting there is no output layer, and the connections between the input and latent layers are bidirectional, as in an undirected graphical model. What an RBM tries to learn is a distribution over the inputs (observed and unobserved variables). Here, too, the answer is that the input is the target.
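Note that RBMs are typically trained not with backpropagation but with contrastive divergence. A rough sketch (binary units, CD-1, random toy data; every value here is illustrative, and a real run would use real data and more epochs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: 100 samples of 6 visible units.
V = (rng.random((100, 6)) < 0.5).astype(float)

n_vis, n_hid = 6, 4
W = rng.normal(scale=0.1, size=(n_vis, n_hid))
a = np.zeros(n_vis)   # visible biases
b = np.zeros(n_hid)   # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr = 0.1
for epoch in range(100):
    # Positive phase: hidden probabilities given the data, then a sample.
    ph = sigmoid(V @ W + b)
    h = (rng.random(ph.shape) < ph).astype(float)

    # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
    pv = sigmoid(h @ W.T + a)
    ph2 = sigmoid(pv @ W + b)

    # CD-1 update: difference between data and model correlations.
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    a += lr * (V - pv).mean(axis=0)
    b += lr * (ph - ph2).mean(axis=0)
```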

A mixture of RBMs (analogous to a mixture of Gaussians) can be used for soft clustering, or KRBM (analogous to k-means) for hard clustering, which in effect amounts to learning multiple non-linear subspaces.

http://deeplearning.net/tutorial/rbm.html

http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial


I'm not sure which unsupervised machine learning algorithm uses backpropagation specifically; if there is one, I haven't heard of it. Can you point to an example?

Backpropagation is used to compute the derivatives of the error function, with respect to the weights, for training an artificial neural network. It's named as such because the "errors" "propagate" through the network "backwards". You need it in this case because the final error with respect to the target depends on a composition of functions (of functions, and so on, depending on how many layers are in your ANN). The derivatives then allow you to adjust the weights to improve the error function, tempered by the learning rate (this is gradient descent).
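Schematically (generic notation, not tied to any particular network): for a two-layer network with hidden activation $h = f_1(W_1 x)$, output $\hat{y} = f_2(W_2 h)$, and error $E(\hat{y}, y)$, the chain rule gives

$$
\frac{\partial E}{\partial W_1}
= \frac{\partial E}{\partial \hat{y}}
  \,\frac{\partial \hat{y}}{\partial h}
  \,\frac{\partial h}{\partial W_1},
$$

and gradient descent then updates $W_1 \leftarrow W_1 - \eta \,\partial E / \partial W_1$ with learning rate $\eta$. Backpropagation is just the efficient, layer-by-layer evaluation of these products.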

In unsupervised algorithms, you don't need to do this. For example, in k-means, where you are trying to minimize the mean squared error (MSE), you can minimize the error directly at each step given the assignments; no gradients are needed. In other clustering models, such as a mixture of Gaussians, the expectation-maximization (EM) algorithm is typically far more effective than a generic gradient-descent method, because it exploits the structure of the model.
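To make the contrast concrete, here is a sketch of k-means in NumPy (toy blobs, all values illustrative). Both steps reduce the MSE directly, in closed form, with no gradients and no backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated blobs in 2D.
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(3, 0.5, (50, 2))])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]

for _ in range(20):
    # Assignment step: each point goes to its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each center becomes the mean of its assigned points,
    # which minimizes the MSE for the current assignments in closed form.
    # (Empty-cluster handling omitted for brevity.)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(centers)
```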


The most common thing to do is train an autoencoder, where the desired outputs are equal to the inputs. This makes the network try to learn a representation that best "compresses" the input distribution.
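For instance (a minimal sketch in PyTorch; the architecture, sizes, and data are all made up), the usual supervised machinery is reused unchanged, with the input standing in as its own target:

```python
import torch
from torch import nn

# Toy data: 256 samples of 20-dimensional inputs.
X = torch.randn(256, 20)

# Encoder compresses to 5 dimensions; decoder reconstructs the input.
model = nn.Sequential(
    nn.Linear(20, 5), nn.Tanh(),   # encoder
    nn.Linear(5, 20),              # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    X_hat = model(X)
    # The desired output is the input itself.
    loss = loss_fn(X_hat, X)
    loss.backward()   # backpropagation of the reconstruction error
    opt.step()
```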

Here's a patent describing a different approach, where the output labels are assigned randomly and then sometimes flipped based on convergence rates. It seems weird to me, but okay.

I'm not familiar with other methods that use backpropagation for clustering or other unsupervised tasks. Clustering approaches with ANNs seem to use other algorithms (example 1, example 2).