What is the difference between supervised learning and unsupervised learning?

Since you ask this very basic question, it looks like it's worth specifying what Machine Learning itself is.

Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine learning algorithm for face detection in images would try to define what a face is (round skin-like-colored disk, with dark area where you expect the eyes etc). A machine learning algorithm would not have such coded definition, but would "learn-by-examples": you'll show several images of faces and not-faces and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.

This particular example of face detection is supervised, which means that your examples must be labeled, or explicitly say which ones are faces and which ones aren't.

In an unsupervised algorithm your examples are not labeled, i.e. you don't say anything. Of course, in such a case the algorithm itself cannot "invent" what a face is, but it can try to cluster the data into different groups, e.g. it can distinguish that faces are very different from landscapes, which are very different from horses.

Since another answer mentions it (though, in an incorrect way): there are "intermediate" forms of supervision, i.e. semi-supervised and active learning. Technically, these are supervised methods in which there is some "smart" way to avoid a large number of labeled examples. In active learning, the algorithm itself decides which thing you should label (e.g. it can be pretty sure about a landscape and a horse, but it might ask you to confirm if a gorilla is indeed the picture of a face). In semi-supervised learning, there are two different algorithms which start with the labeled examples, and then "tell" each other the way they think about some large number of unlabeled data. From this "discussion" they learn.


Supervised learning is when the data you feed your algorithm with is "tagged" or "labelled", to help your logic make decisions.

Example: Bayes spam filtering, where you have to flag an item as spam to refine the results.

Unsupervised learning are types of algorithms that try to find correlations without any external inputs other than the raw data.

Example: data mining clustering algorithms.


Supervised learning

Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems.

Unsupervised learning

In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering

Pattern Recognition and Machine Learning (Bishop, 2006)