What is the difference between OneVsRestClassifier and MultiOutputClassifier in scikit-learn?

Multiclass classification

To better illustrate the differences, let us assume that your goal is to classify SO questions into n_classes different, mutually exclusive classes. For the sake of simplicity, in this example we will only consider four classes, namely 'Python', 'Java', 'C++' and 'Other language'. Let us assume that you have a dataset of just six SO questions, and that the class labels of those questions are stored in an array y as follows:

import numpy as np
y = np.asarray(['Java', 'C++', 'Other language', 'Python', 'C++', 'Python'])

The situation described above is usually referred to as multiclass classification (also known as multinomial classification). In order to fit the classifier and validate the model with the scikit-learn library, you need to transform the text class labels into numerical labels. To accomplish that you could use LabelEncoder:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_numeric = le.fit_transform(y)

This is how the labels of your dataset are encoded:

In [220]: y_numeric
Out[220]: array([1, 0, 2, 3, 0, 3], dtype=int64)

where those numbers denote indices of the following array:

In [221]: le.classes_
Out[221]: 
array(['C++', 'Java', 'Other language', 'Python'], 
      dtype='|S14')
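
As a quick check of the mapping above, the following minimal sketch decodes the numeric labels back to text in two ways: with inverse_transform, and by indexing classes_ directly. It only reuses the six labels from this example; the names decoded and lookup are mine.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y = np.asarray(['Java', 'C++', 'Other language', 'Python', 'C++', 'Python'])

le = LabelEncoder()
y_numeric = le.fit_transform(y)  # classes are sorted, then numbered from 0

# Two equivalent ways to go back from numbers to text labels
decoded = le.inverse_transform(y_numeric)
lookup = le.classes_[y_numeric]

print(y_numeric)
print(decoded)
```

Both decoded and lookup should reproduce the original array y.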

An important particular case is when there are just two classes, i.e. n_classes = 2. This is usually called binary classification.

Multilabel classification

Let us now suppose that you wish to perform such multiclass classification using a pool of n_classes binary classifiers (the one-vs-rest strategy), where n_classes is the number of different classes. Each of these binary classifiers decides whether an item belongs to one specific class or not. In this case you cannot encode the class labels as integers from 0 to n_classes - 1; you need to create a 2-dimensional indicator matrix instead. Suppose that sample n is of class k. Then the [n, k] entry of the indicator matrix is 1 and the rest of the elements in row n are 0. It is important to note that if the classes are not mutually exclusive, there can be multiple 1's in a row. That setting, in which a sample may carry several labels at once, is called multilabel classification, and the indicator matrix can be easily built with MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
y_indicator = mlb.fit_transform(y[:, None])

The indicator looks like this:

In [225]: y_indicator
Out[225]: 
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

and the column numbers that hold the 1's are indices into this array:

In [226]: mlb.classes_
Out[226]: array(['C++', 'Java', 'Other language', 'Python'], dtype=object)
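
To make the "multiple 1's in a row" case concrete, here is a minimal sketch in which some questions carry two language tags at once, followed by a OneVsRestClassifier that fits one binary classifier per column of the indicator matrix. The tag lists, the random feature matrix X and the choice of LogisticRegression as base estimator are assumptions made up purely for illustration; they are not part of the dataset above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tags: questions 1 and 5 have two labels each
tags = [['Java'], ['C++', 'Python'], ['Other language'],
        ['Python'], ['C++'], ['Python', 'Java']]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)  # one column per class, one row per question
print(mlb.classes_)
print(Y)

# Made-up features, only so that the classifier has something to fit
rng = np.random.RandomState(0)
X = rng.rand(len(tags), 3)

# One binary LogisticRegression is trained for each column of Y
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
print(len(ovr.estimators_))   # number of fitted binary classifiers
print(ovr.predict(X).shape)   # predictions come back as an indicator matrix
```

With only six samples and random features the predictions themselves are meaningless; the point is the shape of the target and the one-classifier-per-column structure.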

Multioutput classification

What if you want to classify a particular SO question according to two different criteria simultaneously, for instance language and application? In this case you intend to do multioutput classification. For the sake of simplicity I will consider only three application classes, namely 'Computer Vision', 'Speech Recognition' and 'Other Application'. The label array of your dataset should be 2-dimensional:

y2 = np.asarray([['Java', 'Computer Vision'],
                 ['C++', 'Speech Recognition'],
                 ['Other language', 'Computer Vision'],
                 ['Python', 'Other Application'],
                 ['C++', 'Speech Recognition'],
                 ['Python', 'Computer Vision']])

Again, we need to transform the text class labels into numeric labels. LabelEncoder only handles one column at a time, so each column has to be encoded separately (scikit-learn 0.20 and later also provide OrdinalEncoder, which encodes every column of a 2-dimensional array in a single call). This thread describes some clever ways to do that, but for the purposes of this post the following one-liner should suffice:

y_multi = np.vstack([le.fit_transform(y2[:, i]) for i in range(y2.shape[1])]).T

The encoded labels look like this:

In [229]: y_multi
Out[229]: 
array([[1, 0],
       [0, 2],
       [2, 0],
       [3, 1],
       [0, 2],
       [3, 0]], dtype=int64)

And the meaning of the values in each column can be inferred from the following arrays:

In [230]: le.fit(y2[:, 0]).classes_
Out[230]: 
array(['C++', 'Java', 'Other language', 'Python'], 
      dtype='|S18')

In [231]: le.fit(y2[:, 1]).classes_
Out[231]: 
array(['Computer Vision', 'Other Application', 'Speech Recognition'], 
      dtype='|S18')
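
As an alternative to the per-column LabelEncoder loop, the sketch below encodes both columns with OrdinalEncoder (available from scikit-learn 0.20 on) and then fits a MultiOutputClassifier, which trains one multiclass classifier per output column. The random feature matrix X and the LogisticRegression base estimator are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import OrdinalEncoder

y2 = np.asarray([['Java', 'Computer Vision'],
                 ['C++', 'Speech Recognition'],
                 ['Other language', 'Computer Vision'],
                 ['Python', 'Other Application'],
                 ['C++', 'Speech Recognition'],
                 ['Python', 'Computer Vision']])

# Encode every column independently; the result is float, so cast to int
enc = OrdinalEncoder()
y_multi = enc.fit_transform(y2).astype(int)
print(y_multi)
print(enc.categories_)  # one array of class names per column

# Made-up features, only so that the classifier has something to fit
rng = np.random.RandomState(0)
X = rng.rand(y2.shape[0], 3)

# One multiclass LogisticRegression is trained per output column
moc = MultiOutputClassifier(LogisticRegression()).fit(X, y_multi)
pred = moc.predict(X)
print(len(moc.estimators_), pred.shape)

# Numeric predictions can be mapped back to text labels
print(enc.inverse_transform(pred))
```

Note the contrast with the multilabel case: here each column is itself a multiclass problem (four languages, three applications) rather than a yes/no decision.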