Which algorithms to use for one class classification?

One-class classification (OCC) problems are closely related to anomaly detection and novelty detection. In these problems, only samples of the positive class are available, and their distribution is generally non-Gaussian.

The main motivation for OCC is the lack of data available to define a second class. Generally, any discriminative model improves one-vs-rest metrics on these tasks.

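Popular approaches are SVM-based: the one-class SVM, and the more general support vector data description (SVDD) [WIP], which encloses the data in a circumscribing hyper-ball in feature space. Without a kernel that boundary is rigid; with a kernel it becomes flexible in input space, and SVDD does not require the kernel to be translation invariant.

So one-class SVM is a specific case of SVDD: the two coincide when K(x,x)=const, as for translation-invariant kernels such as the Gaussian RBF.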
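
To see why the condition K(x,x)=const matters, it may help to compare the standard dual problems of the two methods. The notation below is an assumption made for this sketch and is not taken from the answer: the alpha_i are dual variables, C is the SVDD slack penalty, nu is the one-class SVM parameter, and n is the number of training points.

```latex
% SVDD dual
\max_{\alpha}\; \sum_{i} \alpha_i K(x_i, x_i) - \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i = 1

% One-class SVM dual
\min_{\alpha}\; \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le \tfrac{1}{\nu n}, \quad \sum_{i} \alpha_i = 1

% If K(x, x) = c for every x, the first SVDD term is constant:
\sum_{i} \alpha_i K(x_i, x_i) = c \sum_{i} \alpha_i = c
```

With that term constant, maximizing the SVDD objective is the same as minimizing the double sum, so both problems have the same solution when C = 1/(nu n).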

For more details check here.


What you're looking for is the OneClassSvm. For more information you might want to check out the corresponding documentation at this link.
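
As a rough illustration, assuming the answer refers to scikit-learn's `OneClassSVM` (the answer does not name the library), a minimal usage sketch could look like the following. The synthetic data and the values of `gamma` and `nu` are arbitrary choices for the example, not recommendations.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "positive class" training data: one tight 2-D cluster
# around the origin (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X_train = 0.3 * rng.standard_normal((200, 2))

# nu is an upper bound on the fraction of training points treated as
# outliers; gamma is the RBF kernel coefficient.
clf = OneClassSVM(kernel="rbf", gamma=0.1, nu=0.1)
clf.fit(X_train)

# One point at the cluster center and one point far away from it.
X_new = np.array([[0.0, 0.0], [5.0, 5.0]])
preds = clf.predict(X_new)             # +1 = inlier, -1 = outlier
scores = clf.decision_function(X_new)  # signed distance to the boundary
print(preds, scores)
```

Only positive examples are passed to `fit`; the model never sees a second class, which matches the OCC setting described in the question.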


There is another classifier available in the TextBlob module called PositiveNaiveBayesClassifier. To quote from their documentation:

A variant of the Naive Bayes Classifier that performs binary classification with partially-labeled training sets, i.e. when only one class is labeled and the other is not. Assuming a prior distribution on the two labels, uses the unlabeled set to estimate the frequencies of the features.

Code Usage:

>>> from textblob.classifiers import PositiveNaiveBayesClassifier
>>> sports_sentences = ['The team dominated the game',
...                     'They lost the ball',
...                     'The game was intense',
...                     'The goalkeeper catched the ball',
...                     'The other team controlled the ball']
>>> various_sentences = ['The President did not comment',
...                      'I lost the keys',
...                      'The team won the game',
...                      'Sara has two kids',
...                      'The ball went off the court',
...                      'They had the ball for the whole game',
...                      'The show is over']
>>> classifier = PositiveNaiveBayesClassifier(positive_set=sports_sentences,
...                                           unlabeled_set=various_sentences)
>>> classifier.classify("My team lost the game")
True
>>> classifier.classify("And now for something completely different.")
False
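
The idea in the quoted description can be sketched in plain Python. This is a simplified illustration, not TextBlob's actual implementation: the function names, the `positive_prior=0.5` default, the `eps` clipping and the toy sentences are all assumptions made for the example. Word frequencies for the unlabeled class are not observed directly; they are recovered from P(w) = P(pos) P(w|pos) + P(neg) P(w|neg).

```python
import math
from collections import Counter


def doc_freq(sentences, vocab):
    """Fraction of sentences that contain each vocabulary word."""
    counts = Counter()
    for s in sentences:
        counts.update(set(s.lower().split()) & vocab)
    return {w: counts[w] / len(sentences) for w in vocab}


def clip(p, eps):
    """Keep a probability away from 0 and 1 so that log() is defined."""
    return min(max(p, eps), 1.0 - eps)


def train(positive, unlabeled, positive_prior=0.5, eps=1e-3):
    vocab = {w for s in positive + unlabeled for w in s.lower().split()}
    p_pos = doc_freq(positive, vocab)   # P(w | positive)
    p_all = doc_freq(unlabeled, vocab)  # P(w), estimated from unlabeled data
    negative_prior = 1.0 - positive_prior
    # Solve P(w) = P(pos) P(w|pos) + P(neg) P(w|neg) for P(w | negative).
    p_neg = {w: clip((p_all[w] - positive_prior * p_pos[w]) / negative_prior, eps)
             for w in vocab}
    p_pos = {w: clip(p, eps) for w, p in p_pos.items()}
    return vocab, p_pos, p_neg, positive_prior


def classify(model, sentence):
    """Return True if the sentence looks more like the positive class."""
    vocab, p_pos, p_neg, prior = model
    words = set(sentence.lower().split())
    score_pos, score_neg = math.log(prior), math.log(1.0 - prior)
    for w in vocab:  # Bernoulli naive Bayes: absent words count too
        if w in words:
            score_pos += math.log(p_pos[w])
            score_neg += math.log(p_neg[w])
        else:
            score_pos += math.log(1.0 - p_pos[w])
            score_neg += math.log(1.0 - p_neg[w])
    return score_pos > score_neg


# Toy data (hypothetical): two labeled positives, four unlabeled sentences.
model = train(positive=["ball game", "team ball"],
              unlabeled=["ball game", "stock market", "market price", "team ball"])
print(classify(model, "ball game"), classify(model, "market price"))
```

The assumed prior matters here: a larger `positive_prior` attributes more of the unlabeled word frequencies to the positive class and shifts the estimated negative-class frequencies accordingly.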