Distinction between pixel-based and object-based classification?

As far as pixel-based classification is concerned, you are spot on. Each pixel is an n-dimensional vector and will be assigned to some class according to some metric, whether by Support Vector Machines, maximum likelihood, some kind of k-NN classifier, etc.
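As a minimal sketch of that idea (assuming a scikit-learn environment; the 6-band array, labels, and SVM kernel below are illustrative placeholders, not a real dataset):

```python
# Minimal pixel-based classification sketch: every pixel is treated as an
# independent n-dimensional feature vector (one dimension per band).
# Assumes scikit-learn; the 6-band image and labels are hypothetical.
import numpy as np
from sklearn.svm import SVC

bands, height, width = 6, 100, 100
image = np.random.rand(bands, height, width)       # placeholder imagery
labels = np.random.randint(0, 3, (height, width))  # placeholder ground truth

# Flatten to (n_pixels, n_bands): the classifier never sees spatial context.
X = image.reshape(bands, -1).T
y = labels.ravel()

clf = SVC(kernel='rbf').fit(X, y)                  # could equally be k-NN, MLE, etc.
predicted = clf.predict(X).reshape(height, width)  # per-pixel class map
```

Note that nothing about a pixel's neighbours enters the model -- which is exactly the limitation the region-based approaches below address.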

As far as region-based classifiers are concerned, though, there have been huge developments in the last few years, driven by a combination of GPUs, vast amounts of data, the cloud, and the wide availability of algorithms thanks to the growth of open source (facilitated by GitHub). One of the biggest developments in computer vision/classification has been convolutional neural networks (CNNs). The convolutional layers "learn" features which might be based on colour, as with traditional pixel-based classifiers, but they also learn edge detectors and all kinds of other feature extractors that operate on a region of pixels (hence the convolutional part) and that you could never extract from a pixel-based classification. This means they are less likely to misclassify a pixel in the middle of an area of pixels of some other type -- if you have ever run a classification and got ice in the middle of the Amazon, you will understand this problem.

You then apply a fully connected neural net to the "features" learnt via the convolutions to actually do the classification. Another great advantage of CNNs is a degree of robustness to changes in position, scale, and orientation: the intermediate pooling layers between the convolution layers and the classification layer generalize the learnt features, while dropout helps avoid overfitting. (Strictly speaking, CNNs are not fully scale- or rotation-invariant, but pooling and training on varied data get you a long way.)
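To make that conv-then-classify pipeline concrete, here is a toy patch classifier in Keras -- a sketch only, where the patch size, band count, and layer widths are arbitrary assumptions rather than a recommended architecture:

```python
# Toy patch classifier in Keras: convolutions learn spatial features,
# pooling/dropout generalize them, a dense head does the classification.
# Patch size, band count, and layer widths are arbitrary illustrations.
import tensorflow as tf
from tensorflow.keras import layers

n_classes = 5
model = tf.keras.Sequential([
    layers.Input(shape=(32, 32, 6)),          # 32x32 patch of a 6-band image
    layers.Conv2D(32, 3, activation='relu'),  # learns edge/texture detectors
    layers.MaxPooling2D(),                    # pooling: tolerance to small shifts
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),                      # dropout: guards against overfitting
    layers.Dense(64, activation='relu'),      # fully connected classification head
    layers.Dense(n_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```

The structural point is that the convolutions see a whole neighbourhood of pixels at once, which is what lets the network learn the region-level features described above.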

There are numerous resources on convolutional neural networks, although the best has to be the Stanford class (CS231n) taught by Andrej Karpathy, one of the leading figures in the field; the entire lecture series is available on YouTube.

Sure, there are other ways of dealing with pixel- versus area-based classification, but CNNs are currently the state-of-the-art approach, and they have many applications beyond remote sensing classification, such as machine translation and self-driving cars.

Here is another example of region-based classification, using OpenStreetMap for tagged training data, including instructions for setting up TensorFlow and running on AWS.

Here is an example, using Google Earth Engine, of a classifier based on edge detection -- in this case for pivot irrigation -- using nothing more than a Gaussian kernel and convolutions, but again showing the power of region/edge-based approaches.
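For a rough idea of what that looks like in code, here is a hedged sketch using the Earth Engine Python API -- the Landsat asset ID, band, and kernel parameters are placeholders, but the difference-of-Gaussians trick is a standard way to turn two Gaussian convolutions into an edge detector:

```python
# Difference-of-Gaussians edge detection in the Earth Engine Python API.
# The image ID, band, and kernel parameters are illustrative placeholders.
import ee
ee.Initialize()

image = ee.Image('LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318').select('B5')

# Two Gaussian kernels of different widths; their difference acts as a
# band-pass filter that lights up at sharp boundaries such as the circular
# edges of pivot-irrigated fields.
fat = ee.Kernel.gaussian(radius=3, sigma=3, units='pixels')
skinny = ee.Kernel.gaussian(radius=3, sigma=0.5, units='pixels')

edges = image.convolve(fat).subtract(image.convolve(skinny))
```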

[Image: pivot irrigation fields picked out by the edge detector]

While the superiority of object-based over pixel-based classification is fairly widely accepted, here is an interesting article in Remote Sensing Letters assessing the performance of object-based classification.

Finally, an amusing example, just to show that even with region/convolution-based classifiers, computer vision is still really hard. Fortunately, the smartest people at Google, Facebook, etc., are working on algorithms that can tell the difference between dogs, cats, and different breeds of dogs and cats, so those of us interested in remote sensing can sleep easy at night :D

[Image: visually similar dogs and cats -- a famously hard case for computer vision]


Your understanding is generally correct; however, there are a couple of dangers in your description of object-based classification. First, the term 'object' refers to a group of pixels, not to whether or not the group contains a recognizable object.
Furthermore, the central goal of an object-based classification is not to produce segments of equal size, but to "chop"/segment the image into internally homogeneous chunks of varying size. Lastly, a training example for an object-based classification would usually be one or more of the chunks created by the image segmentation.

All in all, these are only minor variations on your description.
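As a sketch of that segment-then-classify workflow (assuming scikit-image >= 0.19 and scikit-learn; the superpixel count, features, and labels below are illustrative, not a recipe):

```python
# Object-based sketch: segment the image into homogeneous chunks of varying
# size, then classify each chunk from its summary statistics.
# Uses scikit-image SLIC; segment count and features are illustrative.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

image = np.random.rand(100, 100, 6)               # placeholder 6-band image

# Step 1: segmentation into internally homogeneous "objects" of varying size.
segments = slic(image, n_segments=200, compactness=10, channel_axis=-1)

# Step 2: one feature vector per segment (here: mean band values).
ids = np.unique(segments)
X = np.array([image[segments == s].mean(axis=0) for s in ids])

# Step 3: train on labelled segments (labels are placeholders here), then
# classify every segment and paint the result back onto the pixels.
y = np.random.randint(0, 3, len(ids))
clf = RandomForestClassifier().fit(X, y)
pred = clf.predict(X)
classified = pred[np.searchsorted(ids, segments)]  # per-pixel map from segments
```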

Now on to the central part -- when to apply each method, and how to potentially combine their strengths.