OpenCV - Feature Matching vs Optical Flow

I would like to add a few thoughts about that theme since I found this a very interesting question too. As said before Feature Matching is a technique that is based on:

  • A feature detection step which returns a set of so called feature points. These feature points are located at positions with salient image structures, e.g. edge-like structures when you are using FAST or blob like structures if you are using SIFT or SURF.

  • The second step is the matching. The association of feature points extracted from two different images. The matching is based on local visual descriptors, e.g. histogram of gradients or binary patterns, that are locally extracted around the feature positions. The descriptor is a feature vector and associated feature point pairs are pairs a minimal feature vector distances.

Most feature matching methods are scale and rotation invariant and are robust for changes in illuminations (e.g caused by shadow or different contrast). Thus these methods can be applied to image sequences but are more often used to align image pairs captured from different views or with different devices.The disadvantage of Feature Matching methods is the difficulty of defining where the feature matches are spawn and that the feature pair (which in a image sequence are motion vectors) are in general very sparse. In addition the subpixel accuracy of matching approaches are very limited as most detector are fine-graded to integer positions.

From my experience the main advantage of feature matching approaches is that they can compute very large motions/ displacements.

OpenCV offers some feature matching methods but there are a lot of more recent, faster and more accurate approaches available online e.g.:

  • DeepMatching which relies on deep learning and are often used to initialize optical flow methods to help them deal with long-range motions.
  • Stereoscann which is a very fast approach at its origin proposed for visual odometry.

Optical flow methods in contrast rely on the minimization of the brightness constancy and additional constrain e.g. smoothness etc. Thus they derive motion vector based on spatial and temporal image gradients of a sequence of consecutive frames. Thus they are more suited image sequences rather than image pairs that are captured from very different view points. The main challenges in the estimation of motion with optical flow vectors are large motions, occlusion, strong illumination changes and changes of the appearance of the objects and mostly the low runtime. However optical flow methods can be highly accurate and compute dense motion fields which respect to shared motion boundaries of the objects in a scene.

However, the accuracy of different optical flow methods is very different. Local methods such as the PLK (Lucas Kanade) are in general less accurate but allow to compute pre selected motion vectors only and can thus be very fast. (In the recent years we have done some research to improve the accuracy of the local approach, see here for further information).

The main OpenCV trunk offers global approaches such as the Farnback. But this is a quite outdated approach. Try the OpenCV contrib trunk which more recent methods. But to get an good overview of the most recent methods take a look at the public optical flow benchmarks. Here you will find code and implementations as well e.g.:

  • MPI-Sintel optical flow benchmark
  • KITTI 2012 optical flow benchmark. Both offer links e.g. to git's or source code for some newer methods. Such as FlowFields.

But from my point of view I would not on an early stage reject a specific approach matching or optical flow. Try as much as possible available online implementations and see what is the best for your application.


Feature matching uses the feature descriptors to match features with one another (usually) using a nearest neighbor search in the feature descriptor space. The basic idea is you have descriptor vectors, and the same feature in two images should be near each other in the descriptor space, so you just match that way.

Optical flow algorithms do not look at a descriptor space, and instead, looks at pixel patches around features and tries to match those patches instead. If you're familiar with dense optical flow, sparse optical flow just does dense optical flow but on small patches of the image around feature points. Thus optical flow assumes brightness constancy, that is, that pixel brightness doesn't change between frames. Also, since you're looking around neighboring pixels, you need to make the assumption that neighboring points to your features move similarly to your feature. Finally, since it's using a dense flow algorithm on small patches, the points where they move cannot be very far in the image from the original feature location. If they are, then the pyramid-resolution approach is recommended, where you scale down the image before you do this so that what once was a 16 pixel translation is now a 2 pixel translation, and then you can scale up with the found transformation as your prior.

So feature matching algorithms are all-in-all far better when it comes to using templates where the scale is not exactly the same, or if there's a perspective difference in the image and template, or if the transformations are large. However, your matches are only as good as your feature detector is exact. On optical flow algorithms, as long as it's looking in the right spot, the transformations can be really, really precise. They're both computationally expensive a bit; optical flow algorithms being an iterative approach makes them expensive (and although you'd think the pyramid approach can eat up more costs by running on more images, it can actually make it faster in some cases to reach the desired accuracy), and nearest neighbor searches are also expensive. Optical flow algorithms OTOH can work really well when the transformations are small, but if anything in your scene messes with your lighting or you get some incorrect pixels (like say, even minor occlusion) can really throw it off.

Which one to use definitely depends on the project. For a project I worked on with satellite imagery, I used dense optical flow because the images of desert terrain I was working with did not have precise enough features (in location) and different feature descriptors happen to look relatively similar so searching that feature space wasn't giving tons of great matches. In this case, optical flow was the better method. However, if you were doing image alignment on satellite imagery of a city where buildings can occlude parts of the scene, there are a lot of features that will stay matched and give a better result.

The OpenCV Lucas-Kanade tutorial doesn't give a whole lot of insight but should get your code moving in the right direction with the above in mind.