How can I stitch images from video cameras in real time?

Note: I leave this answer just as documentation of what was tried, as the method I suggested doesn't seem to work, while apparently the GPU is already in use when using cv::Mat.


Try using gpu::GpuMat:

std::vector<cv::Mat> images(4);
std::vector<gpu::GpuMat> gpuImages(4);
gpu::GpuMat pano_result_gpu;
cv::Mat pano_result; 
bool firstTime = true;

[...] 

cap0 >> images[0];
cap1 >> images[1];
cap2 >> images[2];
cap3 >> images[3];
for (int i = 0; i < 4; i++)
   gpuImages[i].upload(images[i]);
if (firstTime) {
    cv::Stitcher::Status status = stitcher.estimateTransform(gpuImages);
    firstTime = false;
    }
cv::Stitcher::Status status = stitcher.composePanorama(gpuImages, pano_result_gpu);
pano_result_gpu.download(pano_result);

cv::Stitcher is fairly slow. If your cameras definitely don't move relative to one another and the transformation is as simple as you say, you should be able to overlay the images onto a blank canvas simply by chaining homographies.

The following is somewhat mathematical - if this isn't clear I can write it up properly using LaTeX, but SO doesn't support pretty maths :)

You have a set of 4 cameras, from left to right, (C_1, C_2, C_3, C_4), giving a set of 4 images (I_1, I_2, I_3, I_4).

To transform from I_1 to I_2, you have a 3x3 transformation matrix, called a homography. We'll call this H_12. Similarly for I_2 to I_3 we have H_23 and for I_3 to I_4 you'll have H_34.

You can pre-calibrate these homographies in advance using the standard method (point matching between the overlapping cameras).

You'll need to create a blank matrix, to act as the canvas. You can guess the size of this (4*image_size would suffice) or you can take the top-right corner (call this P1_tr) and transform it by the three homographies, giving a new point at the top-right of the panorama, PP_tr (the following assumes that P1_tr has been converted to a matrix):

PP_tr = H_34 * H_23 * H_12 * P1_tr'

What this is doing, is taking P1_tr and transforming it first into camera 2, then from C_2 to C_3 and finally from C_3 to C_4

You'll need to create one of these for combining images 1 and 2, images 1,2 and 3 and finally images 1-4, I'll refer to them as V_12, V_123 and V_1234 respectively.

Use the following to warp the image onto the canvas:

cv::warpAffine(I_2, V_12, H_12, V_12.size( ));

Then do the same with the next images:

cv::warpAffine(I_3, V_123, H_23*H_12, V_123.size( ));
cv::warpAffine(I_4, V_1234, H_34*H_23*H_12, V_1234.size( ));

Now you have four canvases, all of which are the width of the 4 combined images, and with one of the images transformed into the relevant place on each.

All that remains is to merge the transformed images onto eachother. This is easily achieved using regions of interest.

Creating the ROI masks can be done in advance, before frame capture begins.

Start with a blank (zeros) image the same size as your canvases will be. Set the leftmost rectangle the size of I_1 to white. This is the mask for your first image. We'll call it M_1.

Next, to get the mask for the second transformed image, we do

cv::warpAffine(M_1, M_2, H_12, M_1.size( ));
cv::warpAffine(M_2, M_3, H_23*H_12, M_1.size( ));
cv::warpAffine(M_3, M_4, H_34*H_23*H_12, M_1.size( ));

To bring all the images together into one panorama, you do:

cv::Mat pano = zeros(M_1.size( ), CV_8UC3);
I_1.copyTo(pano, M_1);
V_12.copyTo(pano, M_2): 
V_123.copyTo(pano, M_3): 
V_1234.copyTo(pano, M_4): 

What you're doing here is copying the relevant area of each canvas onto the output image, pano - a fast operation.

You should be able to do all this on the GPU, substituting cv::gpu::Mat's for cv::Mats and cv::gpu::warpAffine for its non-GPU counterpart.