OpenCV: get perspective matrix from translation & rotation

This is a sketch of what I mean by "solving the system of equations" (in Python):

import cv2
import numpy as np

# rvec = the rotation vector (e.g. from cv2.solvePnP or cv2.calibrateCamera)
# tvec = the translation vector
# A    = the 3x3 camera intrinsic matrix

def unit_vector(v):
    # not used below, but handy for normalising ray directions
    return v / np.sqrt(np.sum(v*v))

(fx, fy) = (A[0, 0], A[1, 1])
Ainv = np.array([[1.0/fx, 0.0,    -A[0, 2]/fx],
                 [0.0,    1.0/fy, -A[1, 2]/fy],
                 [0.0,    0.0,     1.0]], dtype=np.float64)
R, _ = cv2.Rodrigues(rvec)   # cv2.Rodrigues returns (matrix, jacobian)
Rinv = R.T                   # the inverse of a rotation matrix is its transpose

# displacement between camera and world coordinate origin, in world coordinates
u = Rinv @ np.asarray(tvec, dtype=np.float64).reshape(3)

# corners of the image, hard coded here for a 640x480 image, as homogeneous (x, y, 1) pixels
pixel_corners = [np.array(c, dtype=np.float64)
                 for c in [(0.5, 0.5, 1), (640-0.5, 0.5, 1), (640-0.5, 480-0.5, 1), (0.5, 480-0.5, 1)]]
scene_corners = []
for c in pixel_corners:
    # direction of the ray through that corner, in world coordinates
    lhat = Rinv @ (Ainv @ c)
    s = u[2] / lhat[2]
    # now (s*lhat - u)[2] == 0, i.e. s is how far along the line of sight
    # we need to move to reach the Z == 0 plane
    g = s*lhat - u
    scene_corners.append((g[0], g[1]))

# now we have 4 pixel_corners (image coordinates) and 4 corresponding scene_corners
# (world coordinates on the Z == 0 plane); we can call cv2.getPerspectiveTransform on them and so on...
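
For example, a minimal sketch of that last step (the pixels-per-world-unit scale, the shift to non-negative coordinates, and the source image variable image are arbitrary, illustrative choices):

scale = 100.0                                     # output pixels per world unit (arbitrary)
dst = np.float32([(gx * scale, gy * scale) for gx, gy in scene_corners])
dst -= dst.min(axis=0)                            # shift so all destination points are >= 0
src = np.float32([(c[0], c[1]) for c in pixel_corners])
M = cv2.getPerspectiveTransform(src, dst)
out_size = (int(np.ceil(dst[:, 0].max())), int(np.ceil(dst[:, 1].max())))
warped = cv2.warpPerspective(image, M, out_size)  # 'image' is the original photo (placeholder)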

Actually there is no need to involve an orthographic camera. Here is how you can get the appropriate perspective transform.

If you calibrated the camera using cv::calibrateCamera, you obtained a camera matrix K, a vector of lens distortion coefficients D for your camera and, for each image that you used, a rotation vector rvec (which you can convert to a 3x3 matrix R using cv::Rodrigues) and a translation vector T. Consider one of these images and the associated R and T. After you call cv::undistort using the distortion coefficients, the image is as if it had been acquired by a camera with projection matrix K * [ R | T ].
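
For concreteness, a minimal Python sketch of that setup; obj_points, img_points, the image size and image are placeholders you would supply yourself:

import cv2
import numpy as np

# obj_points: list of (N, 3) float32 arrays of known 3D points (Z = 0 for a planar target)
# img_points: list of (N, 2) float32 arrays of the corresponding detected pixel positions
rms, K, D, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, (640, 480), None, None)

R, _ = cv2.Rodrigues(rvecs[0])            # 3x3 rotation for the first image
T = tvecs[0]                              # its translation vector
undistorted = cv2.undistort(image, K, D)  # now modelled by the projection matrix K * [ R | T ]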

Basically (as @DavidNilosek intuited), you want to cancel the rotation and get the image as if it had been acquired by a projection matrix of the form K * [ I | -C ], where C = -R.inv() * T is the camera position. For that, you have to apply the following transformation:

Hr = K * R.inv() * K.inv()
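
In Python/NumPy terms, with K and R as above, that is simply:

Hr = K @ R.T @ np.linalg.inv(K)   # R.T equals R.inv() because R is a rotation matrix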

The only potential problem is that the warped image might go outside the visible part of the image plane. Hence, you can use an additional translation to solve that issue, as follows:

     [ 1  0  |         ]
Ht = [ 0  1  | -K*C/Cz ]
     [ 0  0  |         ]

where Cz is the component of C along the Oz axis.
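
A sketch of building Ht with the same names (C is computed from the R and T of the chosen image):

C = -R.T @ T.reshape(3)      # camera position in world coordinates, C = -R.inv() * T
Cz = C[2]
Ht = np.eye(3)
Ht[:, 2] = -(K @ C) / Cz     # third column is -K*C/Cz; its last entry works out to -1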

Finally, with the definitions above, H = Ht * Hr is a rectifying perspective transform for the considered image.
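
Putting it together, the rectified (bird's-eye) view can then be obtained with cv2.warpPerspective; the output size out_size = (width, height) is a choice you make:

H = Ht @ Hr
birdseye = cv2.warpPerspective(undistorted, H, out_size)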