meaning of m34 of CATransform3D

You can find the full details here. Note that Apple uses the reversed multiplication order for projection (relative to the given link) so all matrix multiplications are reversed and all matrices are transposed.

A short description of the meaning:

  • m34 = 1 / (z distance to the projection plane); this is the 1/ez term in the reference link
  • the positive z axis points towards the viewer, resulting in a "looking in the mirror" feel when using -
  • projection center is (0,0,0) plus any translations you set up

I read some articles including this one: https://developer.apple.com/library/content/documentation/Cocoa/Conceptual/CoreAnimation_guide/AdvancedAnimationTricks/AdvancedAnimationTricks.html#//apple_ref/doc/uid/TP40004514-CH8-SW13

My solution is here:

Entities:

  • eye - distance from screen to eye
  • scale - visual scale of transformed object
  • distance - distance to transformed object

Connecting formulas:

  • scale = eye / (eye + distance)
  • distance = eye * (1.0/scale - 1.0)
  • eye = distance / (1.0/scale - 1.0)

Example of computing the z-distance for a desired scale at a selected eye distance:

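// Returns a perspective transform that shows the layer at the given visual
// scale (scale > 0) when the eye is `eye` units from the screen.
CATransform3D transformByScaleAndEye(CGFloat scale, CGFloat eye) {
    CATransform3D t = CATransform3DIdentity;
    t.m34 = -1.0 / eye;
    // Negative z moves the layer away from the viewer.
    CGFloat distance = -eye * (1.0 / scale - 1.0);
    return CATransform3DTranslate(t, 0, 0, distance);
}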
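
As a sanity check on the formulas above, here is a minimal Swift sketch that uses plain arithmetic rather than Core Animation. It assumes that the perspective divide is w = 1 + m34 * z and that placing an object `distance` units away from the viewer means translating by z = -distance; solving scale = eye / (eye + distance) for distance gives eye * (1/scale - 1). The function names are hypothetical and not part of any API.

```swift
// Hypothetical helpers for illustration; not Core Animation API.

/// Visual scale of an object `distance` units behind the screen plane,
/// seen from `eye` units in front of it.
func visualScale(eye: Double, distance: Double) -> Double {
    return eye / (eye + distance)
}

/// Distance needed to obtain `scale`, from solving the formula above.
func distanceFor(scale: Double, eye: Double) -> Double {
    return eye * (1.0 / scale - 1.0)
}

/// Scale produced by the perspective divide, assuming w = 1 + m34 * z.
func perspectiveScale(m34: Double, z: Double) -> Double {
    return 1.0 / (1.0 + m34 * z)
}

let eye = 500.0
let distance = distanceFor(scale: 0.5, eye: eye)             // 500
let roundTrip = visualScale(eye: eye, distance: distance)    // 0.5
let viaM34 = perspectiveScale(m34: -1.0 / eye, z: -distance) // approximately 0.5
print(distance, roundTrip, viaM34)
```

Under these assumptions, all three routes agree: half scale at an eye distance of 500 requires pushing the object 500 units back.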

The following is some background knowledge that I think readers should have before getting to the answer:

  1. iOS coordinate system: Imagine you are holding your phone vertically with the screen facing you. Each view's coordinate system has its origin at the view's center, with the x-axis running from left to right, the y-axis from top to bottom, and the z-axis from the back of the phone toward your face.

  2. Homogeneous coordinates: when doing 3D transformations on iOS, you work with homogeneous (projective) coordinates instead of traditional Cartesian coordinates. In short, the new coordinate system uses one more dimension, w, than the old one. The beauty of this system is that it allows rotation, translation, and scaling to be done by vector-matrix multiplication.

  • To convert a vector from homogeneous coordinates to Cartesian coordinates, you divide x, y, and z by w.
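
To make the conversion concrete, here is a small Swift sketch. The `Vec4` type and the `toCartesian` function are hypothetical names used only for illustration; Core Animation performs this division internally.

```swift
// Hypothetical type for illustration; not part of Core Animation.
struct Vec4 {
    var x, y, z, w: Double
}

/// Converts homogeneous coordinates to Cartesian by dividing x, y, z by w.
/// Assumes w is non-zero.
func toCartesian(_ v: Vec4) -> (x: Double, y: Double, z: Double) {
    return (v.x / v.w, v.y / v.w, v.z / v.w)
}

// With w = 2, every coordinate is halved: x = 50, y = 25, z = 100.
let p = toCartesian(Vec4(x: 100, y: 50, z: 200, w: 2))
print(p)
```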

Now, let's get to the answer. Consider the following example:

    var transform = CATransform3DIdentity
    transform.m34 = -1 / 500                                     // perspective: eye is 500 units from the screen
    transform = CATransform3DRotate(transform, .pi / 4, 0, 1, 0) // 45 degrees about the y-axis
    transform = CATransform3DTranslate(transform, 0, 0, 200)     // 200 units toward the viewer
    imageLayer.transform = transform

To understand what the code above does, read it in reverse order. First, the image is moved 200 points along the z-axis (the positive sign means the direction is from the screen toward your face). Second, the image is rotated 45 degrees about the y-axis (tilted to your right). If you stopped here, you would have an image with a smaller width, shifted to your right. But a third step is then applied that "magically" gives the image perspective. Here lies the mystery of the m34 element.

Here is the sequence of operations expressed in terms of matrix multiplications:

 [x' y' z' w'] = ([x y z w] * translation_matrix * rotation_matrix) * perspective_matrix
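
The formula above can be traced numerically with a short Swift sketch that does not depend on Core Animation. It assumes row vectors multiplied on the left, a y-axis rotation matrix that is the transpose of the usual column-vector form, and a perspective matrix that is the identity with only m34 set. `Matrix4`, `apply`, `translation`, `rotationY`, and `perspective` are hypothetical helpers written for this illustration.

```swift
import Foundation

// A 4x4 matrix stored as rows; points are row vectors, as in the formula above.
typealias Matrix4 = [[Double]]

/// Multiplies a row vector by a 4x4 matrix: result[j] = sum of v[i] * m[i][j].
func apply(_ v: [Double], _ m: Matrix4) -> [Double] {
    var result = [Double](repeating: 0, count: 4)
    for j in 0..<4 {
        for i in 0..<4 {
            result[j] += v[i] * m[i][j]
        }
    }
    return result
}

func translation(_ tx: Double, _ ty: Double, _ tz: Double) -> Matrix4 {
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [tx, ty, tz, 1]]
}

// Rotation about the y-axis (assumed: transpose of the column-vector form).
func rotationY(_ angle: Double) -> Matrix4 {
    let c = cos(angle), s = sin(angle)
    return [[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1]]
}

// Identity matrix with only m34 (row 3, column 4) set.
func perspective(m34: Double) -> Matrix4 {
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, m34], [0, 0, 0, 1]]
}

// Follow the layer's center through the steps in the order they take effect.
var v: [Double] = [0, 0, 0, 1]
v = apply(v, translation(0, 0, 200))       // [0, 0, 200, 1]
v = apply(v, rotationY(Double.pi / 4))     // rotated about the y-axis
v = apply(v, perspective(m34: -1.0 / 500)) // w becomes 1 + m34 * z
let projected = (x: v[0] / v[3], y: v[1] / v[3], z: v[2] / v[3])
print(projected)
```

With these numbers, w for the center comes out at about 0.72, so content at that depth is magnified by roughly 1.4 and the center ends up to the right of where it started.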

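Now focus on the multiplication by perspective_matrix, which is the identity matrix with only the m34 element set. Writing the point after translation and rotation as [x y z 1]:

 [x y z 1] * perspective_matrix = [x  y  z  1 + m34 * z]

Converting to Cartesian coordinates means dividing by w' = 1 + m34 * z:

 (x', y', z') = (x, y, z) / (1 + m34 * z)

So every coordinate is multiplied by the scale factor 1 / (1 + m34 * z).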

To model 3D perspective on a 2D screen, you want objects closer to your eyes to appear bigger than objects further away. To do that, you project objects from 3D space onto the screen, i.e., you draw imaginary lines from your eyes through the objects and find where they intersect the screen. Let's calculate the scaling factor for that projection:

z = the object's z-coordinate, i.e. its distance from the screen (positive toward your eyes)
eye2screen = distance from your eyes to the screen
scaleFactor = eye2screen / (eye2screen - z)   # Thales's intercept theorem

A quick calculation confirms our intuition: as z gets smaller (i.e. the object moves further away from your eyes), scaleFactor gets smaller.

Reconciling our scaleFactor with the perspective_matrix multiplication above, we have:

scaleFactor = 1 / (1 + m34 * z)
1 / (1 - z / eye2screen) = 1 / (1 + m34 * z)
-1 / eye2screen = m34
m34 = -1 / eye2screen
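
The equivalence can be checked numerically. The following Swift sketch compares the geometric scale factor with the one produced by the perspective divide for a few values of z; the eye distance of 500 and the function names are assumptions chosen for illustration.

```swift
let eye2screen = 500.0
let m34 = -1.0 / eye2screen

// Scale factor from the intercept theorem.
func scaleFromGeometry(_ z: Double) -> Double {
    return eye2screen / (eye2screen - z)
}

// Scale factor from the perspective divide, 1 / w with w = 1 + m34 * z.
func scaleFromMatrix(_ z: Double) -> Double {
    return 1.0 / (1.0 + m34 * z)
}

// Each row prints z followed by the two scale factors, which should match.
for z in [-500.0, -100.0, 0.0, 200.0, 400.0] {
    print(z, scaleFromGeometry(z), scaleFromMatrix(z))
}
```

The two columns agree up to floating-point rounding: the scale is 0.5 at z = -500, 1 at z = 0, and it grows without bound as z approaches eye2screen, which is why objects should stay behind the eye position.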

Pick a reasonable distance between your eyes and the screen (like 500), compute m34 from it, and you have your perspective_matrix.

Conclusions:

  1. The perspective matrix is a math trick to scale objects based on their distance from your eyes.
  2. Setting m34 = -1 / eye2screen gives you a perspective_matrix.
  3. The perspective_matrix operation must be applied only once, at the end of your transform pipeline.