Seeking QGIS affine equation?

Fitting equations to data is a statistical procedure, so it's best to use statistical software for the purpose. Powerful, easy-to-use software is freely available to all at the R Project.

Because the sample data are too limited to show all the essential capabilities of R, let's add a fourth link for the illustration, matching the "raw" (source) point (5,5) to the "gcp" (destination) point (565440.0, 6452443.9). Although R can read many file formats, with so few links it is simplest to type them in, which takes two lines of code:

source <- t(matrix(c(6.42775,3.61140,  2.7524,4.1539,  6.0293,6.4965, 5.0,5.0), nrow=2))
dest <- t(matrix(c(567742,6450622,  562285,6450858,  566709,6454802, 565440.0,6452443.9), nrow=2))

These commands create matrices in which each row holds one pair of coordinates. As a check, you can obtain a nicer display of these data by typing the variable names at the command prompt (>). For example:

> source
        [,1]   [,2]
[1,] 6.42775 3.6114
[2,] 2.75240 4.1539
[3,] 6.02930 6.4965
[4,] 5.00000 5.0000

There they are, one link per row. Once the links are in R in this form, finding the best (least squares) fit is remarkably simple:

lm(dest ~ source)

That's all there is to it. Here's what the output looks like:

Coefficients:
             [,1]       [,2]     
(Intercept)   558910.8  6444331.5
source1         1461.7      152.8
source2         -156.1     1469.9

This matrix transforms source coordinates written as row vectors (prefixed with a 1 for the intercept): the row vector multiplies the matrix on the left. A worked example appears below.

It's a good idea to probe the solution more closely. Let's start over, this time saving the fit:

a <- lm(dest ~ source)

Summarize the results to get more information:

> summary(a)
Response Y1 :

Call:
lm(formula = Y1 ~ source)

Residuals:
      1       2       3       4 
-0.3227 -0.4617 -0.5605  1.3449 

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  5.589e+05  4.062e+00 137586.8 4.63e-06 ***
source1      1.462e+03  5.642e-01   2590.7 0.000246 ***
source2     -1.561e+02  7.387e-01   -211.3 0.003013 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.562 on 1 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1 
F-statistic: 3.447e+06 on 2 and 1 DF,  p-value: 0.0003809 
<SNIP>

This is the result for the x coordinates only. The excised material is another panel of output headed by "Response Y2"; it is for the y coordinates. In each panel you want to check that the residual standard error (expressed in world coordinates, usually meters) is acceptably small. Then you want to check that the residuals themselves (listed at the top) are individually within your tolerances for a fit: these are the differences between the data and the predicted values. The residual standard error tells us that the typical discrepancy between a fitted x-coordinate and the intended (gcp) x-coordinate is around 1.56 meters.
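Those residuals and fitted values can also be pulled out directly rather than read off the summary. A self-contained sketch (refitting the model so the snippet runs on its own):

```r
# Refit the four-link model, then extract diagnostics directly.
source <- t(matrix(c(6.42775,3.61140,  2.7524,4.1539,  6.0293,6.4965,  5.0,5.0), nrow=2))
dest <- t(matrix(c(567742,6450622,  562285,6450858,  566709,6454802,  565440.0,6452443.9), nrow=2))
a <- lm(dest ~ source)

residuals(a)  # one column per coordinate: data minus fitted values
fitted(a)     # the predicted (transformed) positions of the source points
```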

As you know, relatively high numerical precision is needed in such work. This summary doesn't have sufficient precision for, say, a world file. Instead, extract the coefficients with

> print(a$coefficients, digits=10)
                      [,1]            [,2]
(Intercept) 558910.7522289 6444331.5258144
source1       1461.6779888     152.7722764
source2       -156.0974128    1469.8705760

As an example of how these coefficients work, we can ask R to perform the matrix multiplication to find where (say) the point (5,5) goes, remembering that the intercept is handled by extending these two-dimensional vectors to three dimensions by inserting a 1 at the beginning:

> print(c(1, 5,5) %*% a$coefficients, digits=10)
            [,1]       [,2]
[1,] 565438.6551 6452444.74

The first coordinate is about 1.35 low and the second one is about 0.84 high: these are the negatives of the reported residuals.
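Should you need to carry these coefficients into a world file, they can be written out at full precision. A sketch, assuming the usual six-line world-file order (A, D, B, E, C, F) and that the source coordinates play the role of the file's (column, row) indices; the file name affine.wld is just an illustration:

```r
# Sketch: write the fitted coefficients as a six-line world file.
# Assumed convention: lines are A, D, B, E, C, F, where
#   x' = A*col + B*row + C  and  y' = D*col + E*row + F.
source <- t(matrix(c(6.42775,3.61140,  2.7524,4.1539,  6.0293,6.4965,  5.0,5.0), nrow=2))
dest <- t(matrix(c(567742,6450622,  562285,6450858,  566709,6454802,  565440.0,6452443.9), nrow=2))
a <- lm(dest ~ source)

cf <- a$coefficients      # 3 x 2: intercept row, then the two linear rows
wld <- c(cf[2, 1],  # A: source1 -> x
         cf[2, 2],  # D: source1 -> y
         cf[3, 1],  # B: source2 -> x
         cf[3, 2],  # E: source2 -> y
         cf[1, 1],  # C: x translation (intercept)
         cf[1, 2])  # F: y translation (intercept)
writeLines(format(wld, digits = 12), "affine.wld")  # illustrative file name
```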

Much, much more can be done here. For example, any array of "raw" points can now be transformed in a single command; more complex fitting can be done (such as polynomial fits of higher order); etc. The power, flexibility, and accuracy of this software recommend it as a general-purpose solution to this problem of transforming points from one coordinate system to another.
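To make the first of those concrete, transforming any array of raw points is a single matrix product. A self-contained sketch (rebuilding the fit so it stands alone; the extra points are illustrative):

```r
# Fit once, then transform many raw points at once.
source <- t(matrix(c(6.42775,3.61140,  2.7524,4.1539,  6.0293,6.4965,  5.0,5.0), nrow=2))
dest <- t(matrix(c(567742,6450622,  562285,6450858,  566709,6454802,  565440.0,6452443.9), nrow=2))
a <- lm(dest ~ source)

pts <- rbind(c(5, 5), c(6.42775, 3.61140), c(4, 6))  # any n x 2 array of raw points
cbind(1, pts) %*% a$coefficients  # one row of world coordinates per input point
```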


Incidentally, repeating these analyses with just the three points in the example produces the affine transformation matrix

                           [,1]            [,2]
(Intercept)      558915.7606071 6444328.3973676
1                  1463.4093542     151.6907917
2                  -158.5614756    1471.4097348
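That matrix can be reproduced by refitting with only the three original links (a self-contained sketch):

```r
# Refit using only the three original links; three points determine an
# affine map exactly, so the residuals are all (essentially) zero.
source3 <- t(matrix(c(6.42775,3.61140,  2.7524,4.1539,  6.0293,6.4965), nrow=2))
dest3 <- t(matrix(c(567742,6450622,  562285,6450858,  566709,6454802), nrow=2))
a3 <- lm(dest3 ~ source3)
print(a3$coefficients, digits = 10)
```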

R provides useful facilities for analyzing such matrices. For instance, we can ascertain that this transformation is not a similarity (that is, a composition of a uniform scaling, rotation, and translation), because a similarity would stretch every direction by the same factor. As an example, its Singular Value Decomposition is obtained via

m <- tail(a$coefficients, 2) # the linear (scale-rotation-skew) part, dropping the intercept row
a.svd <- svd(m)

The result has three parts: a rotation, two scale factors, and another rotation. The scale factors differ:

> a.svd$d
[1] 1480.859 1470.313

Therefore we can think of this transformation as including a rescaling factor that varies between about 1470 and 1481, depending on direction. You can work out those directions from the SVD by analyzing a.svd$v and a.svd$u, but in GIS this is rarely done: we just want a good fit. What the unequal scale factors do tell us, though, is that your QGIS plugin may not be up to the job. Look for a solution that lets you specify the coefficients of the transformation matrix directly.
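One quick cross-check that needs no SVD machinery: for a similarity the linear part M satisfies M %*% t(M) = s² I (equal diagonal entries, zero off-diagonal). A self-contained sketch using the three-point coefficients quoted above:

```r
# Cross-check: for a similarity, M %*% t(M) would be a multiple of the identity.
M <- rbind(c(1463.4093542,  151.6907917),
           c(-158.5614756, 1471.4097348))  # three-point linear part from above
M %*% t(M)   # unequal diagonal and nonzero off-diagonal: not a similarity
svd(M)$d     # the two singular values; they would be equal for a similarity
```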