Obtaining same area values provided by Census for TIGER boundaries using PostGIS functions?

Very interesting question! I'd note that the geography area calculation is extremely close: in your last example, a 3 m^2 difference over an area of 20,000,000 m^2 (0.000015%).

Since we know that the Census Bureau manages its data inside Oracle Spatial, I'd guess that what you're seeing is a very small difference in the implementation of geodetic area calculation between Oracle and PostGIS. The near-identical results certainly lead me to believe that is the case.

The way to test would be to take one of your larger polygons and import it into Oracle and compare the geodetic area calculation to the PostGIS one. If you see the same difference, then mystery solved. You can do this if you're motivated (and judging from the work you've put in so far, perhaps you are!) by downloading the free Oracle Express edition and running the calculation in there.


You might like to compare your results for the area with those produced by the Planimeter utility of GeographicLib. This accurately computes the area of a geodesic polygon on the reference ellipsoid. It does a direct area calculation (without an intermediate equal-area projection). It is typically accurate to better than 0.01 m^2 even for large "country sized" polygons. Note however the caveats: (1) the edges of the polygon are geodesics; (2) the area is computed on the reference ellipsoid. Here's an online version of Planimeter (this is restricted to the WGS84 ellipsoid).


I pinged a friend who works for the US Census, and he told me that Oracle's area calculation entails transforming the vertices of a polygon to the authalic sphere and using spherical trigonometry to compute the area.

There's an intrinsic error in this approach, namely that the geodesic between two vertices is approximated by the great circle between the corresponding points on the authalic sphere. This error will be small if the edges of the polygon are sufficiently short.
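To experiment with the authalic-sphere approach yourself, here is a minimal Python sketch. It is not Oracle's code; it only follows the description above. It converts geodetic latitudes on WGS84 to authalic latitudes with the standard closed-form expression, then sums the spherical excess edge by edge (the great-circle quadrangle formula, not L'Huilier's). The function names are my own, and the sketch assumes a simple polygon that does not enclose a pole or cross the antimeridian.

```python
import math

# WGS84 ellipsoid parameters.
A = 6378137.0                  # equatorial radius in metres
F = 1 / 298.257223563          # flattening
E2 = F * (2 - F)               # eccentricity squared
E = math.sqrt(E2)


def _q(phi):
    """Authalic helper function q(phi) for a geodetic latitude in radians."""
    s = math.sin(phi)
    return (1 - E2) * (s / (1 - E2 * s * s) + math.atanh(E * s) / E)


Q_POLE = _q(math.pi / 2)
# Radius of the sphere with the same surface area as the ellipsoid.
AUTHALIC_RADIUS = A * math.sqrt(Q_POLE / 2)


def authalic_latitude(lat_deg):
    """Map a geodetic latitude (degrees) to the authalic latitude (degrees)."""
    ratio = _q(math.radians(lat_deg)) / Q_POLE
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


def spherical_polygon_area(points, radius=AUTHALIC_RADIUS):
    """Area of a polygon with great-circle edges on a sphere.

    points is a list of (lat, lon) pairs in degrees, already on the sphere.
    Each edge contributes the signed excess of the quadrangle between the
    edge, the two meridians through its endpoints, and the equator.
    """
    excess = 0.0
    n = len(points)
    for i in range(n):
        lat1, lon1 = points[i]
        lat2, lon2 = points[(i + 1) % n]
        dlon = math.radians(lon2 - lon1)
        mean_lat = math.radians(lat1 + lat2) / 2
        half_dlat = math.radians(lat2 - lat1) / 2
        excess += 2 * math.atan(
            math.tan(dlon / 2) * math.sin(mean_lat) / math.cos(half_dlat)
        )
    return abs(excess) * radius ** 2


if __name__ == "__main__":
    triangle = [(40.35, -74.67), (40.37, -74.62), (40.40, -74.64)]
    on_sphere = [(authalic_latitude(lat), lon) for lat, lon in triangle]
    print(AUTHALIC_RADIUS)
    print(on_sphere)
    print(spherical_polygon_area(on_sphere))
```

Run on the triangle from the addendum, this should land close to the authalic-sphere figure quoted there; the gap from the ellipsoidal value is the approximation error discussed above.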

It's also possible that Oracle uses L'Huilier's formula for the spherical excess (i.e., the area) of a spherical triangle. This can be very badly conditioned because the formula is in terms of the edges of the triangle. This will lead to large round-off errors for polygons with nearly collinear edges. Much better is to use one of Delambre's identities, i.e., the last formula given in this Wikipedia article.
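For reference, L'Huilier's formula is easy to try out. The sketch below is only an illustration (whether Oracle actually uses this formula is a guess on my part); the helper names are hypothetical, and edge lengths are obtained with the haversine formula on a unit sphere.

```python
import math


def central_angle(p1, p2):
    """Great-circle angle (radians) between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(min(1.0, h)))


def lhuilier_excess(p1, p2, p3):
    """Spherical excess (radians) of a triangle from its three edge lengths."""
    a = central_angle(p2, p3)
    b = central_angle(p1, p3)
    c = central_angle(p1, p2)
    s = (a + b + c) / 2
    # For nearly collinear vertices one of (s - a), (s - b), (s - c) is a
    # difference of almost equal numbers, so its relative error is large.
    prod = (math.tan(s / 2) * math.tan((s - a) / 2)
            * math.tan((s - b) / 2) * math.tan((s - c) / 2))
    # Clamp at zero: round-off can push the product slightly negative.
    return 4 * math.atan(math.sqrt(max(0.0, prod)))


if __name__ == "__main__":
    # One octant of the sphere has excess pi / 2.
    print(lhuilier_excess((0, 0), (0, 90), (90, 0)))
```

Multiply the excess by the square of the sphere's radius to get an area. Feeding it three points that lie almost on one great circle is a quick way to see how the result degrades.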

In summary: Oracle's area formula approximates the polygon edges (and may entail badly conditioned equations). PostGIS's area formula will involve quantization errors from doing a numerical integration (with the resulting accumulated round-off error). GeographicLib avoids these sources of error.

ADDENDUM: To quantify the error in computing areas using the authalic sphere, consider the following triangle (lat, lon)

40.35 -74.67
40.37 -74.62
40.40 -74.64

Using the WGS84 ellipsoid, the edges of this triangle are about 4.8 km, 3.7 km and 6.1 km. The geodesic area according to Planimeter is 8960019.1581 m^2. If the triangle is transferred to the authalic sphere (radius = 6371007.180918473898 m), the corners become

40.223428173542305404 -74.67
40.243413568344088227 -74.62
40.273391776123087215 -74.64

and the corresponding area (again according to Planimeter) is 8960018.2308 m^2; so the error is about 0.9 m^2 or 1 part in 10^7. I expect that the absolute error will scale as (max-edge)^3.