Regression Model and Spatial Autocorrelation

What these procedures are

Although OLS and GWR share many aspects of their statistical formulation, they are used for different purposes:

  • OLS formally models a global relationship of a particular sort. In its simplest form, each record (or case) in the dataset consists of a value, x, set by the experimenter (often called an "independent variable"), and another value, y, which is observed (the "dependent variable"). OLS supposes that y is approximately related to x in a particularly simple way: namely, there exist (unknown) numbers 'a' and 'b' for which a + b*x will be a good estimate of y for all values of x in which the experimenter may be interested. "Good estimate" acknowledges that the values of y can, and will, vary from any such mathematical prediction because (1) they really do--nature is rarely as simple as a mathematical equation--and (2) y is measured with some error. In addition to estimating the values of a and b, OLS also quantifies the amount of variation in y. This gives OLS the capability to establish the statistical significance of the parameters a and b.

Here is an OLS fit:

enter image description here

  • GWR is used to explore local relationships. In this setting there are still (x,y) pairs, but now (1) typically, both x and y are observed--neither can be determined beforehand by an experimenter--and (2) each record has a spatial location, z. For any location, z (not necessarily even one where data are available), GWR applies the OLS algorithm to neighboring data values to estimate a location-specific relationship between y and x in the form y = a(z) + b(z)*x. The notation "(z)" emphasizes that the coefficients a and b vary among locations. As such, GWR is a specialized version of locally weighted smoothers in which only the spatial coordinates are used to determine neighborhoods. Its output is used to suggest how values of x and y covary across a spatial region. It is noteworthy that often there is no reason to choose which of 'x' and 'y' should play the role of independent variable and dependent variable in the equation, but when you switch these roles, the results will change! This is one of the many reasons GWR should be considered exploratory--a visual and conceptual aid to understanding the data--rather than a formal method.

Here is a locally weighted smooth. Notice how it can follow the apparent "wiggles" in the data, but does not pass exactly through every point. (It can be made to pass through the points, or to follow smaller wiggles, by changing a setting in the procedure, exactly as GWR can be made to follow spatial data more or less exactly by changing settings in its procedure.)

Lowess

Intuitively, think of OLS as fitting a rigid shape (such as a line) to the scatterplot of (x,y) pairs and GWR as allowing that shape to wiggle arbitrarily.

Choosing between them

In the present case, although it is not clear what "two distinct databases" might mean, it seems that using either OLS or GWR to "validate" a relationship between them may be inappropriate. For instance, if the databases represent independent observations of the same quantity at the same set of locations, then (1) OLS is probably inappropriate because both x (the values in one database) and y (the values in the other database) should be conceived of as varying (instead of thinking of x as fixed and accurately represented) and (2) GWR is fine for exploring the relationship between x and y, but it cannot be used to validate anything: it's guaranteed to find relationships, not matter what. Moreover, as previously remarked, the symmetric roles of "two databases" indicate that either could be chosen as 'x' and the other as 'y', leading to two possible GWR results which are guaranteed to differ.

Here is a locally weighted smooth of the same data, reversing the roles of x and y. Compare this to the previous plot: notice how much steeper the overall fit is and how it differs in the details, too.

Lowess 2

Different techniques are required to establish that two databases are providing the same information, or to assess their relative bias, or relative precision. The choice of technique depends on the statistical properties of the data and the purpose of the validation. As an example, databases of chemical measurements will typically be compared using calibration techniques.

Interpreting Moran's I

It is hard to tell what a "Moran's I for the GWR model" means. I guess that a Moran's I statistic may have been computed for the residuals of a GWR calculation. (The residuals are the differences between actual and fitted values.) Moran's I is a global measure of spatial correlation. If it is small, it suggests that variations between the y-values and the GWR fits from the x-values have little or no spatial correlation. When GWR is "tuned" to the data (this involves deciding on what really constitutes a "neighbor" of any point), low spatial correlation in the residuals is to be expected because GWR (implicitly) exploits any spatial correlation among the x and y values in its algorithm.


Rsquared should not be used to compare models. Use log likihood or AIC values.

If your residuals in GWR are random, or I guess appear to be random (not statistically sig.) than you might have a specified model. It at least suggests that you don't have correlated residuals and should suggest that you do not have any omitted variables.