As in the previous recipe, we start by loading our data: the address records for Cuyahoga County, Ohio, USA. If you loaded this dataset in the previous recipe, there is no need to reload it. Otherwise, run the following command:
shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom CUY_ADDRESS_POINTS chp04.knn_addresses | psql -U me -d postgis_cookbook
As this dataset may take a while to load, you can alternatively load a subset using the following command:
shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom CUY_ADDRESS_POINTS_subset chp04.knn_addresses | psql -U me -d postgis_cookbook
The address points will serve as a proxy for our building structures. However, to align our structures to the nearby streets, we will need a streets layer. We will use Cuyahoga County's street centerline data for this:
shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom CUY_STREETS chp04.knn_streets | psql -U me -d postgis_cookbook
Before we commence, we have to consider an aspect of using indexes that did not arise in our previous KNN recipe. When our KNN approach used only points, our indexing was exact, because the bounding box of a point is effectively the point itself. As indexes are built around bounding boxes, the index-based estimates of distance perfectly reflected the actual distances between our points. For non-point geometries, as in our example here, the bounding box is only an approximation of the lines to which we will be comparing our points. In other words, the nearest neighbor reported by the index may not be the true nearest neighbor, though it is likely to be one of the nearest neighbors.
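To make the difference between bounding-box distance and actual distance concrete, the following is a minimal sketch in plain Python rather than PostGIS. The coordinates, the variable names, and the two helper functions are hypothetical and chosen only for illustration; each street is simplified to a single straight segment.

```python
import math

def true_distance(point, segment):
    """Actual Euclidean distance from a point to a line segment."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: both endpoints coincide
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def bbox_distance(point, segment):
    """Distance from a point to the segment's axis-aligned bounding box.

    Returns 0.0 when the point lies inside the box.
    """
    (px, py), ((ax, ay), (bx, by)) = point, segment
    xmin, xmax = min(ax, bx), max(ax, bx)
    ymin, ymax = min(ay, by), max(ay, by)
    dx = max(xmin - px, 0.0, px - xmax)
    dy = max(ymin - py, 0.0, py - ymax)
    return math.hypot(dx, dy)

# Hypothetical data: one address and two streets.
address = (9.0, 1.0)
diagonal_street = ((0.0, 0.0), (10.0, 10.0))   # large, mostly empty bounding box
straight_street = ((8.0, -2.0), (12.0, -2.0))  # bounding box hugs the line

for name, street in [("diagonal", diagonal_street), ("straight", straight_street)]:
    print(name, round(bbox_distance(address, street), 3),
          round(true_distance(address, street), 3))
```

In this sketch the address falls inside the bounding box of the diagonal street, so its bounding-box distance is zero even though the straight street is actually closer. This is the situation in which a ranking based on bounding boxes alone puts the wrong street first.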
In practice, we apply a heuristic approach: we gather slightly more candidates than the number of nearest neighbors we are interested in, sort them by actual distance, and keep only the number we need. In this way, we only need to sort a small number of records.
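The heuristic can be sketched as a two-stage ranking. The example below is an illustration in plain Python under stated assumptions, not the recipe's SQL. The function `nearest_streets`, its `extra` parameter, and the sample streets are hypothetical, and a full sort by bounding-box distance stands in for what a spatial index would provide.

```python
import math

def true_distance(point, segment):
    """Actual Euclidean distance from a point to a line segment."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def bbox_distance(point, segment):
    """Distance from a point to the segment's axis-aligned bounding box."""
    (px, py), ((ax, ay), (bx, by)) = point, segment
    dx = max(min(ax, bx) - px, 0.0, px - max(ax, bx))
    dy = max(min(ay, by) - py, 0.0, py - max(ay, by))
    return math.hypot(dx, dy)

def nearest_streets(point, streets, k=1, extra=2):
    """Return the names of the k nearest streets using the two-stage heuristic.

    Stage 1: rank all streets by the cheap bounding-box distance and keep
             k + extra candidates (stand-in for an index-ordered scan).
    Stage 2: re-sort only those candidates by actual distance and keep k.
    """
    by_bbox = sorted(streets, key=lambda name: bbox_distance(point, streets[name]))
    candidates = by_bbox[:k + extra]
    by_true = sorted(candidates, key=lambda name: true_distance(point, streets[name]))
    return by_true[:k]

# Hypothetical data: street name -> single line segment.
streets = {
    "diagonal": ((0.0, 0.0), (10.0, 10.0)),
    "straight": ((8.0, -2.0), (12.0, -2.0)),
    "far": ((50.0, 50.0), (60.0, 50.0)),
}
address = (9.0, 1.0)

print(nearest_streets(address, streets, k=1, extra=0))  # bounding boxes only
print(nearest_streets(address, streets, k=1, extra=1))  # with one extra candidate
```

With `extra=0` the sketch trusts the bounding-box ranking and selects the diagonal street; with one extra candidate, the re-sort by actual distance selects the straight street instead. The approach remains a heuristic: if the true nearest neighbor does not appear among the gathered candidates, it will still be missed, so the number of extra candidates trades accuracy against the number of records to sort.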