Performing geospatial calculations

Shapely is a very capable library for performing various calculations on geospatial data. Let's put it through its paces with a complex, real-world problem.

Task – identifying parks in or near urban areas

The US Census Bureau makes available a shapefile containing something called Core Based Statistical Areas (CBSAs), which are polygons defining urban areas with a population of 10,000 or more. At the same time, the GNIS web site provides lists of place names and other details. Using these two data sources, we will identify any parks within or close to an urban area.

Because of the volume of data we are dealing with, we will limit our search to California. It would take a very long time to check all the CBSA polygon/place name combinations for the entire United States; it's possible to optimize the program to do this quickly, but this would make the example too complex for our current purposes.

Let's start by downloading the necessary data. We'll start by downloading a shapefile containing all the core-based statistical areas in the USA. Go to the TIGER web site:

http://census.gov/geo/www/tiger

Click on the TIGER/Line Shapefiles link, then follow the Download option for the latest version of the TIGER/Line shapefiles (as of this writing, it is the 2015 version). Select the FTP Interface option, and your computer will open an FTP session to the appropriate directory. Sign in as a guest, switch to the CBSA directory, and download the appropriate file you find there.

The file you want to download will have a name like tl_XXXX_us_cbsa.zip, where XXXX is the year the data file was created. Once the file has been downloaded, decompress it and place the resulting shapefile into a convenient location so that you can work with it.

You now need to download the GNIS place name data. Go to the GNIS web site, http://geonames.usgs.gov/domestic, and click on the Download Domestic Names hyperlink. Because we only need the data for California, choose this state from the Select state for download popup menu beneath the States, Territories, Associated Areas of the United States section of this page. This will download a file with a name like CA_Features_XXX.txt, where XXX is the date when the file was created. Place this file somewhere convenient, alongside the CBSA shapefile you downloaded earlier.

We're now ready to start writing our code. Let's start by reading through the CBSA urban area shapefile and extracting the polygons that define the outline of each urban area, as a Shapely geometry object:

shapefile = ogr.Open("tl_2015_us_cbsa.shp")
layer = shapefile.GetLayer(0)

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    geometry = feature.GetGeometryRef()
    wkt = geometry.ExportToWkt()
    outline = shapely.wkt.loads(wkt)

Next, we need to scan through the CA_Features_XXX.txt file to identify the features marked as parks. For each of these features, we want to extract the name of the feature and its associated latitude and longitude. Here's how we might do this:

f = open("CA_Features_XXX.txt", "r")
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == "Park":
        name = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])

Remember that the GNIS place name database is a "pipe-delimited" text file. That's why we have to split the line up using line.rstrip().split("|").

Now comes the fun part: we need to figure out which parks are within or close to each urban area. There are two ways we could do this, either of which will work:

We could use the outline.distance() method to calculate the distance between the outline and a Point object representing the park's location:
We could dilate the polygon using the outline.buffer() method, and then see whether the resulting polygon contained the desired point:

The second option is faster when dealing with a large number of points, as we can pre-calculate the dilated polygons and then use them to compare against each point in turn. Let's take this option:

# findNearbyParks.py

from osgeo import ogr
import shapely.geometry
import shapely.wkt

MAX_DISTANCE = 0.1 # Angular distance; approx 10 km.

print("Loading urban areas...")

urbanAreas = {} # Maps area name to Shapely polygon.

shapefile = ogr.Open("tl_2015_us_cbsa.shp")
layer = shapefile.GetLayer(0)

for i in range(layer.GetFeatureCount()):
    print("Dilating feature {} of {}"
          .format(i, layer.GetFeatureCount()))
    feature = layer.GetFeature(i)
    name = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    outline = shapely.wkt.loads(geometry.ExportToWkt())
    dilatedOutline = outline.buffer(MAX_DISTANCE)
    urbanAreas[name] = dilatedOutline

print("Checking parks...")

f = open("CA_Features_XXX.txt", "r")
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == "Park":
        parkName = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])

        pt = shapely.geometry.Point(longitude, latitude)

        for urbanName,urbanArea in urbanAreas.items():
            if urbanArea.contains(pt):
                print("{} is in or near {}"
                      .format(parkName, urbanName))
f.close()

Tip

Don't forget to change the name of the CA_Features_XXX.txt file to match the actual name of the file you downloaded. You should also add a path to the tl_2015_us_cbsa.shp and CA_Features_XXX.txt file references in your program if you placed these in a different directory.

It will take a few minutes to run this program, after which you will get a complete list of all the parks that are in or close to an urban area:

% python findNearbyParks.py 
Loading urban areas...
Checking parks...
Imperial National Wildlife Refuge is in or near El Centro, CA
Imperial National Wildlife Refuge is in or near Yuma, AZ
Cibola National Wildlife Refuge is in or near El Centro, CA
Twin Lakes State Beach is in or near Santa Cruz-Watsonville, CA
Admiral William Standley State Recreation Area is in or near Ukiah, CA
...

Note

Our program takes a while to run because it includes urban areas throughout the USA, not just in California. An obvious optimization would be to only include urban areas in California, which would speed up the program significantly.

Note that our program uses angular distances to decide whether a park is in or near a given urban area. As we mentioned in Chapter 2, GIS, an angular distance is the angle between two lines going out from the center of the earth to its surface:

Task – identifying parks in or near urban areas

Because we are dealing with data for California, where one degree of angular measurement roughly equals 100 kilometers on the earth's surface, an angular measurement of 0.1 roughly equals a real-world distance of 10 km.

Using angular measurements makes the distance calculation easy and quick to calculate, though it doesn't give an exact distance on the earth's surface. If your application requires exact distances, you could start by using an angular distance to filter out the features obviously too far away and then obtain an exact result for the remaining features by calculating the point on the polygon's boundary that is closest to the desired point and calculating the linear distance between the two points. You would then discard the points that exceed your desired exact linear distance. Implementing this would be an interesting challenge, though not one we will examine in this book.

Previous Chapter

Changing datums and projections

Next Chapter

Converting and standardizing units of geometry and distance

Table of Contents for Python Geospatial Development - Third Edition

Performing geospatial calculations

Task – identifying parks in or near urban areas

Tip

Note

Table of Contents for
Python Geospatial Development - Third Edition