Table of Contents for
Python Geospatial Development - Third Edition


Python Geospatial Development - Third Edition by Erik Westra, published by Packt Publishing, 2016
  1. Cover
  2. Table of Contents
  3. Python Geospatial Development Third Edition
  4. Python Geospatial Development Third Edition
  5. Credits
  6. About the Author
  7. About the Reviewer
  8. www.PacktPub.com
  9. Preface
  10. What you need for this book
  11. Who this book is for
  12. Conventions
  13. Reader feedback
  14. Customer support
  15. 1. Geospatial Development Using Python
  16. Geospatial development
  17. Applications of geospatial development
  18. Recent developments
  19. Summary
  20. 2. GIS
  21. GIS data formats
  22. Working with GIS data manually
  23. Summary
  24. 3. Python Libraries for Geospatial Development
  25. Dealing with projections
  26. Analyzing and manipulating geospatial data
  27. Visualizing geospatial data
  28. Summary
  29. 4. Sources of Geospatial Data
  30. Sources of geospatial data in raster format
  31. Sources of other types of geospatial data
  32. Choosing your geospatial data source
  33. Summary
  34. 5. Working with Geospatial Data in Python
  35. Working with geospatial data
  36. Changing datums and projections
  37. Performing geospatial calculations
  38. Converting and standardizing units of geometry and distance
  39. Exercises
  40. Summary
  41. 6. Spatial Databases
  42. Spatial indexes
  43. Introducing PostGIS
  44. Setting up a database
  45. Using PostGIS
  46. Recommended best practices
  47. Summary
  48. 7. Using Python and Mapnik to Generate Maps
  49. Creating an example map
  50. Mapnik concepts
  51. Summary
  52. 8. Working with Spatial Data
  53. Designing and building the database
  54. Downloading and importing the data
  55. Implementing the DISTAL application
  56. Using DISTAL
  57. Summary
  58. 9. Improving the DISTAL Application
  59. Dealing with the scale problem
  60. Performance
  61. Summary
  62. 10. Tools for Web-based Geospatial Development
  63. A closer look at three specific tools and techniques
  64. Summary
  65. 11. Putting It All Together – a Complete Mapping System
  66. Designing the ShapeEditor
  67. Prerequisites
  68. Setting up the database
  69. Setting up the ShapeEditor project
  70. Defining the ShapeEditor's applications
  71. Creating the shared application
  72. Defining the data models
  73. Playing with the admin system
  74. Summary
  75. 12. ShapeEditor – Importing and Exporting Shapefiles
  76. Importing shapefiles
  77. Exporting shapefiles
  78. Summary
  79. 13. ShapeEditor – Selecting and Editing Features
  80. Editing features
  81. Adding features
  82. Deleting features
  83. Deleting shapefiles
  84. Using the ShapeEditor
  85. Further improvements and enhancements
  86. Summary
  87. Index

Performing geospatial calculations

Shapely is a very capable library for performing various calculations on geospatial data. Let's put it through its paces with a complex, real-world problem.

Task – identifying parks in or near urban areas

The US Census Bureau makes available a shapefile containing something called Core Based Statistical Areas (CBSAs), which are polygons defining urban areas with a population of 10,000 or more. At the same time, the GNIS web site provides lists of place names and other details. Using these two data sources, we will identify any parks within or close to an urban area.

Because of the volume of data we are dealing with, we will limit our search to California. It would take a very long time to check all the CBSA polygon/place name combinations for the entire United States; it's possible to optimize the program to do this quickly, but this would make the example too complex for our current purposes.

Let's start by downloading the necessary data. First, we'll download a shapefile containing all the core-based statistical areas in the USA. Go to the TIGER web site:

http://census.gov/geo/www/tiger

Click on the TIGER/Line Shapefiles link, then follow the Download option for the latest version of the TIGER/Line shapefiles (as of this writing, it is the 2015 version). Select the FTP Interface option, and your computer will open an FTP session to the appropriate directory. Sign in as a guest, switch to the CBSA directory, and download the appropriate file you find there.

The file you want to download will have a name like tl_XXXX_us_cbsa.zip, where XXXX is the year the data file was created. Once the file has been downloaded, decompress it and place the resulting shapefile into a convenient location so that you can work with it.

You now need to download the GNIS place name data. Go to the GNIS web site, http://geonames.usgs.gov/domestic, and click on the Download Domestic Names hyperlink. Because we only need the data for California, choose this state from the Select state for download popup menu beneath the States, Territories, Associated Areas of the United States section of this page. This will download a file with a name like CA_Features_XXX.txt, where XXX is the date when the file was created. Place this file somewhere convenient, alongside the CBSA shapefile you downloaded earlier.

We're now ready to start writing our code. Let's start by reading through the CBSA urban area shapefile and extracting the polygons that define the outline of each urban area, as a Shapely geometry object:

from osgeo import ogr
import shapely.wkt

shapefile = ogr.Open("tl_2015_us_cbsa.shp")
layer = shapefile.GetLayer(0)

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    geometry = feature.GetGeometryRef()
    wkt = geometry.ExportToWkt()
    outline = shapely.wkt.loads(wkt)

Next, we need to scan through the CA_Features_XXX.txt file to identify the features marked as parks. For each of these features, we want to extract the name of the feature and its associated latitude and longitude. Here's how we might do this:

f = open("CA_Features_XXX.txt", "r")
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == "Park":
        name = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])

Remember that the GNIS place name database is a "pipe-delimited" text file. That's why we have to split the line up using line.rstrip().split("|").
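As a quick illustration, here is how such a split behaves on a simplified, made-up record (the real file has many more fields, with the feature class in column 2 and the decimal latitude and longitude in columns 9 and 10; the values below are purely illustrative):

```python
# A simplified, made-up GNIS-style record; the real file has more
# fields, but the splitting principle is the same.
line = "253370|Golden Gate Park|Park|CA|37.7694|-122.4862\n"

# Strip the trailing newline, then split on the pipe character.
chunks = line.rstrip().split("|")

print(chunks[1])  # Golden Gate Park
print(chunks[2])  # Park
```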

Now comes the fun part: we need to figure out which parks are within or close to each urban area. There are two ways we could do this, either of which will work:

  • We could use the outline.distance() method to calculate the distance between the outline and a Point object representing the park's location.
  • We could dilate the polygon using the outline.buffer() method, and then see whether the resulting polygon contains the desired point.
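To see why the two tests are interchangeable, here is a minimal sketch using a unit square as a stand-in for an urban area outline (toy coordinates, not real data):

```python
import shapely.geometry

MAX_DISTANCE = 0.1

# A unit square standing in for an urban area outline.
outline = shapely.geometry.Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])

near_park = shapely.geometry.Point(1.05, 0.5)  # 0.05 units away
far_park = shapely.geometry.Point(1.5, 0.5)    # 0.5 units away

# Option 1: measure the distance directly.
print(outline.distance(near_park) <= MAX_DISTANCE)  # True
print(outline.distance(far_park) <= MAX_DISTANCE)   # False

# Option 2: dilate the outline once, then use a containment test.
dilated = outline.buffer(MAX_DISTANCE)
print(dilated.contains(near_park))  # True
print(dilated.contains(far_park))   # False
```

Both approaches give the same answer; the buffer() version wins when many points are tested against the same outline, because the dilation is computed only once.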

The second option is faster when dealing with a large number of points, as we can pre-calculate the dilated polygons and then use them to compare against each point in turn. Let's take this option:

# findNearbyParks.py

from osgeo import ogr
import shapely.geometry
import shapely.wkt

MAX_DISTANCE = 0.1 # Angular distance; approx 10 km.

print("Loading urban areas...")

urbanAreas = {} # Maps area name to Shapely polygon.

shapefile = ogr.Open("tl_2015_us_cbsa.shp")
layer = shapefile.GetLayer(0)

for i in range(layer.GetFeatureCount()):
    print("Dilating feature {} of {}"
          .format(i + 1, layer.GetFeatureCount()))
    feature = layer.GetFeature(i)
    name = feature.GetField("NAME")
    geometry = feature.GetGeometryRef()
    outline = shapely.wkt.loads(geometry.ExportToWkt())
    dilatedOutline = outline.buffer(MAX_DISTANCE)
    urbanAreas[name] = dilatedOutline

print("Checking parks...")

f = open("CA_Features_XXX.txt", "r")
for line in f.readlines():
    chunks = line.rstrip().split("|")
    if chunks[2] == "Park":
        parkName = chunks[1]
        latitude = float(chunks[9])
        longitude = float(chunks[10])

        pt = shapely.geometry.Point(longitude, latitude)

        for urbanName, urbanArea in urbanAreas.items():
            if urbanArea.contains(pt):
                print("{} is in or near {}"
                      .format(parkName, urbanName))
f.close()

Tip

Don't forget to change the name of the CA_Features_XXX.txt file to match the actual name of the file you downloaded. You should also add a path to the tl_2015_us_cbsa.shp and CA_Features_XXX.txt file references in your program if you placed these in a different directory.

It will take a few minutes to run this program, after which you will get a complete list of all the parks that are in or close to an urban area:

% python findNearbyParks.py 
Loading urban areas...
Checking parks...
Imperial National Wildlife Refuge is in or near El Centro, CA
Imperial National Wildlife Refuge is in or near Yuma, AZ
Cibola National Wildlife Refuge is in or near El Centro, CA
Twin Lakes State Beach is in or near Santa Cruz-Watsonville, CA
Admiral William Standley State Recreation Area is in or near Ukiah, CA
...

Note

Our program takes a while to run because it includes urban areas throughout the USA, not just in California. An obvious optimization would be to only include urban areas in California, which would speed up the program significantly.
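One rough way to implement this optimization, assuming the CBSA NAME field ends with state abbreviations as in the output above (for example, "El Centro, CA", or a hypothetical multi-state name such as "Lake Tahoe, CA-NV"), would be to skip any feature whose name doesn't mention California before dilating it:

```python
def is_california_cbsa(name):
    """Return True if a CBSA NAME field appears to involve California.

    Assumes names end with state abbreviations after the final comma,
    hyphen-separated for multi-state areas (e.g. "CA-NV").
    """
    states = name.rsplit(",", 1)[-1].strip().split("-")
    return "CA" in states

# Inside the loading loop, before calling buffer():
# if not is_california_cbsa(name):
#     continue

print(is_california_cbsa("El Centro, CA"))   # True
print(is_california_cbsa("Yuma, AZ"))        # False
```

Since the expensive buffer() calls are skipped for the vast majority of features, the loading phase becomes much faster.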

Note that our program uses angular distances to decide whether a park is in or near a given urban area. As we mentioned in Chapter 2, GIS, an angular distance is the angle between two lines going out from the center of the earth to points on its surface.

Because we are dealing with data for California, where one degree of angular measurement roughly equals 100 kilometers on the earth's surface, an angular measurement of 0.1 roughly equals a real-world distance of 10 km.
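A quick back-of-the-envelope check of this figure, assuming a spherical earth with a mean radius of 6,371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius (spherical approximation)

# One degree of latitude, anywhere on the globe.
km_per_degree_lat = 2 * math.pi * EARTH_RADIUS_KM / 360
print(round(km_per_degree_lat, 1))  # 111.2

# One degree of longitude shrinks with latitude; at roughly 37 degrees
# north (central California):
km_per_degree_lon = km_per_degree_lat * math.cos(math.radians(37))
print(round(km_per_degree_lon, 1))  # 88.8
```

So a one-degree step is about 111 km north-south but only about 89 km east-west at California's latitude, which is why 0.1 degrees only "roughly" equals 10 km.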

Using angular measurements makes the distance calculation easy and quick, though it doesn't give an exact distance on the earth's surface. If your application requires exact distances, you could proceed in two stages: first use an angular distance to filter out the features that are obviously too far away, then, for each remaining feature, find the point on the polygon's boundary closest to the desired point and calculate the linear distance between the two points. Finally, you would discard any features that exceed your desired linear distance. Implementing this would be an interesting challenge, though not one we will examine in this book.
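As a sketch of what the second stage might use, the haversine formula (again assuming a spherical earth of radius 6,371 km) converts a pair of lon/lat coordinates into a linear distance in kilometers; you would call it with the park's location and the closest boundary point, which Shapely can find with shapely.ops.nearest_points():

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points,
    using the haversine formula on a sphere of radius 6,371 km."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# One degree of latitude comes out close to 111 km, as expected:
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```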