Table of Contents for
Python Geospatial Development - Third Edition

Version ebook / Retour

Cover image for bash Cookbook, 2nd Edition Python Geospatial Development - Third Edition by Erik Westra Published by Packt Publishing, 2016
  1. Cover
  2. Table of Contents
  3. Python Geospatial Development Third Edition
  4. Python Geospatial Development Third Edition
  5. Credits
  6. About the Author
  7. About the Reviewer
  8. www.PacktPub.com
  9. Preface
  10. What you need for this book
  11. Who this book is for
  12. Conventions
  13. Reader feedback
  14. Customer support
  15. 1. Geospatial Development Using Python
  16. Geospatial development
  17. Applications of geospatial development
  18. Recent developments
  19. Summary
  20. 2. GIS
  21. GIS data formats
  22. Working with GIS data manually
  23. Summary
  24. 3. Python Libraries for Geospatial Development
  25. Dealing with projections
  26. Analyzing and manipulating Geospatial data
  27. Visualizing geospatial data
  28. Summary
  29. 4. Sources of Geospatial Data
  30. Sources of geospatial data in raster format
  31. Sources of other types of geospatial data
  32. Choosing your geospatial data source
  33. Summary
  34. 5. Working with Geospatial Data in Python
  35. Working with geospatial data
  36. Changing datums and projections
  37. Performing geospatial calculations
  38. Converting and standardizing units of geometry and distance
  39. Exercises
  40. Summary
  41. 6. Spatial Databases
  42. Spatial indexes
  43. Introducing PostGIS
  44. Setting up a database
  45. Using PostGIS
  46. Recommended best practices
  47. Summary
  48. 7. Using Python and Mapnik to Generate Maps
  49. Creating an example map
  50. Mapnik concepts
  51. Summary
  52. 8. Working with Spatial Data
  53. Designing and building the database
  54. Downloading and importing the data
  55. Implementing the DISTAL application
  56. Using DISTAL
  57. Summary
  58. 9. Improving the DISTAL Application
  59. Dealing with the scale problem
  60. Performance
  61. Summary
  62. 10. Tools for Web-based Geospatial Development
  63. A closer look at three specific tools and techniques
  64. Summary
  65. 11. Putting It All Together – a Complete Mapping System
  66. Designing the ShapeEditor
  67. Prerequisites
  68. Setting up the database
  69. Setting up the ShapeEditor project
  70. Defining the ShapeEditor's applications
  71. Creating the shared application
  72. Defining the data models
  73. Playing with the admin system
  74. Summary
  75. 12. ShapeEditor – Importing and Exporting Shapefiles
  76. Importing shapefiles
  77. Exporting shapefiles
  78. Summary
  79. 13. ShapeEditor – Selecting and Editing Features
  80. Editing features
  81. Adding features
  82. Deleting features
  83. Deleting shapefiles
  84. Using the ShapeEditor
  85. Further improvements and enhancements
  86. Summary
  87. Index

Working with GIS data manually

Let's take a brief look at the process of working with GIS data manually. Before we can begin, there are two things you need to do:

  • Obtain some GIS data
  • Install the GDAL library and its Python bindings so that you can read the necessary data files

Obtaining the data

Let's use the US Census Bureau's web site to download a set of vector maps for the various US states. The main site for obtaining GIS data from the US Census Bureau can be found at http://www.census.gov/geo/www/tiger.

To make things simpler, though, let's bypass the web site and directly download the file we need from http://www2.census.gov/geo/tiger/TIGER2014/STATE/tl_2014_us_state.zip.

The file, tl_2014_us_state.zip, should be a ZIP-format archive. After uncompressing the archive, you should have a directory containing the following files:

  • tl_2014_us_state.dbf
  • tl_2014_us_state.prj
  • tl_2014_us_state.shp
  • tl_2014_us_state.shx

These files make up a shapefile containing the outlines of all the US states. Place these files together in convenient directory.

Note

There will be a few extra files with a suffix of .xml. These contain additional information about the downloaded data but are not part of the shapefile itself.

Installing GDAL

We next want to download and install the GDAL Python library. The main web site for GDAL can be found at http://gdal.org.

How you install GDAL depends on which operating system you are running.

Installing GDAL on Linux

To install GDAL on a Linux machine, you would typically use your computer's package manager to install the GDAL package itself. For example, on Fedora Linux, you would use the following command:

yum install gdal gdal-devel

Note that you need to install the gdal-devel package as well as gdal itself. The gdal-devel package includes the files needed to compile the Python bindings for GDAL.

Once you have installed GDAL itself, you'll need to install and build the Python bindings. These Python bindings can be found at https://pypi.python.org/pypi/GDAL. Typically, you would use a command such as the following:

sudo easy_install GDAL

Obviously, you will need to have Python installed before you can install the Python bindings for GDAL.

Installing GDAL on Mac OS X

To install GDAL on a Mac, you need to install the GDAL library itself, and then compile the Python bindings to use the version of GDAL you installed. Let's work through the various steps one at a time:

  1. To download GDAL, go to http://www.kyngchaos.com/software/frameworks. You will need to install the GDAL Complete package. This package includes a precompiled version of GDAL, which you should install onto your computer.
  2. Once you have installed GDAL, you next need to compile the Python bindings. To do this, you first need to install Xcode, which is Apple's development environment. Xcode can be downloaded for free from the Mac App store.

    Note

    Note that if your computer runs a version of Mac OS X lower than 10.9 (Yosemite), you will need to separately install the command-line tools for Xcode. To do this, start up Xcode and choose the Preferences… command from the Xcode menu. In the Downloads tab will be an option to install the command-line tools; enable this option and wait for the required tools to be installed.

  3. Once you have Xcode installed, you can download and compile the Python bindings for GDAL. The easiest way to do this is to use pip, the Python package manager. If you don't already have pip, you can install it by following the instructions at https://pip.pypa.io/en/latest/installing.html.
  4. Before you can compile the GDAL Python bindings, you will need to set a few environment variables so that Python can find the version of GDAL you installed. To do this, type the following in a Terminal window:
    export CPLUS_INCLUDE_PATH=/Library/Frameworks/GDAL.framework/Headers
    export C_INCLUDE_PATH=/Library/Frameworks/GDAL.framework/Headers
    export LIBRARY_PATH=/Library/Frameworks/GDAL.framework/Versions/1.11/unix/lib
    

    Make sure the version of GDAL in the LIBRARY_PATH variable definition matches the version you installed.

  5. You can now install the Python bindings by typing the following:
    pip install gdal==1.11
    

    Once again, make sure that the version number matches the version of GDAL you installed.

All going well, the Python bindings for GDAL should compile successfully. You will see a few compiler warnings, but hopefully no errors.

Installing GDAL on MS Windows

Unfortunately, there is currently no prebuilt installer for GDAL that uses Python 3. You can either attempt to build it yourself, or you will have to use Python 2. You can download a binary installer for GDAL that includes Python 2 support from http://www.gisinternals.com.

Because you won't be able to use Python 3, you will need to adjust the code you write to match the syntax for Python 2. For the code samples for this chapter, the only difference is that you don't use parentheses after the print keyword.

Testing your GDAL installation

After installing GDAL, you can check whether it works by typing import osgeo into the Python command line; if the Python command prompt reappears with no error message, it means GDAL was successfully installed, and you are all set to go:

>>> import osgeo
>>>

Examining the Downloaded Shapefile

Let's now use GDAL to look at the contents of the shapefile you downloaded earlier. You can either type the following directly into the command prompt, or else save it as a Python script so that you can run it whenever you wish to (let's call this script analyze.py):

import osgeo.ogr

shapefile = osgeo.ogr.Open("tl_2014_us_state.shp")
numLayers = shapefile.GetLayerCount()

print("Shapefile contains {} layers".format(numLayers))
print()

for layerNum in range(numLayers):
  layer = shapefile.GetLayer(layerNum)
  spatialRef = layer.GetSpatialRef().ExportToProj4()
  numFeatures = layer.GetFeatureCount()
  print("Layer {} has spatial reference {}".format(
        layerNum, spatialRef))
  print("Layer {} has {} features:".format(
        layerNum, numFeatures))
  print()

  for featureNum in range(numFeatures):
    feature = layer.GetFeature(featureNum)
    featureName = feature.GetField("NAME")

    print("Feature {} has name {}".
          format(featureNum, featureName))

Tip

This example assumes you've placed this script in the same directory as the tl_2014_us_state.shp file. If you've put it in a different directory, change the osgeo.ogr.Open() command to include the path to your shapefile. If you are running this on MS Windows, don't forget to use double backslash characters (\\) as directory separators.

Running this script will give us a quick summary of how the shapefile's data is structured:

Shapefile contains 1 layers

Layer 0 has spatial reference +proj=longlat +datum=NAD83 +no_defs
Layer 0 has 56 features:

Feature 0 has name West Virginia
Feature 1 has name Florida
Feature 2 has name Illinois
Feature 3 has name Minnesota
...
Feature 53 has name District of Columbia
Feature 54 has name Iowa
Feature 55 has name Arizona

This shows us that the data we downloaded consists of one layer, with 56 individual features corresponding to the various states and protectorates in the USA. It also tells us the "spatial reference" for this layer, which tells us that the coordinates are represented as latitude and longitude values using the NAD83 datum.

As you can see from the example, using GDAL to extract data from shapefiles is quite straightforward. Let's continue with another example. This time, we'll look at the details of feature number 12, New Mexico. Let's call this program analyze2.py:

import osgeo.ogr

shapefile = osgeo.ogr.Open("tl_2014_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(12)

print("Feature 12 has the following attributes:")
print()

attributes = feature.items()

for key,value in attributes.items():
  print("  {} = {}".format(key, value))

geometry = feature.GetGeometryRef()
geometryName = geometry.GetGeometryName()

print()
print("Feature's geometry data consists of a {}".format(
      geometryName))

Running this produces the following output:

Feature 12 has the following attributes:

  STUSPS = NM
  GEOID = 35
  REGION = 4
  ALAND = 314161410324.0
  MTFCC = G4000
  FUNCSTAT = A
  NAME = New Mexico
  STATEFP = 35
  LSAD = 00
  STATENS = 00897535
  AWATER = 755712514.0
  DIVISION = 8
  INTPTLON = -106.1316181
  INTPTLAT = +34.4346843

Feature's geometry data consists of a POLYGON

The meaning of the various attributes is described on the US Census Bureau's web site, but what interests us right now is the feature's geometry. A geometry object is a complex structure that holds some geospatial data, often using nested geometry objects to reflect the way the geospatial data is organized. So far, we've discovered that New Mexico's geometry consists of a polygon. Let's now take a closer look at this polygon, using a program we'll call analyze3.py:

import osgeo.ogr

def analyzeGeometry(geometry, indent=0):
  s = []
  s.append("  " * indent)
  s.append(geometry.GetGeometryName())
  if geometry.GetPointCount() > 0:
    s.append(" with {} data points".format(geometry.GetPointCount()))
  if geometry.GetGeometryCount() > 0:
    s.append(" containing:")

  print("".join(s))

  for i in range(geometry.GetGeometryCount()):
    analyzeGeometry(geometry.GetGeometryRef(i), indent+1)

shapefile = osgeo.ogr.Open("tl_2014_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(12)
geometry = feature.GetGeometryRef()

analyzeGeometry(geometry)

The recursive analyzeGeometry() function reveals the structure for this geometry:

POLYGON containing:
  LINEARRING with 7554 data points

As you can see, this geometry has a single linear ring, defining the outline of New Mexico. If we ran the same program over California (feature 13 in our shapefile), the output would be somewhat more complicated:

MULTIPOLYGON containing:
  POLYGON containing:
    LINEARRING with 121 data points
  POLYGON containing:
    LINEARRING with 191 data points
  POLYGON containing:
    LINEARRING with 77 data points
  POLYGON containing:
    LINEARRING with 152 data points
  POLYGON containing:
    LINEARRING with 392 data points
  POLYGON containing:
    LINEARRING with 93 data points
  POLYGON containing:
    LINEARRING with 10105 data points

As you can see, California is made up of seven distinct polygons, each defined by a single linear ring. This is because California is on the coast and includes six outlying islands as well as the main inland body of the state.

Let's finish this analysis of the US state shapefile by answering a simple question: what is the distance from the northernmost point to the southernmost point in California? There are various ways we could answer this question, but for now, we'll do it by hand. Let's start by identifying the northernmost and southernmost points in California:

import osgeo.ogr

def findPoints(geometry, results):
  for i in range(geometry.GetPointCount()):
    x,y,z = geometry.GetPoint(i)
    if results['north'] == None or results['north'][1] < y:
      results['north'] = (x,y)
    if results['south'] == None or results['south'][1] > y:
      results['south'] = (x,y)

  for i in range(geometry.GetGeometryCount()):
    findPoints(geometry.GetGeometryRef(i), results)

shapefile = osgeo.ogr.Open("tl_2014_us_state.shp")
layer = shapefile.GetLayer(0)
feature = layer.GetFeature(13)
geometry = feature.GetGeometryRef()

results = {'north' : None,
           'south' : None}

findPoints(geometry, results)

print("Northernmost point is ({:.4f}, {:.4f})".format(
      results['north'][0], results['north'][1]))
print("Southernmost point is ({:.4f}, {:.4f})".format(
      results['south'][0], results['south'][1]))

The findPoints() function recursively scans through a geometry, extracting the individual points and identifying the points with the highest and lowest y (latitude) values, which are then stored in the results dictionary so that the main program can use these values.

As you can see, GDAL makes it easy to work with the complex geometry data structure. The code does require recursion, but it is still trivial compared with trying to read the data directly. If you run the above program, the following output will be displayed:

Northernmost point is (-122.3782, 42.0095)
Southernmost point is (-117.2049, 32.5288)

Now that we have these two points, we want to calculate the distance between them. As described earlier, we have to use a great-circle distance calculation here to allow for the curvature of the Earth's surface. We'll do this manually, using the haversine formula:

import math

lat1 = 42.0095
long1 = -122.3782

lat2 = 32.5288
long2 = -117.2049

rLat1 = math.radians(lat1)
rLong1 = math.radians(long1)
rLat2 = math.radians(lat2)
rLong2 = math.radians(long2)

dLat = rLat2 - rLat1
dLong = rLong2 - rLong1
a = math.sin(dLat/2)**2 + math.cos(rLat1) * math.cos(rLat2) \
                        * math.sin(dLong/2)**2
c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
distance = 6371 * c

print("Great circle distance is {:0.0f} kilometers".format(
      distance))

Don't worry about the complex maths involved here; basically, we are converting the latitude and longitude values to radians, calculating the difference in latitude/longitude values between the two points, and then passing the results through some trigonometric functions to obtain the great-circle distance. The value of 6371 is the radius of the Earth, in kilometers.

More details about the haversine formula can be found at http://mathforum.org/library/drmath/view/51879.html.

If you run the program, your computer will tell you the distance from the northernmost to the southernmost point in California:

Great circle distance is 1149 kilometers

There are, of course, other ways of calculating this. You wouldn't normally type the haversine formula directly into your program, as there are libraries that will do this for you. But we deliberately did the calculation this way to show how it can be done.

If you would like to explore this further, you might like to try writing programs to calculate the following:

  • The easternmost and westernmost points in California
  • The midpoint of California

    Tip

    Hint: You can calculate the midpoint's longitude by taking the average of the easternmost and westernmost longitudes.

  • The midpoint of Arizona
  • The distance between the middle of California and the middle of Arizona

As you can see, working with GIS data manually isn't too onerous. While the data structures and maths involved can be rather complex, and you do have to be aware of the underlying GIS concepts, using tools such as GDAL makes your data accessible and easy to work with.