Downloading and importing the data

As described in the previous section, the DISTAL application will make use of four separate sets of freely available geospatial data:

  • The World Borders Dataset
  • The high-resolution GSHHG shoreline database
  • The GNIS Database of US place names
  • The GEONet Names Server's list of non-US place names

Note

For more information on these sources of data, refer to Chapter 4, Sources of Geospatial Data.

Let's work through the process of downloading and importing each of these datasets in turn.

The World Borders Dataset

You downloaded a copy of this dataset earlier. Create a subdirectory named data within your DISTAL directory, and place a copy of the TM_WORLD_BORDERS-0.3 directory into the data directory. Then, create a new Python program named import_world_borders.py within your DISTAL directory, and enter the following into it:

import os.path
import psycopg2
import osgeo.ogr

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

cursor.execute("DELETE FROM countries")

srcFile = os.path.join("data", "TM_WORLD_BORDERS-0.3",
                       "TM_WORLD_BORDERS-0.3.shp")
shapefile = osgeo.ogr.Open(srcFile)
layer = shapefile.GetLayer(0)
num_done = 0

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    name    = feature.GetField("NAME")
    wkt     = feature.GetGeometryRef().ExportToWkt()

    cursor.execute("INSERT INTO countries (name,outline) " +
                   "VALUES (%s, ST_GeometryFromText(%s, 4326))",
                   (name, wkt))

    num_done = num_done + 1

connection.commit()

print("Imported {} countries".format(num_done))

This program should be pretty easy to understand; we're simply making use of OGR to load the contents of the shapefile and psycopg2 to copy the data into the countries table. When you run this program, the contents of the World Borders Dataset should be imported into your database.
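
If you want to double-check that the import worked, a quick count against the countries table will do it. This is just a sketch, reusing the connection settings from the import program above:

import psycopg2

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

cursor.execute("SELECT count(*) FROM countries")
print("The countries table now holds {} rows".format(
    cursor.fetchone()[0]))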

The GSHHG shoreline database

Download a copy of the GSHHG shoreline database (http://www.ngdc.noaa.gov/mgg/shorelines/gshhs.html) if you haven't already done so. Make sure you download the full database in Shapefile format. Place the resulting GSHHS_shp directory into your data subdirectory.

Tip

Remember that the GSHHG database used to be called GSHHS, which is why the GSHHS abbreviation still appears in a few places, such as the name of this shapefile.

The GSHHG shoreline database consists of separate shapefiles defining land/water boundaries at five different resolutions. For the DISTAL application, we want to import the four boundary levels (coastline, lake, island-in-lake, and pond-in-island-in-lake) at full resolution. These shapefiles are stored in a subdirectory named f, which stands for full resolution.
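
Before running the import, you may want to confirm that the four full-resolution shapefiles are where the import program below expects them. Here is a minimal sketch using the same file-naming pattern as that program:

import os.path

for level in [1, 2, 3, 4]:
    srcFile = os.path.join("data", "GSHHS_shp", "f",
                           "GSHHS_f_L{}.shp".format(level))
    if os.path.exists(srcFile):
        print("{} found".format(srcFile))
    else:
        print("{} IS MISSING".format(srcFile))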

To import this data into the DISTAL database, create a new program named import_gshhg.py within your DISTAL directory, and enter the following Python code into this file:

import os.path
import psycopg2
import osgeo.ogr

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

cursor.execute("DELETE FROM shorelines")

for level in [1, 2, 3, 4]:
    num_done = 0

    srcFile = os.path.join("data", "GSHHS_shp", "f",
                           "GSHHS_f_L{}.shp".format(level))
    shapefile = osgeo.ogr.Open(srcFile)
    layer = shapefile.GetLayer(0)

    for i in range(layer.GetFeatureCount()):
        feature = layer.GetFeature(i)
        geometry = feature.GetGeometryRef()
        if geometry.IsValid():
            cursor.execute("INSERT INTO shorelines " +
                           "(level,outline) VALUES " +
                           "(%s, ST_GeometryFromText(%s, 4326))",
                           (level, geometry.ExportToWkt()))

        num_done = num_done + 1

    connection.commit()

    print("Imported {} shorelines at level {}".format(num_done, level))

Tip

Don't forget to enter the password you set up for your distal_user user in the call to psycopg2.connect().

Notice that we have to check that each geometry is valid before inserting it into the database; a few of the geometries in the GSHHG database are invalid, and these would cause trouble in later spatial queries. It is theoretically possible to fix these invalid records, but doing so is quite complicated, so we're simply skipping them.
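
If you're curious about how many features are being skipped, or want to experiment with repairing them, a common (though not foolproof) cleanup trick is to buffer an invalid polygon by a distance of zero. The following is a sketch only, applied here to the level-1 (coastline) shapefile; it is not part of the book's import scripts:

import os.path
import osgeo.ogr

srcFile = os.path.join("data", "GSHHS_shp", "f", "GSHHS_f_L1.shp")
shapefile = osgeo.ogr.Open(srcFile)
layer = shapefile.GetLayer(0)

num_invalid = 0
num_fixable = 0

for i in range(layer.GetFeatureCount()):
    feature = layer.GetFeature(i)
    geometry = feature.GetGeometryRef()
    if not geometry.IsValid():
        num_invalid += 1
        # Buffering by zero often rebuilds a valid geometry,
        # though it is not guaranteed to preserve every shape
        # exactly.
        fixed = geometry.Buffer(0)
        if fixed is not None and fixed.IsValid():
            num_fixable += 1

print("{} invalid, {} repairable with Buffer(0)".format(
    num_invalid, num_fixable))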

When you run this program, it will take a few minutes to complete as there are over 180,000 high-resolution polygons to be imported.

US place names

You can download the list of US place names from http://geonames.usgs.gov/domestic. Click on the Download Domestic Names hyperlink, and choose the Download all national features in one .zip file option. This will download a file named NationalFile_YYYYMMDD.zip, where YYYYMMDD is the datestamp identifying when the file was last updated. Once again, decompress the resulting .zip archive and move the NationalFile_YYYYMMDD.txt file into your data subdirectory.
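
If you prefer to script the unpacking step, Python's standard zipfile module can do it for you. A minimal sketch (substitute the real datestamp for YYYYMMDD):

import zipfile

with zipfile.ZipFile("NationalFile_YYYYMMDD.zip") as archive:
    archive.extractall("data")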

The file you have just downloaded is a pipe-delimited text file, meaning that each column is separated by a | character, like this:

FEATURE_ID|FEATURE_NAME|FEATURE_CLASS|...|DATE_EDITED
399|Agua Sal Creek|Stream|AZ|...|02/08/1980
400|Agua Sal Wash|Valley|AZ|...|02/08/1980

The first line contains the names of the various fields. While there are a lot of fields in the file, there are four fields that we are particularly interested in:

  • The FEATURE_NAME field contains the name of the location, in UTF-8 character encoding.
  • The FEATURE_CLASS field tells us what type of feature we are dealing with, in this case, a Stream or a Valley. There are a lot of features we don't need for the DISTAL application, for example, the names of bays, beaches, bridges, and oilfields. In fact, there is only one feature class we are interested in: Populated Place.
  • The PRIM_LONG_DEC and PRIM_LAT_DEC fields contain the longitude and latitude of the location, in decimal degrees. According to the documentation, these coordinates use the NAD83 datum rather than the WGS84 datum used by the other data we are importing.
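
The import program we're about to write hard-codes the positions of these columns (1, 2, 9, and 10). If you'd rather derive them from the header line, so that the code is robust against changes in the file's layout, a sketch like this would do it:

import os.path

with open(os.path.join("data", "NationalFile_YYYYMMDD.txt"),
          "r", encoding="utf-8") as f:
    fields = f.readline().rstrip().split("|")

# Look up each column's index by its name in the header line.
name_col  = fields.index("FEATURE_NAME")
class_col = fields.index("FEATURE_CLASS")
lat_col   = fields.index("PRIM_LAT_DEC")
long_col  = fields.index("PRIM_LONG_DEC")

print(name_col, class_col, lat_col, long_col)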

We are going to import the data from this file into our places database table. Because the latitude and longitude values use the NAD83 datum, we will use pyproj to convert the coordinates into WGS84 so that it is compatible with the other data we are using.

Tip

Strictly speaking, this step isn't necessary: for points within the United States, the NAD83 and WGS84 datums are virtually identical, and pyproj treats the two as equivalent, so the transformation won't actually change the coordinates at all. We will do it anyway, following the recommended practice of knowing the spatial reference for our data and transforming when necessary, even if that transformation is a no-op at times.
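
To convince yourself that the transformation really is a no-op, you can run a quick one-off check using the same pyproj calls as the import program below. The coordinates here are just example values:

import pyproj

srcProj = pyproj.Proj(proj='longlat', ellps='GRS80',
                      datum='NAD83')
dstProj = pyproj.Proj(proj='longlat', ellps='WGS84',
                      datum='WGS84')

# A point in Arizona, in NAD83 decimal degrees.
long, lat = pyproj.transform(srcProj, dstProj, -111.89, 33.45)
print(long, lat)  # The same coordinates, to within
                  # floating-point noise.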

Let's write the program to import this data into the database. Create a new file named import_gnis.py in your main DISTAL directory, and enter the following Python code into this file:

import psycopg2
import os.path
import pyproj

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

cursor.execute("DELETE FROM places")

srcProj = pyproj.Proj(proj='longlat', ellps='GRS80',
                      datum='NAD83')
dstProj = pyproj.Proj(proj='longlat', ellps='WGS84',
                      datum='WGS84')

f = file(os.path.join("data",
                      "NationalFile_YYYYMMDD.txt"), "r")
heading = f.readline() # Ignore field names.

num_inserted = 0

for line in f:
    parts = line.rstrip().split("|")
    featureName = parts[1]
    featureClass = parts[2]
    lat = float(parts[9])
    long = float(parts[10])

    if featureClass == "Populated Place":
        long,lat = pyproj.transform(srcProj, dstProj,
                                    long, lat)
        cursor.execute("INSERT INTO places " +
                       "(name, position) VALUES (%s, " +
                       "ST_SetSRID(" +
                       "ST_MakePoint(%s,%s), 4326))",
                       (featureName, long, lat))

        num_inserted += 1
        if num_inserted % 1000 == 0:
            connection.commit()

connection.commit()
f.close()

print("Inserted {} placenames".format(num_inserted))

Tip

Make sure you enter the password for distal_user in the call to psycopg2.connect(). Also make sure that you use the correct name for the NationalFile_YYYYMMDD.txt file you downloaded, according to its datestamp.

Note that we're committing the changes to the database for every 1,000 records. This speeds up the importing process. We're also using the ST_SetSRID() function to specify the SRID for the Point geometry before we insert it—we need to do this because we've told PostGIS to only accept geometries which have an SRID value of 4326.
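
As an aside, recent versions of psycopg2 (2.7 and later) provide psycopg2.extras.execute_batch(), which groups many INSERT statements into fewer round-trips to the server. The book's scripts don't use it, but a sketch of the idea, with hypothetical sample data, looks like this:

import psycopg2
import psycopg2.extras

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

# Hypothetical sample rows; in practice you would collect the
# (name, long, lat) tuples while reading the GNIS file.
rows = [("Example Place", -111.89, 33.45)]

psycopg2.extras.execute_batch(cursor,
    "INSERT INTO places (name, position) VALUES " +
    "(%s, ST_SetSRID(ST_MakePoint(%s,%s), 4326))",
    rows)
connection.commit()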

When you run this program, it will take a few minutes to complete. Once it's finished, you will have imported over 200,000 US place names into the places table.

Non-US place names

If you haven't already done so, download the list of non-US place names from http://geonames.nga.mil/gns/html/namefiles.html by clicking on the Entire country files dataset hyperlink. Note that this is a large download (over 450 megabytes in compressed form), so it may take a while.

Once it has finished downloading, you will end up with a file named geonames_YYYYMMDD.zip, where once again YYYYMMDD is the datestamp identifying when the file was last updated. Decompress this file, which will result in a directory containing two text files: Countries.txt and Countries_disclaimer.txt. We want the file named Countries.txt, so copy this file into your data subdirectory.

The Countries.txt file is probably too big to open in a text editor. If you could open it, you would see that it is a tab-delimited text file using UTF-8 character encoding. The first few lines of this file look something like this:

RC  UFI     ... FULL_NAME_ND_RG  NOTE       MODIFY_DATE
1  -1307834 ... Pavia                       1993-12-21
1  -1307889 ... Santa Anna       gjgscript  1993-12-21

As with the US place name data, there are many more features here than we need for the DISTAL application. Since we are only interested in the official names of towns and cities, we need to filter this data in the following way:

  • The FC (feature classification) field tells us which type of feature we are dealing with. We want features with an FC value of "P" (populated place).
  • The NT (name type) field tells us the status of this feature's name. We want names with an NT value of "N" (approved name).
  • The DSG (feature designation code) field tells us the type of feature, in more detail than the FC field. For the DISTAL application, we are interested in features with a DSG value of "PPL" (populated place), "PPLA" (administrative capital), or "PPLC" (capital city).

    Note

    If you want to see a full list of all the feature designation codes, go to http://geonames.nga.mil/namesgaz and click on the Designation Codes link in the sidebar on the left, under Lookup Tables. This brings up a page where you can search for a particular feature designation code; if you leave all the fields blank and click on the Search Designation Codes button, a window will appear showing you all the various feature designation codes and what they mean.

There are also several different versions of each place name; we want the full name in normal reading order, which is in the field named FULL_NAME_RO.

This tells us everything we need to know to import the non-US place names into our places database table. Create a new Python script named import_geonames.py inside your DISTAL directory, and enter the following into this file:

import psycopg2
import os.path

connection = psycopg2.connect(database="distal",
                              user="distal_user",
                              password="...")
cursor = connection.cursor()

f = open(os.path.join("data", "Countries.txt"), "r")

heading = f.readline() # Ignore field names.

num_inserted = 0

for line in f:
    parts = line.rstrip().split("\t")
    lat = float(parts[3])
    long = float(parts[4])
    featureClass = parts[9]
    featureDesignation = parts[10]
    nameType = parts[17]
    featureName = parts[22]

    if (featureClass == "P" and nameType == "N" and
        featureDesignation in ["PPL", "PPLA", "PPLC"]):
        cursor.execute("INSERT INTO places " +
                       "(name, position) VALUES (%s, " +
                       "ST_SetSRID(" +
                       "ST_MakePoint(%s,%s), 4326))",
                       (featureName, long, lat))

        num_inserted = num_inserted + 1
        if num_inserted % 1000 == 0:
            connection.commit()

f.close()
connection.commit()

print("Inserted {} placenames".format(num_inserted))

Notice that we don't delete the existing contents of the places table. This is because we've already imported some data into that table. If you need to re-import the data, you'll need to run both the import_gnis.py script and the import_geonames.py script, in that order.

Go ahead and run this program. It will take a while to complete—maybe 10 to 15 minutes, depending on the speed of your computer—as you're adding over three million place names to the database.