Carry out the following steps:
- First, check out how the OSM .pbf file is built using ogrinfo. PBF is a binary format intended as an alternative to the OSM XML format, mainly because it is much smaller. As you must have noticed, it is composed of several layers—you will export the lines layer to PostGIS as that layer contains the street names that you will use for the overall geocoding process:
$ ogrinfo lazio.pbf
Had to open data source read-only.
INFO: Open of `lazio.pbf'
using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon) 5: other_relations (Geometry Collection)
- Export the lines' OSM features to a PostGIS table, using ogr2ogr (ogr2ogr, as always, will implicitly create the GiST index that is needed by the pg_trgm module to run):
$ ogr2ogr -f PostgreSQL -lco GEOMETRY_NAME=the_geom
-nln chp08.osm_roads
PG:"dbname='postgis_cookbook' user='me'
password='mypassword'" lazio.pbf lines
- Now try a trigram matching to identify the road names similar to a given search text, using a query such as the following. Note that the similarity function returns a value that decreases from 1 to 0 as the similarity of the word decreases (with 1, the strings are identical; with 0, they are totally different):
postgis_cookbook=# SELECT name,
similarity(name, 'via benedetto croce') AS sml,
ST_AsText(ST_Centroid(the_geom)) AS the_geom
FROM chp08.osm_roads
WHERE name % 'via benedetto croce'
ORDER BY sml DESC, name;

- As a variant, you will use the following query to complete the recipe (in this case, when the weight is 0, the strings are identical):
postgis_cookbook=# SELECT name,
name <-> 'via benedetto croce' AS weight
FROM chp08.osm_roads
ORDER BY weight LIMIT 10;

- We will use the last query as the SQL core of a Python class, which will provide geocoding features to the consumer, using the layer we just imported in PostGIS (chp08.osm_roads). First, create a file named osmgeocoder.py and add the following class to it:
import sys
import psycopg2
class OSMGeocoder(object):
"""
A class to provide geocoding features using an OSM
dataset in PostGIS.
"""
def __init__(self, db_connectionstring):
# initialize db connection parameters
self.db_connectionstring = db_connectionstring
def geocode(self, placename):
"""
Geocode a given place name.
"""
# here we create the connection object
conn = psycopg2.connect(self.db_connectionstring)
cur = conn.cursor()
# this is the core sql query, using trigrams to detect
# streets similar to a given placename
sql = """
SELECT name, name <-> '%s' AS weight,
ST_AsText(ST_Centroid(the_geom)) as point
FROM chp08.osm_roads
ORDER BY weight LIMIT 10;
""" % placename
# here we execute the sql and return all of the results
cur.execute(sql)
rows = cur.fetchall()
cur.close()
conn.close()
return rows
- Now, add the __main__ check to provide the class user a method to directly use the geocoder from the command line:
if __name__ == '__main__':
# the user must provide at least two parameters, the place name
# and the connection string to PostGIS
if len(sys.argv) < 3 or len(sys.argv) > 3:
print "usage: <placename> <connection string>"
raise SystemExit
placename = sys.argv[1]
db_connectionstring = sys.argv[2]
# here we instantiate the geocoder, providing the needed
# PostGIS connection parameters
geocoder = OSMGeocoder(db_connectionstring)
# here we query the geocode method, for getting the
# geocoded points for the given placename
results = geocoder.geocode(placename)
print results
- Now you can test the class by calling the script, as shown here:
(postgis-cb-env)$ python osmgeocoder.py "Via Benedetto Croce"
"dbname=postgis_cookbook user=me password=mypassword"
[('Via Benedetto Croce', 0.0, 'POINT(12.6999095325807
42.058016054317)'),...
- So, now that you wrote a class that can be used to geocode street names, let's suppose that another user wants to use it to geocode a file with a list of street names in order to import it in a new PostGIS layer. Here is how the user could do this (try this as well). First, create a streets.txt file with a list of street names; for example:
Via Delle Sette Chiese
Via Benedetto Croce
Lungotevere Degli Inventori
Viale Marco Polo Via Cavour
- Now, create a file named geocode_streets.py and add this Python code to it (you are going to use the OSMGeocoder class to geocode the street name list, and GDAL/OGR to create a new PostGIS layer for storing the geocoded points for the street names):
from osmgeocoder import OSMGeocoder
from osgeo import ogr, osr
# here we read the file
f = open('streets.txt')
streets = f.read().splitlines()
f.close()
# here we create the PostGIS layer using gdal/ogr
driver = ogr.GetDriverByName('PostgreSQL')
srs = osr.SpatialReference()
srs.ImportFromEPSG(4326)
pg_ds = ogr.Open(
"PG:dbname='postgis_cookbook' host='localhost' port='5432'
user='me' password='mypassword'", update = 1 )
pg_layer = pg_ds.CreateLayer('geocoded_points', srs = srs,
geom_type=ogr.wkbPoint, options = [
'GEOMETRY_NAME=the_geom',
'OVERWRITE=YES',
# this will drop and recreate the table every time
'SCHEMA=chp08',
])
# here we add the field to the PostGIS layer
fd_name = ogr.FieldDefn('name', ogr.OFTString)
pg_layer.CreateField(fd_name)
print 'Table created.'
# now we geocode all of the streets in the file
# using the osmgeocoder class
geocoder = OSMGeocoder('dbname=postgis_cookbook user=me
password=mypassword')
for street in streets:
print street
geocoded_street = geocoder.geocode(street)[0]
print geocoded_street
# format is
# ('Via delle Sette Chiese', 0.0,
# 'POINT(12.5002166330412 41.859774874774)')
point_wkt = geocoded_street[2]
point = ogr.CreateGeometryFromWkt(point_wkt)
# we create a LayerDefn for the feature using the
# one from the layer
featureDefn = pg_layer.GetLayerDefn()
feature = ogr.Feature(featureDefn)
# now we store the feature geometry and
# the value for the name field
feature.SetGeometry(point)
feature.SetField('name', geocoded_street[0])
# finally we create the feature
# (an INSERT command is issued only here) pg_layer.CreateFeature(feature)
- Run the preceding script, and then check with your favorite PostgreSQL client or with a GIS desktop tool if the points for the street names were correctly geocoded:
(postgis-cb-env)capooti@ubuntu:~/postgis_cookbook/working/chp08$
python geocode_streets.py
Table created.
Via Delle Sette Chiese
('Via delle Sette Chiese', 0.0,
'POINT(12.5002166330412 41.859774874774)')
...
Via Cavour ('Via Cavour', 0.0, 'POINT(12.7519263341222 41.9631244835521)')