Many applications do more than display a geolocation on a map once it has been acquired. In many cases, the location is saved for later use—possibly displaying a history of where a user has been, or showing where many users are at any given time. In these cases, the browser application will need to collect the geolocation of the device and then send that information to a server for further processing. Most of this backend processing is beyond the scope of this book, but most likely a web server will be used with a server-side language like PHP, Python, C# or VB.NET, Java, etc. The language used does not really matter, but how the information is saved does matter.
For more information on server-side scripting languages, check out some of these titles to get you started: PHP and MySQL Web Development, 4th Edition by Luke Welling and Laura Thomson (Addison-Wesley Professional), Programming Python, Fourth Edition by Mark Lutz (O’Reilly Media), Head First Java, Second Edition by Kathy Sierra and Bert Bates (O’Reilly Media), and Beginning ASP.NET 4: in C# and VB by Imar Spaanjaars (Wrox).
Since I want to concentrate more on what to do with the geographic information once the browser has collected it than on how to manipulate it on the server, I will talk about specifications more than implementation in the sections to come. There are many ways that geolocation information can be saved for later use: text files, CSV files, XML files, JSON files, KML files, shapefiles, geodatabases, relational databases, etc. How you decide to save your geometry will depend on several factors, including your GIS environment, operating system, and budget.
For example, if you have a limited budget, then a more open-source route for your GIS needs might be in order. In this case, using KML and Google Maps might be the right direction. If you are in an enterprise environment, however, you are more likely to be using ArcGIS Desktop and other Esri products, and a more robust Oracle database might be a better fit. There are many solutions to this problem, and I hope you take advantage of that flexibility in your own projects.
I will focus on only three of the ways data can be saved (KML, shapefiles, and relational databases) because they are all popular ways of storing geographic information. If none of these proves to be a good method for your needs, hopefully this discussion will at least help you learn how to store your data in a different format.
Keyhole Markup Language (KML) is an XML file format designed to hold geographic information that is to be visualized on Internet-based maps and browsers, such as Google Maps and Google Earth. It was originally created by Keyhole, Inc., which was purchased by Google in 2004. Google submitted the KML 2.2 specification to the Open Geospatial Consortium (OGC) to ensure that KML remained an open standard. It became an official OGC standard on April 14, 2008.
You will often see the KMZ file extension, which denotes a zipped archive containing one or more compressed KML files along with their associated icon and image files.
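Because a KMZ file is an ordinary zip archive, Python's standard zipfile module is enough to create and read one. This is a minimal sketch: it stores the main document as doc.kml at the root of the archive (the usual convention), and the function names write_kmz and read_kmz are illustrative, not part of any KML library.

```python
import zipfile

def write_kmz(kmz_target, kml_text, extra_files=None):
    """Package a KML document plus optional icons/images into a KMZ archive.

    kmz_target may be a path or a writable binary file object.
    extra_files maps names inside the archive to raw bytes.
    """
    with zipfile.ZipFile(kmz_target, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        # The main KML document goes at the root of the archive
        kmz.writestr("doc.kml", kml_text)
        # Icons and images travel alongside it, e.g. "files/icon.png"
        for name, data in (extra_files or {}).items():
            kmz.writestr(name, data)

def read_kmz(kmz_source):
    """Return the text of the first .kml file found in a KMZ archive."""
    with zipfile.ZipFile(kmz_source) as kmz:
        for name in kmz.namelist():
            if name.lower().endswith(".kml"):
                return kmz.read(name).decode("utf-8")
    raise ValueError("no .kml file found in %r" % (kmz_source,))
```

For example, write_kmz("points.kmz", kml_text, {"files/icon.png": icon_bytes}) produces an archive that read_kmz("points.kmz") can unpack again.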
One of KML's many uses for geospatial information is holding point data, which, as I am sure you have figured out by now, is the focus of geolocation. In KML, a point is held within a <Placemark> container, which typically holds a name, a description, and a Point geometry. A Simple Point Placemark is shown here:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<name>Simple placemark</name>
<description>This is an example of a simple placemark.</description>
<Point>
<coordinates>-90.185278,38.624722</coordinates>
</Point>
</Placemark>
</kml>
There are basically three types of Point Placemark that can be created:
Simple
Floating
Extruded
A Simple Point Placemark is always attached to the ground, meaning it is always displayed at the height of the underlying terrain. A Floating Point Placemark is displayed at a specific height above the ground. An Extruded Point Placemark is similar to a Floating Point Placemark in that it sits at a specific height above the ground, but it is also tethered to the ground by a customizable tail. All three types are controlled by the data inside the <Point> element of the <Placemark>.
The following illustrates the syntax of a Point Placemark, showing the child elements that would be associated with geolocation. For a full list of the elements that can be added as children of a Placemark, see the KML Reference, Placemark at http://code.google.com/apis/kml/documentation/kmlreference.html#placemark:
<Placemark id="ID">
<name>...</name> <!-- string -->
<description>...</description> <!-- string -->
<TimeStamp>
<when>...</when> <!-- kml:dateTime -->
</TimeStamp>
<ExtendedData>...</ExtendedData> <!-- custom -->
<Point id="ID">
<extrude>...</extrude> <!-- boolean -->
<altitudeMode>...</altitudeMode>
<!-- clampToGround, relativeToGround, or absolute -->
<coordinates>...</coordinates> <!-- long,lat[,alt] -->
</Point>
</Placemark>
Looking at the <Point> element, you will see that it has three children: <extrude>, <altitudeMode>, and <coordinates>. The <coordinates> element is required by all three types of Point Placemark. It contains a longitude and a latitude, in that order, measured in decimal degrees and referenced to WGS 84, followed by an optional altitude in meters (interpreted according to <altitudeMode>: above the terrain for relativeToGround and above sea level for absolute). When an <altitudeMode> element set to relativeToGround or absolute is added to the <Point> element, the Point Placemark becomes Floating or Extruded; without it, the default of clampToGround keeps the placemark Simple. Which of the two it becomes depends on whether the <extrude> element is set to true with a value of 1.
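These rules are easy to express in code. The sketch below uses Python's xml.etree.ElementTree to build the <Point> element for each of the three types and to classify an existing one. The helper names make_point and placemark_type are hypothetical, and choosing relativeToGround for the Floating and Extruded cases is an assumption (absolute would work too).

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialize KML as the default namespace

def kml(tag):
    """Qualify a tag name with the KML 2.2 namespace."""
    return "{%s}%s" % (KML_NS, tag)

def make_point(kind, lon, lat, alt=0):
    """Build a <Point> for a 'Simple', 'Floating', or 'Extruded' placemark."""
    point = ET.Element(kml("Point"))
    if kind == "Simple":
        # No <altitudeMode>: it defaults to clampToGround, so no altitude is written
        ET.SubElement(point, kml("coordinates")).text = "%s,%s" % (lon, lat)
        return point
    if kind not in ("Floating", "Extruded"):
        raise ValueError("unknown placemark type: %r" % kind)
    # extrude=1 draws the tail from the point down to the ground
    ET.SubElement(point, kml("extrude")).text = "1" if kind == "Extruded" else "0"
    ET.SubElement(point, kml("altitudeMode")).text = "relativeToGround"
    # Note the order: longitude, latitude, altitude
    ET.SubElement(point, kml("coordinates")).text = "%s,%s,%s" % (lon, lat, alt)
    return point

def placemark_type(point):
    """Classify an existing <Point> element using the same rules."""
    mode = (point.findtext(kml("altitudeMode")) or "clampToGround").strip()
    if mode == "clampToGround":
        return "Simple"
    extrude = (point.findtext(kml("extrude")) or "0").strip()
    return "Extruded" if extrude in ("1", "true") else "Floating"
```

Calling ET.tostring(make_point("Extruded", -90.185278, 38.624722, 212), encoding="unicode") yields a <Point> with extrude set to 1, relativeToGround, and the coordinates -90.185278,38.624722,212.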
Example 5-1 shows a KML file with several points in it, along with all of the information that can be gathered by the W3C Geolocation API.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Placemark id="pt_000000">
<name>Point 000000</name>
<description>This is the first point collected.</description>
<TimeStamp><when>2011-04-06T23:24:12+06:00</when></TimeStamp>
<ExtendedData>
<Data name="accuracy"><value>20</value></Data>
<Data name="altitudeAccuracy"><value>100</value></Data>
<Data name="heading"><value>NaN</value></Data>
<Data name="speed"><value>0</value></Data>
</ExtendedData>
<Point>
<extrude>0</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<coordinates>-90.185278,38.624722,212</coordinates>
</Point>
</Placemark>
<Placemark id="pt_000001">
<name>Point 000001</name>
<description>This is the second point collected.</description>
<TimeStamp><when>2011-04-07T00:15:37+06:00</when></TimeStamp>
<ExtendedData>
<Data name="accuracy"><value>10</value></Data>
<Data name="altitudeAccuracy"><value>10</value></Data>
<Data name="heading"><value>37</value></Data>
<Data name="speed"><value>15.6464</value></Data>
</ExtendedData>
<Point>
<extrude>0</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<coordinates>-89.788221,38.4233,18</coordinates>
</Point>
</Placemark>
<Placemark id="pt_000002">
<name>Point 000002</name>
<description>This is the third point collected.</description>
<TimeStamp><when>2011-04-07T11:49:03+06:00</when></TimeStamp>
<ExtendedData>
<Data name="accuracy"><value>60</value></Data>
<Data name="altitudeAccuracy"><value>80</value></Data>
<Data name="heading"><value>147</value></Data>
<Data name="speed"><value>31.2928</value></Data>
</ExtendedData>
<Point>
<extrude>0</extrude>
<altitudeMode>relativeToGround</altitudeMode>
<coordinates>-90.123129,37.992331,25</coordinates>
</Point>
</Placemark>
</Document>
</kml>
Although the latitude, longitude, altitude, and timestamp can be included natively, the rest of the geolocation information (accuracy, altitudeAccuracy, heading, and speed) must be added and defined in the <ExtendedData> element. There are three ways this data can be added; see the KML Reference, ExtendedData at http://code.google.com/apis/kml/documentation/kmlreference.html#extendeddata for more information on these methods. I chose the data pair method (untyped <Data> elements holding a name and a value) so that the values would be shown in Google Earth, but one of the other methods might better suit your application needs.
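To show how a server-side script might produce placemarks like those in Example 5-1, here is a minimal sketch using xml.etree.ElementTree. It assumes the client posts a dictionary whose keys mirror the W3C Geolocation attributes (latitude, longitude, altitude, accuracy, altitudeAccuracy, heading, speed) plus an ISO 8601 timestamp string; that payload shape and the name position_to_placemark are assumptions about your own application. Values the API reports as null are skipped rather than written.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def q(tag):
    """Qualify a tag with the KML namespace."""
    return "{%s}%s" % (KML_NS, tag)

# Position attributes that KML cannot hold natively go into <ExtendedData>
EXTENDED_FIELDS = ("accuracy", "altitudeAccuracy", "heading", "speed")

def position_to_placemark(pos_id, name, description, pos):
    """Build a <Placemark> like those in Example 5-1 from a position dict."""
    pm = ET.Element(q("Placemark"), id=pos_id)
    ET.SubElement(pm, q("name")).text = name
    ET.SubElement(pm, q("description")).text = description
    ts = ET.SubElement(pm, q("TimeStamp"))
    ET.SubElement(ts, q("when")).text = pos["timestamp"]
    ext = ET.SubElement(pm, q("ExtendedData"))
    for field in EXTENDED_FIELDS:
        value = pos.get(field)
        if value is None:
            continue  # unavailable values are omitted
        data = ET.SubElement(ext, q("Data"), name=field)
        ET.SubElement(data, q("value")).text = str(value)
    point = ET.SubElement(pm, q("Point"))
    ET.SubElement(point, q("extrude")).text = "0"
    ET.SubElement(point, q("altitudeMode")).text = "relativeToGround"
    # KML order is longitude, latitude[, altitude]
    coords = [pos["longitude"], pos["latitude"]]
    if pos.get("altitude") is not None:
        coords.append(pos["altitude"])
    ET.SubElement(point, q("coordinates")).text = ",".join(str(c) for c in coords)
    return pm
```

Each returned element can be appended to a <Document> and written out with ET.tostring(..., encoding="unicode").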
Because KML is basically text in a file, it is a fairly straightforward bit of programming on the server side of an application to create this file, read from it, or write to it, regardless of the technology being used. Also, because of its XML nature, converting the data in a KML file to a different format is not that difficult. KML is easy to work with, which makes it a good choice for storing geolocation data.
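As an example of reading and converting KML, the following sketch parses placemarks shaped like those in Example 5-1 and writes them out as CSV using only the standard library. It assumes the KML 2.2 namespace, a <Point> in every placemark, and the data pair form of <ExtendedData>; the function names are illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"kml": "http://www.opengis.net/kml/2.2"}

def read_placemarks(kml_text):
    """Yield one dict per <Placemark> with its coordinates, time, and extended data."""
    root = ET.fromstring(kml_text)
    for pm in root.iter("{%s}Placemark" % NS["kml"]):
        coords = pm.findtext("kml:Point/kml:coordinates", namespaces=NS)
        # KML lists longitude first, then latitude, then an optional altitude
        lon, lat, *rest = coords.strip().split(",")
        row = {
            "name": pm.findtext("kml:name", default="", namespaces=NS),
            "when": pm.findtext("kml:TimeStamp/kml:when", default="", namespaces=NS),
            "latitude": float(lat),
            "longitude": float(lon),
            "altitude": float(rest[0]) if rest else None,
        }
        # <Data name="..."><value>...</value></Data> pairs from <ExtendedData>
        for data in pm.findall("kml:ExtendedData/kml:Data", NS):
            row[data.get("name")] = data.findtext("kml:value", namespaces=NS)
        yield row

def placemarks_to_csv(kml_text):
    """Convert the placemarks in a KML document to CSV text."""
    rows = list(read_placemarks(kml_text))
    fields = []
    for row in rows:  # union of keys, in first-seen order
        for key in row:
            if key not in fields:
                fields.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The same read_placemarks generator could just as easily feed json.dumps or a database insert instead of CSV.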
A shapefile is a data format designed for holding geographical vector data like points and polygons along with associated attribute data. It was developed and is maintained by Esri. It was specifically designed as a spatial data format for use with Esri’s ArcGIS Desktop product, though it works with other software as well. Other software that can use the shapefile format includes AutoCAD Map, MapInfo, GeoMedia, and GRASS.
There are tools available to convert shapefiles to other formats and vice versa, making this a flexible format for holding geolocation information: point data held in a shapefile can easily be converted to another format when needed. Examples of such conversion programs include SHP2KML, shp2CAD, and SHP2MIF, and programs that perform the reverse conversions can be found with a quick web search.
Though it is called a shapefile, the format is actually a set of files that work together to produce the necessary working data. A shapefile requires at least three files, as shown in Table 5-1.
| Extension | Description | Required |
|---|---|---|
| .shp | Stores the feature geometry. | yes |
| .shx | Stores the index of the feature geometry. | yes |
| .dbf | Stores the attribute information of the feature in a dBASE table. | yes |
| .sbn/.sbx | Stores the spatial index of the features. | no |
| .fbn/.fbx | Stores the spatial index of the features that are read-only. | no |
| .ain/.aih | Stores the attribute index of the active fields in a table or a theme’s attribute table. | no |
| .atx | Stores the attribute index for the dBASE table. | no |
| .ixs | Stores the geocoding index for read/write shapefiles. | no |
| .mxs | Stores the geocoding index for read/write shapefiles (ODB format). | no |
| .prj | Stores the coordinate system information. | no |
| .xml | Stores metadata for the feature. | no |
| .cpg | Stores the codepage for identifying the character set to be used by the shapefile. | no |

Source: ArcGIS Resource Center, Desktop 10, Shapefile file extensions. http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Shapefile_file_extensions/005600000003000000/
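Because a shapefile is really a set of files, a script that consumes one should check that the required parts are present before handing the base name to a library. The following is a small sketch using pathlib; the function names are hypothetical, and the extension check is case-sensitive, which is an assumption that matters on case-sensitive file systems.

```python
from pathlib import Path

# The three files every shapefile needs (see Table 5-1)
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(base):
    """Return the required extensions that have no file next to `base`.

    `base` is the shapefile path without an extension,
    e.g. 'shapefiles/geolocation'.
    """
    base = Path(base)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_name(base.name + ext).is_file()]

def has_projection(base):
    """True if the optional .prj (coordinate system) file is present."""
    base = Path(base)
    return base.with_name(base.name + ".prj").is_file()
```

A web application could refuse an upload when missing_shapefile_parts returns a non-empty list, and fall back to assuming WGS 84 (or reject the file) when has_projection is false.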
To be useful to a web application, a shapefile must be something a program can manipulate, performing all of the necessary operations on it (create, read, write, and so on). For instance, the Shapefile C Library lets developers write C programs with reading, writing, and some updating capabilities. A more useful scripting library for web applications is the Python Shapefile Library.
The Python Shapefile Library (PSL) was written by Joel Lawhead. It provides read and write capabilities for shapefiles using the Python scripting language. It is designed to be as extensible as possible when creating a shapefile, while still performing some validation to ensure that a proper file is produced. Take a look at Example 5-2.
# Include the Python Shapefile Library
import shapefile as sf
# Name of the shapefile to create
filename = 'shapefiles/geolocation'
# Create a /pointz/ shapefile (x, y, and z) so altitude is kept, and turn on autoBalance
sf_w = sf.Writer(sf.POINTZ)
sf_w.autoBalance = 1
# Add the points: longitude (x), latitude (y), altitude (z)
sf_w.point(-90.185278, 38.624722, 212)
sf_w.point(-89.788221, 38.4233, 18)
sf_w.point(-90.123129, 37.992331, 25)
# Create attribute information (dBASE field names are limited to 10 characters)
sf_w.field('Name', 'C', 20)
sf_w.field('Descr', 'C', 80)
# A dBASE 'D' field holds only YYYYMMDD, so keep the ISO 8601 timestamp as text
sf_w.field('Timestamp', 'C', 25)
sf_w.field('Accuracy', 'N', 4, 0)
sf_w.field('AltAccur', 'N', 4, 0)
sf_w.field('Heading', 'N', 9, 6)
sf_w.field('Speed', 'N', 7, 4)
# Add attribute information, one record per point
sf_w.record('Point 000000', 'This is the first point collected.',
            '2011-04-06T23:24:12+06:00', 20, 100, None, 0)
sf_w.record('Point 000001', 'This is the second point collected.',
            '2011-04-07T00:15:37+06:00', 10, 10, 37, 15.6464)
sf_w.record('Point 000002', 'This is the third point collected.',
            '2011-04-07T11:49:03+06:00', 60, 80, 147, 31.2928)
# Save the file (writes the .shp, .shx, and .dbf files)
sf_w.save(filename)
# Create a projection file describing WGS 84 geographic coordinates
epsg = ('GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,'
        '298.257223563]],PRIMEM["Greenwich",0],UNIT["degree",'
        '0.0174532925199433]]')
with open("%s.prj" % filename, 'w') as prj:
    prj.write(epsg)
The first line of code imports PSL into the working script. After the Writer object specifies that the shapefile will be of type POINTZ (a point type that also stores a z value, here the altitude), the autoBalance property is set to true. This ensures that whenever the script adds a point or a record, its counterpart is added as well (every point has a record and every record has a point). Next, the points are added with the point() method, which takes a longitude (x), a latitude (y), and an optional altitude (z) and measure (m). In Example 5-2, the longitude, latitude, and altitude of each point are recorded.
Before attribute records can be added to the shapefile, the attributes must be defined using the field() method. The field() method takes a field name, field type, field length, and (for numbers) decimal length. Keep in mind that dBASE field names are limited to 10 characters and that a dBASE date ('D') field stores only a date, which is why the full timestamp is kept in a character field. Once the fields are defined, the records, one for each point, are created using the record() method. After the records have been added, the shapefile is saved and the three required files (.shp, .shx, and .dbf) are created. Additionally, Example 5-2 creates a .prj file for a more complete shapefile instance.
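The W3C Geolocation API reports a position's timestamp as milliseconds since the Unix epoch, while Example 5-2 stores an ISO 8601 string. The following sketch shows one way a server-side script might turn a position posted by the browser into the point and record values that Example 5-2 passes to point() and record(). The coords dictionary layout, the function name position_to_record, and storing times in UTC are all assumptions; a missing altitude is replaced with 0 so that point() always receives a number.

```python
from datetime import datetime, timezone

def position_to_record(name, description, coords, timestamp_ms):
    """Build (point, record) tuples in the field order used by Example 5-2.

    `coords` is assumed to be a dict keyed by the W3C Coordinates attribute
    names; `timestamp_ms` is the Position timestamp in ms since the epoch.
    """
    when = datetime.fromtimestamp(timestamp_ms / 1000.0, tz=timezone.utc)
    # Shapefile points are (x, y, z): longitude, latitude, altitude
    point = (coords["longitude"], coords["latitude"], coords.get("altitude") or 0)
    record = (
        name[:20],          # Name is a 20-character field
        description[:80],   # Description is an 80-character field
        when.isoformat(timespec="seconds"),
        coords["accuracy"],
        coords.get("altitudeAccuracy"),
        coords.get("heading"),
        coords.get("speed"),
    )
    return point, record
```

With the writer from Example 5-2, the results would be passed along as sf_w.point(*point) and sf_w.record(*record).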
In most cases, the shapefile holding the geolocation information will already be created when the application needs to add another record. The following code shows a small script that can edit an existing shapefile and add another point to it:
import shapefile as sf
filename = 'shapefiles/geolocation'
# Open the existing shapefile for editing
sf_e = sf.Editor(shapefile=filename + '.shp')
# Append a new point (longitude, latitude, altitude) and its matching record
sf_e.point(-102.125532, 34.223411, 40)
sf_e.record('Point 000004', 'This is an appended point.',
            '2011-04-10T01:52:22+06:00', 20, 30, 118, 17.21)
# Write the changes back to disk
sf_e.save(filename)
The code for editing an existing shapefile and adding a point is simple. Updating existing points, however, is a little more complicated. The Editor object takes care of inserting and deleting records in the shapefile, but the specific record number must first be obtained by reading the shapefile and locating the record (also something PSL can do). The record should then be deleted using the delete() method, and a new record with the corrected information should be added to the shapefile.
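The update procedure just described (find the record, delete it, append a corrected one) can be sketched without any shapefile library. In this hypothetical example, two parallel Python lists stand in for the shapes and records that PSL manages; replace_record is not part of PSL, and the name is assumed to sit in the first attribute field.

```python
def replace_record(shapes, records, name, new_point, new_record, name_field=0):
    """Update a point by deleting its entry and appending a corrected one.

    `shapes` and `records` are parallel lists standing in for a shapefile's
    geometry and attribute table. Returns the index that was removed.
    """
    # 1. Read through the records to locate the record number
    for index, record in enumerate(records):
        if record[name_field] == name:
            break
    else:
        raise KeyError(name)
    # 2. Delete the shape and its record together so the two stay balanced
    del shapes[index]
    del records[index]
    # 3. Append the corrected point and record
    shapes.append(new_point)
    records.append(new_record)
    return index
```

Deleting and appending, rather than editing in place, mirrors the delete()-then-add approach described above, at the cost of changing the updated point's position in the file.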
The Python Shapefile Library is fairly easy to use, even if you do not know a lot of Python to start with. The only downside to this library is that it does not have the best documentation available. Otherwise, it is a great way to manipulate shapefiles within a web application.
A database is an organized collection of data that is built so that the data can be stored, manipulated, and retrieved easily. The databases typically used for geographic information are relational database management systems (RDBMSs), though it is also possible to store the data in object database management systems (ODBMSs). Note that for the rest of this chapter, when I refer to a database, I am referring to an RDBMS. Common examples of RDBMSs include dBASE, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, and Sybase.
Spatial databases are built so that the spatial data and attributes coexist in the same database. MySQL, DB2, Oracle, and Microsoft SQL Server (starting with 2008) can all store spatial information natively in their tables. In some cases, however, additional software is placed on top of the RDBMS to provide geographic functionality (especially querying) within the database. ArcSDE, OracleSpatial, and PostGIS are examples of software used on top of the databases themselves to handle geographic data. OracleSpatial is built specifically for Oracle and PostGIS specifically for PostgreSQL, while ArcSDE works with several commercial databases as well as PostgreSQL. MySQL has geographic functionality built directly into it and does not require additional software.
ArcSDE, or simply SDE (Spatial Database Engine), is an Esri product for storing and managing geographic data with other business data within a relational database. It is designed to run with the commercial databases IBM DB2, Informix, Microsoft SQL Server, and Oracle, as well as the open source database PostgreSQL. Starting with ArcGIS 9.2, Esri stopped selling ArcSDE as a stand-alone product and began bundling it with their ArcGIS Desktop and ArcGIS Server products. The latest release of the software at the time of this writing is 10.0. ArcSDE supports various standards, including OGC simple features, the International Organization for Standardization (ISO) spatial types, the OracleSpatial format, the PostGIS format, and the Microsoft spatial format.
PostGIS adds spatial functionality to the PostgreSQL relational database. It was developed by Refractions Research as an open source project and is released under the GNU General Public License. The first stable version (1.0) of the software was released in 2005, and the current version (as of this writing) is 1.5.2. PostGIS plays a role similar to that of ArcSDE or OracleSpatial and follows the OGC Simple Features specification, though it has not been certified as compliant by the OGC.
The following should give you some idea of PostGIS functionality using SQL:
SELECT loc.the_geom
FROM geolocations loc
INNER JOIN (
  SELECT the_geom
  FROM (SELECT the_geom, ST_Area(the_geom) AS area FROM parks) p
  WHERE area > 10000
) park
ON ST_Intersects(loc.the_geom, park.the_geom)
This query finds all geolocations that are located within city parks with an area greater than 10,000 square feet. To do this, it first pulls the park polygons and calculates their areas using the PostGIS function ST_Area(). It then keeps only the parks with an area larger than 10,000 square feet. Finally, it finds the geolocations located within those parks using the ST_Intersects() function. The result of this query is the geometry of each geolocation found within a city park with an area greater than 10,000 square feet. Note that ST_Area() returns the area in the units of the geometry's spatial reference system, so this query assumes that the parks are stored in a projected coordinate system measured in feet.
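To make the query's logic concrete, here is the same filter written in plain Python for planar coordinates: a shoelace-formula area stands in for ST_Area(), and a ray-casting point-in-polygon test stands in for ST_Intersects(). This is an illustrative sketch only. It assumes simple polygons without holes in a projected coordinate system measured in feet, does not treat points lying exactly on a boundary consistently, and has none of the spatial indexing a database would use.

```python
def polygon_area(ring):
    """Planar area of a simple polygon ring [(x, y), ...] via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_in_polygon(x, y, ring):
    """Ray casting: True if (x, y) lies inside the ring (boundary cases not handled)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def locations_in_large_parks(locations, parks, min_area=10000):
    """Mimic the SQL: keep locations inside any park whose area exceeds min_area."""
    large = [ring for ring in parks if polygon_area(ring) > min_area]
    return [loc for loc in locations
            if any(point_in_polygon(loc[0], loc[1], ring) for ring in large)]
```

For a 200 by 100 foot park and a 50 by 50 foot park, only locations inside the first (20,000 square feet) survive the filter; the database version does the same work but can use spatial indexes to avoid testing every location against every park.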
MySQL is the world’s most popular open source database, in use by some of the most heavily visited websites, including Google, Wikipedia, YouTube, and Facebook. Instead of requiring additional software on top of itself, MySQL implements a subset of the OGC SQL with Geometry Types specification directly in the database. Since it first introduced spatial capabilities, MySQL has needed several releases to close the gap with other spatial databases like PostGIS and OracleSpatial.
The OGC naming conventions were not implemented in MySQL until version 5.6. Unfortunately, as of the time of this writing, the current generally available community release of MySQL is 5.5.11, which uses different function names. For example, MySQL 5.6 would use exactly the same SQL statement as the PostGIS example, while the MySQL 5.5 version of this code would look like the following:
SELECT loc.the_geom
FROM geolocations loc
INNER JOIN (
  SELECT the_geom
  FROM (SELECT the_geom, Area(the_geom) AS area FROM parks) p
  WHERE area > 10000
) park
ON Intersects(loc.the_geom, park.the_geom)
As you can see, the two queries are very similar, and anyone with some SQL and spatial database experience could work out MySQL’s version of things. Once MySQL 5.6 becomes generally available, MySQL will have caught up with its competitors, making it a very attractive solution for spatial data management given its popularity as a relational database.
To conclude the discussion of spatial data management with relational databases, Example 5-3 creates the structure needed to store the same geolocation data as the KML and Python Shapefile Library examples.
CREATE DATABASE geolocations;
USE geolocations;
CREATE TABLE positions (
  pos_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  the_geom POINT NOT NULL,
  altitude DECIMAL(8, 2) NULL, -- the API reports null when altitude is unavailable
  accuracy DECIMAL(4, 0) NOT NULL,
  altitudeAccuracy DECIMAL(4, 0) NULL,
  heading DECIMAL(9, 6) NULL,
  speed DECIMAL(7, 4) NULL,
  timestamp DATETIME NOT NULL,
  name VARCHAR(20) NOT NULL,
  description VARCHAR(80) NULL
);
This example creates a new database called geolocations and then creates a table called positions that holds all of the attribute data that can be collected from the W3C Geolocation API. The SQL script to insert a new position record into our database would look like this:
INSERT INTO positions (
the_geom,
altitude,
accuracy,
altitudeAccuracy,
heading,
speed,
timestamp,
name,
description
) VALUES (
GeomFromText('POINT(-89.788221 38.4233)'),
18,
10,
10,
37,
15.6464,
'2011-04-07 00:15:37',
'Point 000001',
'This is the second point collected.'
);
This SQL statement adds a point in the OGC Well-Known Text (WKT) format using the GeomFromText() function; the point is written as POINT(longitude latitude), that is, x before y. This SQL would be executed from a server-side script, with values passed to it from the client after a location had been retrieved. The table creation and insertion would be almost the same in any relational database.
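As a sketch of that server-side step, the following uses Python's built-in sqlite3 module in place of MySQL so that it runs anywhere. Because plain SQLite has no POINT type, the geometry is stored as WKT text. The JSON payload shape and the name insert_position are assumptions. The important part is that client values are bound as query parameters rather than concatenated into the SQL string.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS positions (
    pos_id INTEGER PRIMARY KEY AUTOINCREMENT,
    the_geom TEXT NOT NULL,          -- WKT, since plain SQLite has no POINT type
    altitude REAL,
    accuracy REAL NOT NULL,
    altitudeAccuracy REAL,
    heading REAL,
    speed REAL,
    timestamp TEXT NOT NULL,
    name TEXT NOT NULL,
    description TEXT
)
"""

def insert_position(conn, payload):
    """Insert one position posted by the client as JSON; return the new row id."""
    pos = json.loads(payload)
    # WKT uses x y order here: longitude first, then latitude
    wkt = "POINT(%r %r)" % (float(pos["longitude"]), float(pos["latitude"]))
    cur = conn.execute(
        "INSERT INTO positions (the_geom, altitude, accuracy, altitudeAccuracy,"
        " heading, speed, timestamp, name, description)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (wkt, pos.get("altitude"), pos["accuracy"], pos.get("altitudeAccuracy"),
         pos.get("heading"), pos.get("speed"), pos["timestamp"],
         pos["name"], pos.get("description")),
    )
    conn.commit()
    return cur.lastrowid
```

Against MySQL, the statement would wrap the geometry parameter in GeomFromText(), and Python MySQL drivers such as MySQL Connector/Python use the %s placeholder style instead of ?.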