Examining data content is an important part of any project. Understanding information specific to your dataset will help you use it more effectively. Each piece of spatial data will have some geographic component to it (coordinates describing the location of real features), but it will also have what are called attributes. These are non-geographic data about the geographic feature, such as the size of a population, the name of a building, the color of a lake, etc. You will often hear the geographic coordinate data described as spatial data and the attribute information referred to as tabular, attribute, or nonspatial data. It is equally valid to call any dataset spatial if it has some geographic component to it.
The MapServer demo data includes a variety of vector
spatial files; therefore you will use the ogrinfo tool to gather information about the
files. At the command prompt, change into the workshop folder, and run the ogrinfo command to have it list the datasets
that are in the data folder. The
output from the command will look like Example 6-1.
> ogrinfo data
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
1: twprgpy3 (Polygon)
2: rmprdln3 (Line String)
3: lakespy2 (Polygon)
4: stprkpy3 (Polygon)
5: ctyrdln3 (Line String)
6: dlgstln2 (Line String)
7: mcd90py2 (Polygon)
8: twprdln3 (Line String)
9: plsscpy3 (Polygon)
10: mcdrdln3 (Line String)
11: majrdln3 (Line String)
12: drgidx (Polygon)
13: airports (Point)
14: ctybdpy2 (Polygon)
This shows that there are 14 layers in the data folder (the order of the listing may vary on other systems). You can also see that the folder contains ESRI shapefile format files. Each shapefile is a layer in this listing. If you look at the files located in the data folder, you will see that there are way more than 14 files. This is because a shapefile consists of more than one file: one holds spatial data, another holds tabular data, etc.
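To make the one-layer, many-files idea concrete, here is a minimal Python sketch that groups a directory listing by layer name. The file list is hypothetical: the layer names come from the ogrinfo listing, but the exact files on your disk may differ. It assumes the usual shapefile components, where .shp holds the geometry, .dbf holds the attribute table, and .shx is an index.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical directory listing; the layer names appear in the ogrinfo
# output, but the exact set of files on disk is an assumption.
filenames = [
    "airports.shp", "airports.shx", "airports.dbf",
    "lakespy2.shp", "lakespy2.shx", "lakespy2.dbf",
]


def group_by_layer(names):
    """Map each layer name (the file stem) to the sorted extensions found."""
    layers = defaultdict(list)
    for name in names:
        path = PurePath(name)
        layers[path.stem].append(path.suffix)
    return {layer: sorted(exts) for layer, exts in layers.items()}


layers = group_by_layer(filenames)
for layer, exts in layers.items():
    print(layer, exts)
```

Six files collapse into two layers here, which is the same effect you see when 14 layers account for many more than 14 files.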
A summary of more information for each layer can be seen by
adding the name of the layer to the ogrinfo command and a -summary parameter, as shown in Example 6-2.
> ogrinfo -summary data airports
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
Layer name: airports
Geometry: Point
Feature Count: 12
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
This example shows information about the airports layer.
Geometry: Point
The geographic features in this file are points. In the next example, you will see that each airport feature has one pair of location coordinates.
Feature Count: 12
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
There are 12 airport features in this layer, and they fall
within the range of coordinates shown in the Extent line. The coordinates are measured in
meters and are projected into the Universal Transverse Mercator
(UTM) projection.
Layer SRS WKT:
(unknown)
This explains what map projection the data is in. SRS stands for spatial reference system and WKT for well-known text format. Without getting into too much detail, these are terms popularized or created by the OGC. The SRS gives information about projections, datums, units of measure in the data, etc. WKT is a method for describing that information in a text-based, human-readable format (as opposed to a binary format). Refer to Appendix A for more information about map projections, SRS, and the EPSG numbering system. See Chapter 12 for more information on the OGC and its role in setting standards.
The previous example also says unknown because the creator of the data
didn’t explicitly include projection information within the file. This
isn’t very helpful if you don’t know where the data is from. However,
those familiar with the data might guess that it is in UTM
coordinates.
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
These five lines tell you about the other types of nonspatial
information that accompany each geographic feature. A feature, in this
case, is a coordinate for an airport. These different pieces of
information are often referred to as attributes, properties, columns,
or fields. Each attribute has a name identifier and can hold a certain
type of information. In the previous example, the text before the
colon is the name of the attribute. Don’t be confused by the fact that
there is also an attribute called NAME in this file. The first line describes
an attribute called NAME. The word
after the colon tells you what kind of data can be held in that
attribute—either String (text
characters) or Real (numbers). The
numbers in the parentheses tell more specifically how much of each
kind of data can be stored in the attribute. For example, NAME: String (64.0) means that the attribute called
NAME can hold up to 64 characters (letters or
numbers). Likewise, ELEVATION:
Real (12.4) means that the ELEVATION attribute can hold numbers up to
12 digits long, with a maximum of 4 decimal places.
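If you want to work with these field definitions in a script, a small parser is enough. The following Python sketch splits a summary line into its name, type, width, and decimal places. The function name parse_field and the regular expression are illustrative, not part of OGR, and the line format is assumed to match the listing shown here.

```python
import re

# Matches lines such as "ELEVATION: Real (12.4)" from the summary listing.
FIELD_RE = re.compile(r"^(\w+): (\w+) \((\d+)\.(\d+)\)$")


def parse_field(line):
    """Split one field definition into name, type, width, and precision."""
    match = FIELD_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a field definition: {line!r}")
    name, kind, width, precision = match.groups()
    return {
        "name": name,
        "type": kind,
        "width": int(width),
        "precision": int(precision),
    }


name_field = parse_field("NAME: String (64.0)")
elevation_field = parse_field("ELEVATION: Real (12.4)")

# A 25-character airport name fits easily in a 64-character field.
fits = len("Bigfork Municipal Airport") <= name_field["width"]

# Four decimal places is how an elevation of 1343 is written out.
formatted = f"{1343:.{elevation_field['precision']}f}"
print(name_field, fits, formatted)
```

The formatted value, 1343.0000, has the same four decimal places that ogrinfo prints for the ELEVATION attribute in the detailed listing.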
You may be wondering why this is important to review. Some of
the most common errors in using map data can be traced back to a poor
understanding of the data. This is why reviewing data with tools such
as ogrinfo can be very helpful
before launching into mapmaking. If you don’t understand what kind of
attributes you have at your disposal, you may not use the data to its
fullest potential or you may push its use beyond appropriate bounds.
Understanding your data in this depth will prevent future mistakes
during the mapping process or during any analysis you may undertake.
If your analysis relies on a certain kind of numbers with a level of
precision or expected length of text, you need to make sure that the
data you are analyzing actually holds these kinds of values, or you
will get misleading results. Having this knowledge early in the
process will help you have a more enjoyable experience along the
way.
Summary information tells only part of the story. The
same tools can be used to provide detailed information about the
geographic data and its attributes. To get details, instead of summary
information, you can use ogrinfo
with a dataset and layer name like that in Example 6-3, but don’t include
the -summary parameter.
> ogrinfo data airports
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
Layer name: airports
Geometry: Point
Feature Count: 12
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
OGRFeature(airports):0
NAME (String) = Bigfork Municipal Airport
LAT (Real) = 47.7789
LON (Real) = -93.6500
ELEVATION (Real) = 1343.0000
QUADNAME (String) = Effie
POINT (451306 5291930)
OGRFeature(airports):1
NAME (String) = Bolduc Seaplane Base
LAT (Real) = 47.5975
LON (Real) = -93.4106
ELEVATION (Real) = 1325.0000
QUADNAME (String) = Balsam Lake
POINT (469137 5271647)
This view of the airport details tells you what value each
airport has for each attribute. As you can see, the summary
information is still included at the top of the listing, but then
there are small sections for each feature. In this case there are
seven lines for each airport: a header line, the five attribute values,
and the coordinates. For example, you can see
the name of the airport, but you can also see the UTM coordinate shown
beside the POINT attribute.
This dataset also has a set of LAT and LON fields that are just numeric
attributes and have nothing to do with using this data in a map. Not
all types of point data have these two attributes. They just
happened to be part of the attributes the creator wanted to keep.
The actual UTM coordinates are encoded in the last attribute,
POINT.
Only two features are shown in this example, the first starting
with OGRFeature(airports):0. The
full example goes all the way to OGRFeature(airports):11, including all 12
airports. The rest of the points aren’t shown in this example, just to
keep it simple.
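Because the detailed listing follows a regular pattern, it is easy to turn into structured records. This Python sketch parses the two features shown in the listing into dictionaries. The regular expressions are an assumption based on the output in this chapter; other versions of ogrinfo may format the listing differently, and parse_features is an illustrative name only.

```python
import re

# The two features from the detailed airports listing, copied as text.
SAMPLE = """\
OGRFeature(airports):0
NAME (String) = Bigfork Municipal Airport
LAT (Real) = 47.7789
LON (Real) = -93.6500
ELEVATION (Real) = 1343.0000
QUADNAME (String) = Effie
POINT (451306 5291930)
OGRFeature(airports):1
NAME (String) = Bolduc Seaplane Base
LAT (Real) = 47.5975
LON (Real) = -93.4106
ELEVATION (Real) = 1325.0000
QUADNAME (String) = Balsam Lake
POINT (469137 5271647)
"""

HEADER_RE = re.compile(r"^OGRFeature\((\w+)\):(\d+)$")
ATTR_RE = re.compile(r"^(\w+) \((\w+)\) = (.*)$")
POINT_RE = re.compile(r"^POINT \((\S+) (\S+)\)$")


def parse_features(text):
    """Return one dict per feature: its id, attributes, and point."""
    features = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = {"fid": int(header.group(2)), "attributes": {}, "point": None}
            features.append(current)
            continue
        if current is None:
            continue  # skip the summary lines before the first feature
        attr = ATTR_RE.match(line)
        if attr:
            name, kind, value = attr.groups()
            current["attributes"][name] = float(value) if kind == "Real" else value
            continue
        point = POINT_RE.match(line)
        if point:
            current["point"] = (float(point.group(1)), float(point.group(2)))
    return features


features = parse_features(SAMPLE)
for feature in features:
    print(feature["fid"], feature["attributes"]["NAME"], feature["point"])
```

Note how LAT and LON end up as ordinary attribute values, while the UTM location is kept separately in the point entry.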
ogrinfo is a great tool for
digging even deeper into your data. There are more options that can be
used, including a database query-like ability to select features and
the ability to list only features that fall within a certain area.
Running man ogrinfo (if your operating system supports
manpages) shows the full usage for each parameter. Otherwise, the
details are available on the OGR web site at http://www.gdal.org/ogr/ogr_utilities.html. You can
also run the ogrinfo command with
the --help parameter (ogrinfo --help) to get a summary of options. Example 6-4 shows some examples
of how they can be used with your airport data.
> ogrinfo data airports -where "name='Bolduc Seaplane Base'"
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
Layer name: airports
Geometry: Point
Feature Count: 1
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
OGRFeature(airports):1
NAME (String) = Bolduc Seaplane Base
LAT (Real) = 47.5975
LON (Real) = -93.4106
ELEVATION (Real) = 1325.0000
QUADNAME (String) = Balsam Lake
POINT (469137 5271647)
This example lists only those airports that have the name
Bolduc Seaplane Base. As you can
imagine, there is only one. Therefore, the summary information about
this layer and one set of attribute values are listed for the single
airport that meets this criterion. The -sql option can also specify which attributes
to list in the ogrinfo
output, as shown in Example 6-5.
If you are familiar with SQL, you will understand that the
-sql option accepts an SQL
statement. If SQL is something new to you, please refer to other
database query language documentation, such as:
SQL in a Nutshell (O’Reilly)
SQL tutorial at http://www.w3schools.com/sql/
Many database manuals include a comprehensive reference
section on SQL. The implementation of SQL in ogrinfo isn’t complete and supports only
SELECT statements.
> ogrinfo data airports -sql "select name from airports where quadname='Side Lake'"
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
layer names ignored in combination with -sql.
Layer name: airports
Geometry: Point
Feature Count: 2
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
name: String (64.0)
OGRFeature(airports):4
name (String) = Christenson Point Seaplane Base
POINT (495913 5279532)
OGRFeature(airports):10
name (String) = Sixberrys Landing Seaplane Base
POINT (496393 5280458)
The SQL parameter is set to show only one attribute, NAME, rather than all five attributes for
each feature. It still shows the coordinates by default, but none of
the other information is displayed. This is combined with a query to
show only those features that meet a certain QUADNAME requirement.
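If SQL is new to you, it may help to try the same SELECT statement against a small table. The sketch below uses Python's built-in sqlite3 module as a stand-in; it is not how ogrinfo evaluates -sql, and OGR's SQL dialect differs from SQLite's. The table holds only the six airports whose names and quadrangle names appear in this chapter's listings, not the full layer.

```python
import sqlite3

# Names and QUADNAME values for the six airports shown in this chapter's
# listings; the other six airports in the layer are not included.
rows = [
    ("Bigfork Municipal Airport", "Effie"),
    ("Bolduc Seaplane Base", "Balsam Lake"),
    ("Christenson Point Seaplane Base", "Side Lake"),
    ("Sixberrys Landing Seaplane Base", "Side Lake"),
    ("Grand Rapids-Itasca County/Gordon Newstrom Field", "Grand Rapids"),
    ("Richter Ranch Airport", "Cohasset East"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (name TEXT, quadname TEXT)")
conn.executemany("INSERT INTO airports VALUES (?, ?)", rows)

# The column list picks the attributes; the WHERE clause picks the features.
query = "SELECT name FROM airports WHERE quadname = 'Side Lake' ORDER BY name"
names = [row[0] for row in conn.execute(query)]
conn.close()

print(names)
```

The two names returned are the same two seaplane bases that ogrinfo reports for the Side Lake quadrangle.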
Example 6-6 shows
how ogrinfo can use some spatial
logic to find features that are within a certain area.
> ogrinfo data airports -spat 451869 5225734 465726 5242150
Layer name: airports
Geometry: Point
Feature Count: 2
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
OGRFeature(airports):7
NAME (String) = Grand Rapids-Itasca County/Gordon Newstrom Field
LAT (Real) = 47.2108
LON (Real) = -93.5097
ELEVATION (Real) = 1355.0000
QUADNAME (String) = Grand Rapids
POINT (461401 5228719)
OGRFeature(airports):8
NAME (String) = Richter Ranch Airport
LAT (Real) = 47.3161
LON (Real) = -93.5914
ELEVATION (Real) = 1340.0000
QUADNAME (String) = Cohasset East
POINT (455305 5240463)
The ability to show only features based on where they are
located is quite powerful. You do so using the -spat parameter followed by two pairs of
coordinates. The first pair of coordinates, 451869 5225734, represents the southwest
corner of the area you are interested in querying. The second pair of
coordinates, 465726 5242150,
represents the northeast corner of the area you are interested in,
creating a rectangular area.
This is typically referred to as a bounding
box, where one pair of coordinates represents the
lower-left corner of the box and the other pair represents the upper
right. A bounding box gives a program, such as ogrinfo, a quick way to find features you
need.
ogrinfo then shows only those
features that are located within the area you define. In this case,
because the data is stored in UTM coordinates, the
coordinates in the -spat parameter must also be given in UTM. You
can’t specify them using decimal
degrees (°), for instance. The coordinates must always be specified
using the same units and projection as the source data, or you will
get inaccurate results.
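The test behind a bounding box is simple enough to write out. This Python sketch applies the same rectangle to the six airport coordinates that appear in this chapter's listings. The function in_bbox is illustrative only, and it treats points on the edge of the box as inside, which is an assumption rather than a statement about how ogrinfo handles edges.

```python
# UTM coordinates (in meters) of the six airports shown in this chapter.
airports = {
    "Bigfork Municipal Airport": (451306, 5291930),
    "Bolduc Seaplane Base": (469137, 5271647),
    "Christenson Point Seaplane Base": (495913, 5279532),
    "Sixberrys Landing Seaplane Base": (496393, 5280458),
    "Grand Rapids-Itasca County/Gordon Newstrom Field": (461401, 5228719),
    "Richter Ranch Airport": (455305, 5240463),
}


def in_bbox(point, min_x, min_y, max_x, max_y):
    """True if the point lies inside the box (edges count as inside)."""
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


# Southwest corner first, then northeast corner, as with -spat.
bbox = (451869, 5225734, 465726, 5242150)

selected = sorted(name for name, pt in airports.items() if in_bbox(pt, *bbox))
print(selected)

# A longitude/latitude pair in degrees never falls inside a UTM box.
degrees_match = in_bbox((-93.5097, 47.2108), *bbox)
print(degrees_match)
```

The last check shows why units matter: the LON and LAT values for the Grand Rapids airport describe a location inside the box, yet the comparison fails because the numbers are in degrees and the box is in meters.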
Example 6-7 is
similar to a previous example showing complex query syntax using the
-sql parameter, but it differs in
one respect.
> ogrinfo data airports -sql "select * from airports where elevation > 1350 and quadname like '%Lake'" -summary
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
layer names ignored in combination with -sql.
Layer name: airports
Geometry: Point
Feature Count: 5
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
If you add the -summary
option, it doesn’t list all the attributes of the features, but shows
only a summary of the information. In this case, it summarizes only
information that met the criteria of the -sql parameter. This is very handy if you
just want to know how many features meet certain criteria or fall
within a certain area but don’t care to see all the details.
You can download a sample satellite image from http://geogratis.cgdi.gc.ca/download/RADARSATRADARSAT/mosaic/canada_mosaic_lcc_1000m.zip. If you unzip the file, you create a file called canada_mosaic_lcc_1000m.tif. This is a file containing an image from the RADARSAT satellite. For more information about RADARSAT, see http://www.ccrs.nrcan.gc.ca/ccrs/data/satsens/radarsat/rsatndx_e.html.
To better understand what kind of data this is, use the gdalinfo command. Like the ogrinfo command, this tool lists certain
pieces of information about a file, but the GDAL tools can interact
with raster/image data. The output from gdalinfo is also very similar to that of ogrinfo, as you can see in Example 6-8. You should change
to the same folder as the image before running the gdalinfo command.
> gdalinfo canada_mosaic_lcc_1000m.tif
Driver: GTiff/GeoTIFF
Size is 5700, 4800
Coordinate System is:
PROJCS["LCC E008",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.2572221010042,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",49],
PARAMETER["standard_parallel_2",77],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",-95],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]]]
Origin = (-2600000.000000,10500000.000000)
Pixel Size = (1000.00000000,-1000.00000000)
Corner Coordinates:
Upper Left (-2600000.000,10500000.000) (177d17'32.31"W, 66d54'22.82"N)
Lower Left (-2600000.000, 5700000.000) (122d54'49.00"W, 36d12'53.87"N)
Upper Right ( 3100000.000,10500000.000) ( 9d58'39.57"W, 62d25'50.45"N)
Lower Right ( 3100000.000, 5700000.000) ( 62d32'49.65"W, 34d18'5.61"N)
Center ( 250000.000, 8100000.000) ( 89d56'43.00"W, 62d46'47.18"N)
Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray
There are five main sections in this report. Unlike ogrinfo, there aren’t a lot of different
options, and attributes are very simplistic. The first line tells you
what image format the file is.
Driver: GTiff/GeoTIFF
In this case, it tells you the file is a GeoTIFF image. TIFF
images are used in general computerized photographic applications such
as digital photography and printing. However, GeoTIFF implies that the
image has some geographic information encoded into it. gdalinfo can be run with a --formats option, which lists all the raster
formats it can read and possibly write. The version of GDAL included
with FWTools has support for more than three dozen formats! These
include several proprietary software vendor formats and many related
to specific types of satellite data.
The next line shows the size of the image:
Size is 5700, 4800.
An image size is characterized by the number of data rows and columns. An image is a type of raster data. A raster is made up of numerous rows of adjoining squares called cells or pixels. Rows usually consist of cells that are laid out east to west, whereas columns of cells are north to south. This isn’t always the case but is a general rule of thumb. This image has 5,700 columns and 4,800 rows. The first value in the size statement is usually the width, therefore the number of columns of cells. Row and column numbering usually begins at the upper-left corner of the image and increases toward the lower-right corner. Therefore, counting from zero, cell 0,0 is the upper left, and cell 5699,4799 is the lower right.
Images can be projected into various coordinate reference systems (see Appendix A for more about map projections):
Coordinate System is:
PROJCS["LCC E008",
GEOGCS["NAD83",
DATUM["North_American_Datum_1983",
SPHEROID["GRS 1980",6378137,298.2572221010042,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6269"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4269"]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",49],
PARAMETER["standard_parallel_2",77],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",-95],
PARAMETER["false_easting",0],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]]]
These assign a cell to a global geographic coordinate. Often these coordinates need to be adjusted to improve the appearance of particular applications or to line up with other pieces of data. This image is in a projection called Lambert Conformal Conic (LCC). You will need to know what projection data is in if you want to use it with other data. If the projections between data don’t match, you may need to reproject them into a common projection.
MapServer can reproject files/layers on the fly. This means you don’t have to change your source data unless you want higher performance.
The latitude of origin and central meridian settings are given in geographic coordinates using degree (°) units. They describe where the coordinate 0,0 starts. Latitude 0° represents the equator. In map projections central meridians are represented by a longitude value. Longitude -95°, or 95° West, runs through central Canada.
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",-95],
Note that in the earlier projection, the unit setting is
metre. When you look at Pixel Size in a moment, you will see a
number but no unit. It is in this unit (meters) that the pixel sizes
are measured.
Cells are given row and column numbers, but are also given geographic coordinate values. The origin setting tells what the geographic coordinate is of the cell at row 0, column 0. Here, the value of origin is in the same projection and units as the projection for the whole image. The east/west coordinate -2,600,000 is 2,600,000 meters west of the central meridian. The north/south coordinate is 10,500,000 meters north of the equator.
Origin = (-2600000.000000,10500000.000000)
Pixel Size = (1000.00000000,-1000.00000000)
Cells are also called pixels and each of them has a defined
size. In this example the pixels have a size of 1000 × 1000: the
-1000 is just a notation; the
negative aspect of it can be ignored for now. In most cases, your
pixels will be square, though it is possible to have rasters with
nonsquare pixels. The unit of these pixel sizes is in meters, as
defined earlier in the projection for the image. That means each pixel
is 1,000 meters wide and 1,000 meters high.
Each pixel has a coordinate value as well. This coordinate locates the upper-left corner of the pixel. Depending on the size of a pixel, it can be difficult to accurately locate it: a pixel is a square, not a discrete point location. Therefore, the upper-left corner of the pixel covers a different place on the ground than the center, but both have the same location coordinate. The accuracy of raster-based data is limited by the size of the pixel.
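The origin and pixel size are all you need to turn a column and row into a projected coordinate. The following Python sketch shows the arithmetic, assuming a north-up image with no rotation (which the gdalinfo report for this file suggests, though the sketch does not check it). The function name pixel_to_map is illustrative. It also shows what the negative pixel height does: row numbers increase downward, so the north/south coordinate decreases as the row number grows.

```python
# Values taken from the gdalinfo report for canada_mosaic_lcc_1000m.tif.
ORIGIN = (-2600000.0, 10500000.0)   # upper-left corner of cell 0,0
PIXEL_SIZE = (1000.0, -1000.0)      # width, height (negative: rows run south)
SIZE = (5700, 4800)                 # columns, rows


def pixel_to_map(col, row, origin=ORIGIN, pixel_size=PIXEL_SIZE):
    """Projected coordinate of the upper-left corner of a cell.

    Assumes a north-up image with no rotation terms.
    """
    x = origin[0] + col * pixel_size[0]
    y = origin[1] + row * pixel_size[1]
    return (x, y)


upper_left = pixel_to_map(0, 0)
# One step past the last column and row gives the outer edge of the image.
lower_right = pixel_to_map(SIZE[0], SIZE[1])
center = pixel_to_map(SIZE[0] / 2, SIZE[1] / 2)
# The last cell itself is numbered one less in each direction.
last_cell = pixel_to_map(SIZE[0] - 1, SIZE[1] - 1)

print(upper_left, lower_right, center, last_cell)
```

The upper-left, lower-right, and center values agree with the Corner Coordinates section of the gdalinfo report, which is a handy way to confirm you have the arithmetic right.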
Much like the previous origin settings, corner coordinates tell you the geographic coordinates of the corners and center of the image:
Corner Coordinates:
Upper Left (-2600000.000,10500000.000) (177d17'32.31"W, 66d54'22.82"N)
Lower Left (-2600000.000, 5700000.000) (122d54'49.00"W, 36d12'53.87"N)
Upper Right ( 3100000.000,10500000.000) ( 9d58'39.57"W, 62d25'50.45"N)
Lower Right ( 3100000.000, 5700000.000) ( 62d32'49.65"W, 34d18'5.61"N)
Center ( 250000.000, 8100000.000) ( 89d56'43.00"W, 62d46'47.18"N)
Notice that the coordinates are first given in their projected
values, but also given in their unprojected geographic coordinates,
longitude and latitude. Knowing this will help you determine where on
the earth your image falls. If you thought this image was in Greece,
you’d be wrong. The geographic coordinates clearly put it in the
western hemisphere: 177d17'32.31"W
is 177 degrees, 17 minutes, 32.31 seconds west of the prime
meridian.
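Many tools expect decimal degrees rather than degrees, minutes, and seconds. This Python sketch converts the notation used in the gdalinfo report. The function name dms_to_decimal and the regular expression are illustrative, and the sketch follows the common convention of treating west and south as negative.

```python
import re

# Matches values such as 177d17'32.31"W from the corner coordinates.
DMS_RE = re.compile(r"""^\s*(\d+)d\s*(\d+)'\s*([\d.]+)"([NSEW])\s*$""")


def dms_to_decimal(text):
    """Convert degrees/minutes/seconds text to signed decimal degrees."""
    match = DMS_RE.match(text)
    if match is None:
        raise ValueError(f"not a DMS value: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # West longitudes and south latitudes are negative by convention.
    return -value if hemisphere in "SW" else value


upper_left_lon = dms_to_decimal("177d17'32.31\"W")
upper_left_lat = dms_to_decimal("66d54'22.82\"N")
print(round(upper_left_lon, 6), round(upper_left_lat, 6))
```

A negative longitude is a quick signal that the location is west of the prime meridian.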
Images are made up of different bands of data. In some cases,
you can have a dozen different bands, where each band stores values
about a specific wavelength of light that a sensor photographed. In
this case, there is only one band, Band
1. The ColorInterp=Gray
setting tells you that it is a grayscale image, and Type=Byte tells you that it is an 8-bit (8
bits=1 byte) image. Because 8 bits of data can hold 256 different
values, this image could have 256 different shades of gray.
Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray
If you have more than one band in an image, you can start to have color images that combine values from, for example, red, green, and blue (RGB) bands. Most normal digital photographs you see are set up this way, with each band having 256 values of its specific color. When combined, they can be assigned to specific RGB values on, for example, your computer monitor. That type of image would be considered a 24-bit image (8 bits per band × 3 bands).
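The bit arithmetic here is worth seeing once. This short Python sketch works out how many values a band can hold and how many bits a multiband pixel needs; the function names are illustrative only.

```python
def values_per_band(bits):
    """Number of distinct values a band with this many bits can hold."""
    return 2 ** bits


def bits_per_pixel(bits_per_band, bands):
    """Total bits needed to store one pixel across all bands."""
    return bits_per_band * bands


gray_levels = values_per_band(8)            # one 8-bit grayscale band
rgb_bits = bits_per_pixel(8, 3)             # red, green, and blue bands
rgb_colors = values_per_band(rgb_bits)      # every possible RGB combination

print(gray_levels, rgb_bits, rgb_colors)
```

An 8-bit band gives 256 levels, and three such bands give a 24-bit image with more than 16 million possible colors.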
If you add the -mm parameter
to the gdalinfo command, as shown
in Example 6-9, you get a
summary of the minimum and maximum color values for the bands in the
image.
> gdalinfo canada_mosaic_lcc_1000m.tif -mm
...
Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray
Computed Min/Max=0.000,255.000
This shows that the values in this image span the full 8-bit range of 256 possible values, from a minimum of 0 to a maximum of 255.