While ogrinfo and the other
OGR utilities are powerful tools,
basic text-processing tools such as sort, uniq,
wc, and sed can give them an extra bit of flexibility.
The tools here are readily available for Unix-type operating systems
(like Linux) by default. They are also available for other operating
systems but you may need to download a package (e.g., from http://gnu.org) to get them for your system.
Each command can receive text streams. In this case, the text
stream will be the lines of information coming from ogrinfo and listed on the screen. These
commands take in those lines and allow you to, for example, show only
certain portions of them, to throw away certain lines, reformat them, do
a search/replace function or count items. Many types of functions can be
done using the ogrinfo -sql parameter, but the ultimate formatting of
the results isn’t always what is desired. These examples show some
common patterns for extracting specific information and generating more
custom stats.
These text-processing tools are sometimes packaged together, but are usually separate projects in and of themselves. Most of them were formed as part of the GNU/Free Software Foundation and are registered with the GNU free software directory at http://www.gnu.org/directory/. The targets of GNU software are free operating systems, which can cause some problems if you are dependent on an operating system such as Microsoft Windows. Some operating systems don’t normally include these tools, but they can often be acquired from Internet sources or even purchased.
A very comprehensive set of these tools for Windows is available at http://unxutils.sourceforge.net/. You can download a ZIP file that contains all the programs. If you unzip the file and store the files in a common Windows folder, such as C:\Windows\System32 or C:\winnt\System32, they will be available to run from the command prompt.
If the tool or command you want isn’t included, the next place to look is the GNU directory (http://gnu.org). This is where to start if you are looking for a particular program. A home page for the program and more information about it are available. Look for the download page for the program first to see if there is a binary version of the tool available for your operating system. If not, you may need to download the source code and compile the utility yourself.
Another resource to search is the Freshmeat web site at http://freshmeat.net. This site helps users find programs or projects and also provides daily news reports of what is being updated. Many projects reported in Freshmeat are hosted on the Sourceforge web site at http://sourceforge.net.
One source that is commonly used on Windows is the Cygwin environment, which can be found at http://www.cygwin.com. The web site describes Cygwin as “a Linux-like environment for Windows.” Cygwin can be downloaded and installed on most modern Windows platforms and provides many of the text-processing tools mentioned previously. Furthermore, it also provides access to source-code compilers such as GCC.
Mac OS X includes many of the same kinds of text-processing tools. They may not be exactly the same as the GNU programs mentioned here, but similar alternatives are available in the Darwin core underlying OS X. For ones that aren’t available natively in OS X, they can be compiled from the GNU source code or acquired through your favorite package manager such as Fink.
The standard output of ogrinfo is a set of lines
displaying information about each feature. As shown earlier, this output is
quite verbose, showing some summary information first, then sections
for each feature. In the case of the airport data, each airport has
its own section of seven lines. Example 6-10 shows a couple of
these sections covering 2 of the 12 features (the rest were removed to
reduce unnecessary length).
> ogrinfo data airports
INFO: Open of 'data'
using driver 'ESRI Shapefile' successful.
Layer name: airports
Geometry: Point
Feature Count: 12
Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
Layer SRS WKT:
(unknown)
NAME: String (64.0)
LAT: Real (12.4)
LON: Real (12.4)
ELEVATION: Real (12.4)
QUADNAME: String (32.0)
OGRFeature(airports):0
NAME (String) = Bigfork Municipal Airport
LAT (Real) = 47.7789
LON (Real) = -93.6500
ELEVATION (Real) = 1343.0000
QUADNAME (String) = Effie
POINT (451306 5291930)
OGRFeature(airports):1
NAME (String) = Bolduc Seaplane Base
LAT (Real) = 47.5975
LON (Real) = -93.4106
ELEVATION (Real) = 1325.0000
QUADNAME (String) = Balsam Lake
POINT (469137 5271647)
But what if you don’t really care about a lot of the information
that is displayed? You can use the ogrinfo options -sql and -where, but they still show you summary
information and don’t necessarily format it the way you want. Various
other operating system programs can help you reformat the output of
ogrinfo. Examples of these commands
follow, starting with the grep
command.
The grep command can
be used to show only certain lines being printed to your screen; for
example, to find a certain line in a text file. In this case, we are
piping the text stream that ogrinfo
prints into the grep command and
analyzing it. The results are that any line starting with two spaces
and the word NAME are printed; the
rest of the lines won’t show. Note that the pipe symbol | is the vertical bar, usually typed by holding Shift
and pressing the \ key on your keyboard. This
tells the command-line interpreter to send all the results of the
ogrinfo command to the grep command for further processing. You
then add an option at the end of the command telling it which lines
you want to see in your results, as shown in Example 6-11.
> ogrinfo data airports | grep ' NAME'
NAME (String) = Bigfork Municipal Airport
NAME (String) = Bolduc Seaplane Base
NAME (String) = Bowstring Municipal Airport
NAME (String) = Burns Lake Seaplane Base
NAME (String) = Christenson Point Seaplane Base
NAME (String) = Deer River Municipal Airport
NAME (String) = Gospel Ranch Airport
NAME (String) = Grand Rapids-Itasca County/Gordon Newstrom Field
NAME (String) = Richter Ranch Airport
NAME (String) = Shaughnessy Seaplane Base
NAME (String) = Sixberrys Landing Seaplane Base
NAME (String) = Snells Seaplane Base
If you want some other piece of information to show instead,
simply change 'NAME' (including the two
preceding spaces) to 'abc', where abc
is the text or numbers you are interested in. For example, grep 'LAT' shows only the LAT lines. Notice that using 'NAME' without the preceding spaces
lists the QUADNAME attributes as well.
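The difference between the two patterns can be checked on a small stand-in for the ogrinfo listing. This is only a sketch: sample_ogrinfo is a hypothetical helper that prints a few lines copied from the listing above, and it assumes attribute lines are indented by two spaces. It also uses grep -c, which counts matching lines instead of printing them, and the caret ^ to pin the pattern to the start of the line.

```shell
# Stand-in for `ogrinfo data airports` (assumption: attribute lines are
# indented by two spaces, as described in the text).
sample_ogrinfo() {
  printf '%s\n' \
    'OGRFeature(airports):0' \
    '  NAME (String) = Bigfork Municipal Airport' \
    '  QUADNAME (String) = Effie' \
    'OGRFeature(airports):1' \
    '  NAME (String) = Bolduc Seaplane Base' \
    '  QUADNAME (String) = Balsam Lake'
}

# Unanchored: NAME matches anywhere in the line, so QUADNAME lines count too.
loose=$(sample_ogrinfo | grep -c 'NAME')

# Anchored: ^ plus the two-space indent matches only the NAME attribute.
strict=$(sample_ogrinfo | grep -c '^  NAME')

echo "unanchored: $loose lines, anchored: $strict lines"
```

Anchoring the pattern is a little more typing, but it keeps the filter from picking up any other attribute whose name happens to contain the same letters.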
Now that you have a list of attribute values in your airports file, you can start to use other
commands. The wc command can perform a variety of analysis functions
against a list of text. The name wc
stands for word count. It can count the number of characters, words,
or lines in a list of text (or a file) and report them back to you.
Output from grep or ogrinfo can be redirected to wc to be further analyzed.
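As a rough illustration of the three kinds of counts, the sketch below feeds two airport names from the listing into wc. The names helper is hypothetical and stands in for filtered ogrinfo output. Note that -c counts bytes, which equals the character count for plain ASCII text like this; tr -d ' ' only strips the padding that some wc implementations add.

```shell
# Hypothetical stand-in for a filtered list of airport names.
names() {
  printf '%s\n' 'Bigfork Municipal Airport' 'Bolduc Seaplane Base'
}

lines=$(names | wc -l | tr -d ' ')   # number of lines
words=$(names | wc -w | tr -d ' ')   # number of whitespace-separated words
bytes=$(names | wc -c | tr -d ' ')   # number of bytes, newlines included

echo "lines=$lines words=$words bytes=$bytes"
```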
In this case we use wc to
count the number of lines (using the -l line count option). Combined with the
grep command, as shown in the
following example, this shows the number of airports that grep would have printed to your
screen.
> ogrinfo data airports | grep ' NAME' | wc -l
12
Another very powerful tool is the text stream-editing
tool called sed. sed allows a user to filter a list of text
(in this case the listing from ogrinfo) and perform text substitutions
(search and replace), find or delete certain text, etc. If you are
already familiar with regular expression syntax, you will find
yourself right at home using sed,
because it uses regex syntax to define its filters.
In this example, you take the full output of the ogrinfo command again and search for entries
that contain the words Seaplane
Base. What makes this different
from the grep example is the
inclusion of the trailing dollar $
sign at the end of the phrase. This symbol represents the end of the
line. This example, therefore, prints only airport names that have
Seaplane Base at the end of the
name; it doesn’t print any airport without Seaplane Base in its name and also excludes
airports that have the phrase in anything but the last part of the
name. In Example
6-12, for instance, a hypothetical airport named Joes
Seaplane Base and
Cafe wouldn’t be returned.
> ogrinfo data airports | sed -n '/Seaplane Base$/p'
NAME (String) = Bolduc Seaplane Base
NAME (String) = Burns Lake Seaplane Base
NAME (String) = Christenson Point Seaplane Base
NAME (String) = Shaughnessy Seaplane Base
NAME (String) = Sixberrys Landing Seaplane Base
NAME (String) = Snells Seaplane Base
The display of the previous example may be fine only for
purposes of quick data review. When some type of report or cut/paste
function needs to take place, it is often best to reformat the
results. Example 6-13
uses grep to filter out all the
lines that aren’t airport names, as in the previous example. It then
uses two sed filters to remove the
attribute name information, and then to remove any airports that start
with B. As you can see, the example
runs ogrinfo results through three
filters and produces an easy-to-read list of all the airports meeting
your criteria.
> ogrinfo data airports | grep ' NAME' | sed 's/ NAME (String) = //' | sed '/^B/d'
Christenson Point Seaplane Base
Deer River Municipal Airport
Gospel Ranch Airport
Grand Rapids-Itasca County/Gordon Newstrom Field
Richter Ranch Airport
Shaughnessy Seaplane Base
Sixberrys Landing Seaplane Base
Snells Seaplane Base
The usage of the last sed
filter looks somewhat obscure, because it uses the caret ^ symbol. This denotes the start of a line,
so, in this case, it looks for any line that starts with B. It doesn’t concern itself with the rest
of the line at all. The final /d
means “delete lines that meet the ^B criteria.”
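The two anchors can be compared side by side on a short list. This is a sketch under stated assumptions: the names helper is hypothetical, and "Joes Seaplane Base and Cafe" is the made-up airport used earlier to illustrate the end-of-line anchor; the other two names come from the airports listing.

```shell
# Hypothetical name list, already stripped of the attribute prefix.
names() {
  printf '%s\n' \
    'Bolduc Seaplane Base' \
    'Joes Seaplane Base and Cafe' \
    'Gospel Ranch Airport'
}

# $ anchors the pattern to the end of the line: only names that END in
# "Seaplane Base" are printed (-n suppresses all other lines).
ends=$(names | sed -n '/Seaplane Base$/p')

# ^ anchors the pattern to the start of the line: names that BEGIN with B
# are deleted, and everything else passes through.
not_b=$(names | sed '/^B/d')

printf 'ends with Seaplane Base:\n%s\n' "$ends"
printf 'does not start with B:\n%s\n' "$not_b"
```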
Example 6-14 uses
a similar approach but doesn’t require the text to be at the beginning
of the line. Any airport with the word Municipal in the name is deleted from the
final list.
> ogrinfo data airports | grep ' NAME' | sed 's/ NAME (String) = //' | sed '/Municipal/d'
Bolduc Seaplane Base
Burns Lake Seaplane Base
Christenson Point Seaplane Base
Gospel Ranch Airport
Grand Rapids-Itasca County/Gordon Newstrom Field
Richter Ranch Airport
Shaughnessy Seaplane Base
Sixberrys Landing Seaplane Base
Snells Seaplane Base
sed has many
different options and can be very sophisticated, especially when
combining sed filters. Example 6-15 shows how you can
string numerous commands together and do a few filters all at
once.
> ogrinfo data airports | sed -n '/^ NAME/,/^ ELEVATION/p' | sed '/LAT/d' | sed '/LON/d'
| sed 's/..................//'
Bigfork Municipal Airport
= 1343.0000
Bolduc Seaplane Base
= 1325.0000
Bowstring Municipal Airport
= 1372.0000
Burns Lake Seaplane Base
= 1357.0000
Christenson Point Seaplane Base
= 1372.0000
Deer River Municipal Airport
= 1311.0000
Gospel Ranch Airport
= 1394.0000
Grand Rapids-Itasca County/Gordon Newstrom Field
= 1355.0000
Richter Ranch Airport
= 1340.0000
Shaughnessy Seaplane Base
= 1300.0000
Sixberrys Landing Seaplane Base
= 1372.0000
Snells Seaplane Base
= 1351.0000
This example uses sed to apply
four filters to the list. The first is perhaps the most complex.
It has two patterns separated by a comma:
'/^ NAME/,/^ ELEVATION/p'
You can see the use of the caret again, which always denotes
that the filter is looking at the beginning of the line(s). In this
case it looks for the lines starting with NAME (including a couple spaces that
ogrinfo throws in by default), but
then there is also ELEVATION
specified. The comma tells sed to
include a range of lines—those that fall between the line starting
with NAME and the next line
starting with ELEVATION. NAME is called the start; ELEVATION is called the end. This way you
can see a few lines together rather than selecting one line at a time.
This is helpful because it shows the lines in the context of
surrounding information and is important for text streams that are
listed like ogrinfo output, which
groups together attributes of features onto multiple lines.
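The range filter can be tried in isolation on a single feature section. In this sketch, sample_ogrinfo is a hypothetical stand-in for the real command, its lines are copied from the first airport in the listing, and the two-space indent in the patterns is an assumption about how ogrinfo indents attribute lines.

```shell
# Stand-in for `ogrinfo data airports`, one feature only.
sample_ogrinfo() {
  printf '%s\n' \
    'OGRFeature(airports):0' \
    '  NAME (String) = Bigfork Municipal Airport' \
    '  LAT (Real) = 47.7789' \
    '  LON (Real) = -93.6500' \
    '  ELEVATION (Real) = 1343.0000' \
    '  QUADNAME (String) = Effie' \
    '  POINT (451306 5291930)'
}

# Print the block that starts at the NAME line and ends at the next
# ELEVATION line; both end points are included in the output.
block=$(sample_ogrinfo | sed -n '/^  NAME/,/^  ELEVATION/p')

printf '%s\n' "$block"
```

The header line before NAME and the QUADNAME and POINT lines after ELEVATION fall outside the range, so they are dropped.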
sed '/LAT/d' | sed '/LON/d'
The second and third filters are simple delete filters that
remove any LAT and LON lines. Notice that these lines
originally fell between NAME and
ELEVATION in the list, so the
filter is simply removing more and more lines building on the previous
filter.
sed 's/..................//'
The fourth filter isn’t a joke, nor did I fall asleep on the
keyboard. It is a substitute or search/replace filter, which is
signified by the preceding s/. Each
period represents a character that sed will delete from the beginning of each
line.
The end result of these four filters is a much more readable list of all the airports in the shapefile and their respective elevations.
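One possible variation on this pipeline, offered as a sketch rather than a replacement: several sed expressions can be combined in one call with -e, the attribute prefixes can be named explicitly instead of counting periods, and paste - - can then join each name with its elevation on one tab-separated line. The sample_ogrinfo helper is hypothetical and reuses the two features shown in the listing; the two-space indent is again an assumption.

```shell
# Stand-in for `ogrinfo data airports`, two features.
sample_ogrinfo() {
  printf '%s\n' \
    'OGRFeature(airports):0' \
    '  NAME (String) = Bigfork Municipal Airport' \
    '  LAT (Real) = 47.7789' \
    '  LON (Real) = -93.6500' \
    '  ELEVATION (Real) = 1343.0000' \
    '  QUADNAME (String) = Effie' \
    'OGRFeature(airports):1' \
    '  NAME (String) = Bolduc Seaplane Base' \
    '  LAT (Real) = 47.5975' \
    '  LON (Real) = -93.4106' \
    '  ELEVATION (Real) = 1325.0000' \
    '  QUADNAME (String) = Balsam Lake'
}

# 1. Keep the NAME..ELEVATION range of each feature.
# 2. Drop LAT and LON, then strip the attribute prefixes by name.
# 3. paste - - reads two lines at a time and joins them with a tab.
report=$(sample_ogrinfo |
  sed -n '/^  NAME/,/^  ELEVATION/p' |
  sed -e '/^  LAT/d' -e '/^  LON/d' \
      -e 's/^  NAME (String) = //' \
      -e 's/^  ELEVATION (Real) = //' |
  paste - -)

printf '%s\n' "$report"
```

Spelling out the prefix makes the filter easier to read later, at the cost of needing one substitution per attribute.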
Another very handy command-line tool is sort. sort does just what the name promises: it
puts text or numbers in a certain order. It sorts in ascending order
by default, from smallest to largest number or from lowest letter (closest to
“a”) to highest letter (closest to “z”).
In Example 6-16
all the lines are filtered out except those including ELEVATION. Unwanted letters are then
stripped from the beginning of each line. The output is then filtered
through sort which reorders the
output in ascending order.
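A pipeline of this kind might look like the following sketch. The elevation_lines helper is hypothetical and stands in for the ELEVATION lines that ogrinfo prints (a subset of the values from the airports listing, with an assumed two-space indent). It uses sort -n, which compares values as numbers; plain sort compares them as text, which gives the same order here only because every value has the same number of digits.

```shell
# Hypothetical stand-in for the ELEVATION lines of the ogrinfo output.
# printf repeats the format once for each argument.
elevation_lines() {
  printf '  ELEVATION (Real) = %s\n' \
    1343.0000 1325.0000 1372.0000 1357.0000 1372.0000
}

# Strip the attribute prefix, then sort numerically in ascending order.
sorted=$(elevation_lines | sed -n 's/^  ELEVATION (Real) = //p' | sort -n)

printf '%s\n' "$sorted"
```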
The output from sort
includes some duplicate or repeated values. Obviously some airports
rest at the same elevation: 1,372 feet. If this output is going to be
used in a report, it may not make sense to include repeated values,
especially when it is just a list of numbers.
The uniq command can help
make the results more presentable. In Example 6-17, the results of
grep, sed, and sort were passed to the uniq command. uniq processes the list and removes
duplicate lines from the list. You’ll notice only one occurrence of
1372 now.
> ogrinfo data airports | grep 'ELEVATION' | sed -n 's/ ELEVATION (Real) = //p' | sort
| uniq
1300.0000
1311.0000
1325.0000
1340.0000
1343.0000
1351.0000
1355.0000
1357.0000
1372.0000
1394.0000
uniq has some other options.
As seen in Example 6-18,
-c tells uniq to also print the number of times each
line occurs. Notice that only elevation 1372 occurred more than
once.
> ogrinfo data airports | grep 'ELEVATION' | sed -n 's/ ELEVATION (Real) = //p' | sort
| uniq -c
1 1300.0000
1 1311.0000
1 1325.0000
1 1340.0000
1 1343.0000
1 1351.0000
1 1355.0000
1 1357.0000
3 1372.0000
1 1394.0000
The -d option for uniq shows only duplicate records. You can
combine multiple options to help give you exactly what you are looking
for. As shown in Example
6-19, if you are only interested in airports with the same
elevation, and you want to know how many there are, you would only
have to add d to the options for
uniq.
> ogrinfo data airports | grep 'ELEVATION' | sed -n 's/ ELEVATION (Real) = //p' | sort
| uniq -cd
To add line numbers to your output, pass your text stream through
the nl command. Example
6-20 shows what this looks like.
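A minimal sketch of nl on a short list follows. The names helper is hypothetical and stands in for a filtered list of airport names. By default nl right-aligns each number and separates it from the text with a tab, so the exact spacing in the output may differ from what is suggested here.

```shell
# Hypothetical stand-in for a filtered list of airport names.
names() {
  printf '%s\n' \
    'Bigfork Municipal Airport' \
    'Bolduc Seaplane Base' \
    'Bowstring Municipal Airport'
}

# nl prefixes every line with its line number.
numbered=$(names | nl)

printf '%s\n' "$numbered"
```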
Keep in mind that uniq checks
each line only against adjacent lines, so running sort
beforehand makes sure that all duplicates are side by side. If
they aren’t, there is no guarantee that uniq will produce the expected results.
Other text-processing commands may suit you better if you are unable to use
sort. For example, tsort, referenced in the next section, may do what you
want.
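The effect of skipping sort is easy to see on a short, unsorted list. In this sketch the elevations helper is hypothetical; its values come from the airports listing, deliberately arranged so that the repeated value is never on adjacent lines.

```shell
# Hypothetical unsorted elevation list: 1372.0000 appears three times,
# but never on two consecutive lines.
elevations() {
  printf '%s\n' 1372.0000 1343.0000 1372.0000 1325.0000 1372.0000
}

# Without sort, no two adjacent lines match, so uniq removes nothing.
unsorted_count=$(elevations | uniq | wc -l | tr -d ' ')

# With sort, the duplicates become adjacent and collapse to one line.
sorted_count=$(elevations | sort | uniq | wc -l | tr -d ' ')

# -cd then reports only the repeated value together with its count.
dupes=$(elevations | sort | uniq -cd)

echo "uniq alone: $unsorted_count lines, sort | uniq: $sorted_count lines"
printf '%s\n' "$dupes"
```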
Most Unix implementations, including Linux, have many
more processing commands available. The list below shows a summary of
the text-processing commands you may find useful. If you are wondering
how to use some of them, you can usually add --help after the command name to get a list
of options. Or you may also be able to read the manual for the command
by typing man <command name>.
sort: Sorts lines of text
paste: Merges lines of files
sed: Performs basic text transformations on an input stream
tsort: Performs topological sort
join: Joins lines of two files on a common field
awk: Is a pattern scanning and processing language
uniq: Removes duplicate lines from a sorted file
head/tail: Outputs the first/last part of files
wc: Prints the number of newlines, words, and bytes in files
expand/unexpand: Converts tabs to spaces and spaces to tabs
grep: Prints lines matching a pattern
column: Columnates lists
cut: Removes sections from each line of files
look: Displays lines beginning with a given string
colrm: Removes columns from a file
nl: Numbers lines in text stream
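As a small taste of two commands from this list, the sketch below uses cut and head on NAME lines. The name_lines helper is hypothetical and assumes the two-space indent, which makes the attribute prefix exactly 18 characters long; cut -c19- therefore keeps everything from the 19th character onward, much like the 18-period sed filter shown earlier.

```shell
# Hypothetical stand-in for the NAME lines of the ogrinfo output.
name_lines() {
  printf '  NAME (String) = %s\n' \
    'Bigfork Municipal Airport' \
    'Bolduc Seaplane Base' \
    'Bowstring Municipal Airport'
}

# cut -c19- drops the 18-character prefix; head -n 2 keeps two lines.
first_two=$(name_lines | cut -c19- | head -n 2)

printf '%s\n' "$first_two"
```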
I highly recommend the Linux Documentation Project site to get a comprehensive list and examples of these text-processing commands. This is helpful for other platforms as well because these tools often exist for them too. The site address is http://www.tldp.org/; search for the “Text Processing” HOWTO document. Other HOWTO documents and tutorials go into more depth for specific text-processing programs. Check out the O’Reilly book Unix Power Tools for more Unix help.