Chapter 4. Drops, Hideouts, Meetups, and Lairs

We'll extend some of the techniques introduced in Chapter 2, Acquiring Intelligence Data, to make RESTful web service requests for geocoding. This will allow us to pinpoint various kinds of secret locations. This will also build on image location processing from Chapter 3, Encoding Secret Messages with Steganography.

We will look at some online datasets that will lead us to more techniques in data gathering. In order to work with a wide variety of data, we will need to add an HTML parser to our toolkit. We'll download BeautifulSoup, since it's very good at tracking down the information buried in HTML pages.

In this chapter, we'll also look at some more sophisticated Python algorithms. We'll start with geocoding services to translate between street addresses and latitude-longitude coordinates.

We'll look at the haversine formula to compute distances between locations. This will mean using the math library to access trigonometric functions.

We'll learn about the various kinds of grid coding schemes, which will help us reduce the complexity of latitude and longitude. These coding schemes will show us a number of data representation techniques. This chapter will show ways to compress numbers via a change in representation.

We'll see ways to parse HTML <table> tags and create Python collections that we can work with. We'll also look at online data sources that provide clean data in the JSON format. This can be easier to gather and work with.

Our goal is to use Python to combine multiple online services. This will allow us to integrate geocoding and data analysis. With that information, we can locate the best place to meet our contacts without traveling too far from our secret base of operations.

Background briefing – latitude, longitude, and GPS

Before we can get geographic information, we'll need to review some essential terminology. One powerful piece of modern technology that helps civilians as well as secret agents is the Global Positioning System (GPS), a satellite-based system to determine location. The GPS allows a terrestrial device to pinpoint its location in both space and time.

The idea underlying GPS is quite elegant. Each satellite produces a stream of data that includes its position and highly accurate timestamps. A receiver with multiple streams of data can plug the positions and timestamps into a system of simultaneous equations to determine the receiver's position with respect to the various satellites. Given enough satellites, a receiver can precisely calculate latitude, longitude, elevation, and even the current time.

For more information see http://en.wikipedia.org/wiki/Global_Positioning_System#Navigation_equations.

A position's latitude is an angle measured relative to the equator and poles. We must provide the direction for this angle: N or S for latitude. For example, 36°50′40.12″N is given in degrees (°), minutes (′), and seconds (″) with the all-important N to show which side of the equator the position is on.

We can also state latitude as 36°50.6687′N using degrees and decimal minutes, or as 36.844478 using decimal degrees. Latitudes north of the equator are written as positive angles; latitudes to the south are negative. Python's math library works in radians, but radians are not widely used to display positions to humans.

Longitude is an angle east of the prime meridian, also called the Greenwich meridian. Angles to the west of Greenwich are stated as negative numbers. Consequently, 76°17′35.21″W can also be stated as -76.293114.
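The conversion from degrees, minutes, and seconds to signed decimal degrees can be sketched in a few lines. This is a minimal illustration, not a library API: the function name dms_to_decimal and its argument order are our own choices.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to signed decimal degrees."""
    magnitude = degrees + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative by convention.
    sign = -1 if hemisphere.upper() in ("S", "W") else 1
    return sign * magnitude

lat = dms_to_decimal(36, 50, 40.12, "N")
lon = dms_to_decimal(76, 17, 35.21, "W")
print(round(lat, 6), round(lon, 6))
```

Rounded to six places, the two results match the 36.844478 and -76.293114 values quoted in the text.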

When we look at a globe, we notice that the latitude lines are all parallel with the equator. Each degree of latitude is about 60 nautical miles in the north-south direction.

The longitude lines, however, all intersect at the north and south poles. Those north-south lines are not parallel. On a map or a nautical chart, however, a distortion (properly called a projection) is used so that the longitude lines are parallel to each other. With our usual experience of driving short distances on land, the distortion doesn't matter much, since we're often constrained to driving on highways that wander around rivers and mountains. What's important is that the rectangular grid of a map is handy, but misleading. Simple analytical plane geometry isn't appropriate. Hence, we have to switch to spherical geometry.
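One way to see how misleading the rectangular grid is: a degree of latitude stays close to 60 nautical miles everywhere, but a degree of longitude shrinks toward the poles. The sketch below assumes a perfectly spherical Earth, where the east-west length of a degree scales with the cosine of the latitude. The function name and the 60-mile constant are our own simplifications, not survey-grade values.

```python
import math

NM_PER_DEGREE = 60.0  # approximate nautical miles per degree of latitude

def nm_per_degree_of_longitude(latitude_degrees):
    """Approximate east-west length of one degree of longitude, in nautical miles.

    Assumes a spherical Earth; math.cos() needs radians, so we convert first.
    """
    return NM_PER_DEGREE * math.cos(math.radians(latitude_degrees))

for lat in (0, 36.844478, 60, 90):
    print(lat, round(nm_per_degree_of_longitude(lat), 1))
```

At the equator a degree of longitude is about as long as a degree of latitude; at 60° it is about half that, and at the pole it collapses to nothing.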

Coping with GPS device limitations

A GPS receiver needs to receive data from several satellites concurrently; at least four are generally needed to solve for latitude, longitude, elevation, and time. The microwave signals may suffer interference indoors, and even outdoors in urban environments, making it difficult (or impossible) to get enough data to properly compute the receiver's location. Tall buildings and other obstructions, such as walls, block the direct signal access needed for accuracy. It may take a long time to acquire enough high-quality satellite signals to compute a position.

A common workaround for the satellite visibility problem is to rely on cellular telephone towers to compute a position very quickly, even without GPS satellite data. A phone that is in contact with several cell towers can have its position estimated from the overlapping transmission patterns. In many phones, the GPS calculation relies on data from local cell towers before it can produce a position.

There are many non-phone GPS devices that can be directly connected to a computer to get accurate GPS fixes without relying on cellular data. Navigation computers (mariners call them chart plotters) work without the need to connect to a cellular network. In many cases, we can use modules such as pyserial to extract data from these devices.

See http://pyserial.sourceforge.net for more information on the pySerial project and how we can use it to read data from a GPS device via a serial-to-USB adapter.
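Many serial GPS devices emit lines of text in the NMEA 0183 format, where a GGA sentence carries the fix as degrees and decimal minutes. The following is a rough sketch of decoding such a line once it has been read. The function names and the sample sentence are made up for illustration, the checksum is not verified, and a particular device may well use different settings or sentence types.

```python
def nmea_to_decimal(value, hemisphere):
    """Convert an NMEA ddmm.mmmm (or dddmm.mmmm) field to signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)       # digits to the left of the two minute digits
    minutes = raw - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal

def parse_gga(sentence):
    """Return (latitude, longitude) from a GGA sentence, or None if there is no fix."""
    body = sentence.strip().split("*")[0]   # drop the optional checksum (not verified)
    fields = body.split(",")
    if not fields[0].endswith("GGA") or len(fields) < 6:
        return None
    if not fields[2] or not fields[4]:
        return None  # the receiver has not computed a position yet
    return (nmea_to_decimal(fields[2], fields[3]),
            nmea_to_decimal(fields[4], fields[5]))

# A made-up sentence for illustration. With pySerial, a real line might come from
# serial.Serial(port_name, 4800).readline().decode("ascii"); the port name and
# baud rate are assumptions that depend on the device.
sample = "$GPGGA,123519,3650.6687,N,07617.5868,W,1,08,0.9,10.0,M,-35.0,M,,"
print(parse_gga(sample))
```

Keeping the parsing separate from the serial port means the same functions can be exercised against saved log lines without any hardware attached.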

Handling politics – borders, precincts, jurisdictions, and neighborhoods

Borders create endless problems—some profound, some subtle. The entire sweep of human history seems to center on borders and wars. The edges of neighborhoods are often subjective. In an urban environment, a block or two may not matter much when discussing the difference between Los Feliz and East Hollywood. On the other hand, this kind of knowledge is what defines the local restaurants as recognized by people who live there.

When it comes to more formal definitions—such as election districts at city, state, and federal levels—the side of the street may have profound implications. In some cities, this political division information is readily available via RESTful web service requests. In other locales, this information is buried in a drawer somewhere, or published in some kind of hard-to-process PDF document.

Some media companies provide neighborhood information. The LA Times Data Desk, for example, has a fairly rigorous definition of the various neighborhoods around the greater Los Angeles area. For excellent background information on how to work with this kind of information, see http://www.latimes.com/local/datadesk/.