Compressing data to make grid codes

Latitudes and longitudes are bulky to transmit. They have a lot of digits and some peculiar punctuation. Over the years, some alternatives have arisen that abbreviate a location using a simpler notation. The essential idea is to convert latitude and longitude from their degree-minute-second form into a sequence of letters and digits that represents the same information.

We'll look at three compression schemes: the GeoRef system, the Maidenhead Locator, and NAC. Each of these encodings involves doing some arithmetic calculations to convert numbers from decimal (base 10) to another base. We'll also use a number of string operations to translate numbers to characters and characters to numbers.

Another interesting programming issue is that these encodings don't work directly with latitudes and longitudes. The problem with simply using latitudes and longitudes is that they're signed numbers: -90 (S) to +90 (N) and -180 (W) to +180 (E). Also, longitudes have a bigger range (360 values), whereas latitudes have a smaller range (180 values). To simplify the encoding, we'll apply a common programming hack: we'll offset and scale the values. We'll see a number of ways to apply this clever technique.

In effect, the scaling and offsetting moves the map's (0, 0) origin to Antarctica: at the south pole and right on the 180° meridian. The center of these grid maps is somewhere off the coast of West Africa, and the upper-right corner winds up at the north pole, again on the 180° meridian.
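The offset itself can be sketched in a couple of lines. The helper name offset_position is an assumption made for illustration; it is not part of any of the encodings.

```python
def offset_position(lat, lon):
    """Hypothetical helper: shift signed degrees so both values are non-negative."""
    return lat + 90, lon + 180

print(offset_position(-90, -180))  # (0, 0): the grid origin at the south pole
print(offset_position(0, 0))       # (90, 180): the center of the grid
print(offset_position(90, 180))    # (180, 360): the upper-right corner
```

After the offset, latitude runs from 0 to 180 and longitude from 0 to 360, so the later arithmetic never has to handle a negative number.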

Creating GeoRef codes

The GeoRef system compresses a latitude-longitude position using four letters and as many as eight digits. This system can also be used to encode descriptions of regions as well as altitudes. We'll stick with locations on the surface.

For some background information, see http://en.wikipedia.org/wiki/Georef.

This system encodes decimal numbers using 24 letter codes chosen from A to Z, omitting I and O. This means that we can't simply rely on a handy copy of the alphabet such as string.ascii_uppercase to provide the letter codes. We'll have to define our own GeoRef letters. We can compute the letters with an expression as follows:

>>> import string
>>> string.ascii_uppercase.replace("O","").replace("I","")
'ABCDEFGHJKLMNPQRSTUVWXYZ'

The GeoRef codes slice the world map into a 12 x 24 grid of 15° x 15° quadrangles. The latitude is measured in positive numbers from the South Pole. The longitude is measured in positive numbers from the International Date Line. When we divide 180° of latitude into 15° steps, we can encode a part of this three-digit number using 12 letters from A to M (omitting I). When we divide 360° of longitude into 15° steps, we can encode a part of this three-digit number using 24 letters from A to Z (omitting I and O).

We can then divide each 15° quadrangle into 15 one-degree bands using letters A to Q (again, skipping I and O). This creates a four-character code for the entire degrees portion of a latitude and longitude position.
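To make the letter ranges concrete, here is a small sketch that slices the 24-letter alphabet into the three subsets just described. The variable names are assumptions chosen for this illustration.

```python
import string

georef_uppercase = string.ascii_uppercase.replace("O", "").replace("I", "")

lat_zone_letters = georef_uppercase[:180 // 15]  # 12 latitude zones of 15 degrees
lon_zone_letters = georef_uppercase[:360 // 15]  # 24 longitude zones of 15 degrees
degree_letters = georef_uppercase[:15]           # 15 one-degree bands per zone

print(lat_zone_letters)  # ABCDEFGHJKLM
print(lon_zone_letters)  # ABCDEFGHJKLMNPQRSTUVWXYZ
print(degree_letters)    # ABCDEFGHJKLMNPQ
```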

If we had a latitude of 38°17′10″N, we'd offset this to be 128° north of the south pole and divide it by 15°:

>>> divmod(38+90,15)
(8, 8)

These values are encoded as J and J.

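A longitude of 76°24′42″W is encoded as shown in the following code. This is -76.41167°, which we offset by 180° before using divmod to calculate the two characters. We apply the offset before discarding the fraction; truncating -76.41167 to -76 first would leave us one degree too far east:

>>> divmod( int(-76.41167+180), 15 )
(6, 13)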

This gives us letters G and P. We interleave longitude and latitude characters so that the whole string is GJPJ. We've encoded six digits of latitude and longitude into four characters.
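The two divmod steps and the interleaving can be combined into one small function. This is only a sketch of the letters portion; the name georef_letters is an assumption. Note that the offset is applied before int(), because int() truncates toward zero and would otherwise round a negative longitude the wrong way.

```python
import string

georef_uppercase = string.ascii_uppercase.replace("O", "").replace("I", "")

def georef_letters(lat, lon):
    """Hypothetical helper: four-letter GeoRef prefix for signed decimal degrees."""
    lat_0, lat_1 = divmod(int(lat + 90), 15)
    lon_0, lon_1 = divmod(int(lon + 180), 15)
    # Interleave: longitude zone, latitude zone, longitude degree, latitude degree.
    return (georef_uppercase[lon_0] + georef_uppercase[lat_0]
            + georef_uppercase[lon_1] + georef_uppercase[lat_1])

lat = 38 + 17/60 + 10/3600      # 38°17′10″N
lon = -(76 + 24/60 + 42/3600)   # 76°24′42″W
print(georef_letters(lat, lon))  # GJPJ
```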

The leftover minutes and seconds can be encoded as two, three, or four digits. For the latitude, 17′10″ is 17.1667 minutes, which we truncate to 17.16. This is encoded as 17, an intermediate 171, or a detailed 1716.
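The three precisions can be sketched with integer arithmetic, which avoids any floating-point surprises. The helper minute_digits and its width parameter are assumptions for this illustration; it takes whole minutes and whole seconds.

```python
def minute_digits(minutes, seconds, width=4):
    """Hypothetical helper: encode minutes and seconds as 2, 3, or 4 digits."""
    # Work in hundredths of a minute; integer division truncates the seconds.
    hundredths = minutes * 100 + seconds * 100 // 60
    return "{:04d}".format(hundredths)[:width]

print(minute_digits(17, 10, 2))  # 17
print(minute_digits(17, 10, 3))  # 171
print(minute_digits(17, 10, 4))  # 1716
```

Shorter codes are simply prefixes of the four-digit form, so precision is dropped by truncation rather than by rounding.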

Here's the entire encoder for GeoRef codes:

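import string

# The 24 GeoRef letters: A to Z without I and O.
georef_uppercase = string.ascii_uppercase.replace("O","").replace("I","")

def ll_2_georef( lat, lon ):
    # Offset so that both values are non-negative.
    f_lat, f_lon = lat+90, lon+180
    # Quotient is the 15° zone; remainder is the 1° band within the zone.
    lat_0, lat_1 = divmod( int(f_lat), 15 )
    lon_0, lon_1 = divmod( int(f_lon), 15 )
    # Fractional degrees as hundredths of a minute (60 minutes x 100).
    lat_m, lon_m = 6000*(f_lat-int(f_lat)), 6000*(f_lon-int(f_lon))
    return "{lon_0}{lat_0}{lon_1}{lat_1}{lon_m:04d}{lat_m:04d}".format(
        lon_0= georef_uppercase[lon_0],
        lat_0= georef_uppercase[lat_0],
        lon_1= georef_uppercase[lon_1],
        lat_1= georef_uppercase[lat_1],
        lon_m= int(lon_m),
        lat_m= int(lat_m),
    )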

We offset the latitudes and longitudes so that we don't have to deal with signed numbers. We used the divmod() function to divide by 15° and get both a quotient and a remainder. We can then use our georef_uppercase letters to translate the numeric quotients and remainders into expected character codes.

The fractional values, for example, f_lat-int(f_lat), are scaled by 6000 to create a number between 0000 and 5999, which is simply the number of minutes in 100ths.
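As a quick check of the factor 6000 (60 minutes per degree times 100), here is a sketch using values whose fractions are exact in binary floating point. The helper name minutes_in_hundredths is an assumption.

```python
def minutes_in_hundredths(degrees):
    """Hypothetical helper: fraction of a non-negative degree value, in 1/100 minute."""
    fraction = degrees - int(degrees)
    return int(6000 * fraction)

print(minutes_in_hundredths(128.5))   # 3000, that is, 30.00 minutes
print(minutes_in_hundredths(103.25))  # 1500, that is, 15.00 minutes
print(minutes_in_hundredths(126.0))   # 0
```

For fractions that can't be represented exactly as floats, the int() truncation may occasionally produce a value one hundredth of a minute lower than a hand calculation would suggest.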

We've used the string format() method to assemble the four-character codes and four-digit numeric codes into a single string. The first two letters are longitude and latitude to provide a position to the nearest 15°. The next two letters have more longitude and latitude details to refine this to the nearest 1°. The digits are in two blocks of four to provide the detailed minutes.

Here's a more complete example of the output. We'll encode 36°50.63′N 076°17.49′W:

lat, lon = 36+50.63/60, -(76+17.49/60)
print(lat, lon)
print(ll_2_georef(lat, lon))

We've converted degrees and minutes to degrees. Then, we applied our GeoRef conversion to the values in degrees. Here's what the output looks like:

36.843833333333336 -76.2915
GJPG42515063

The code GJPG is an approximation of the given location; it could be off by almost 80 nautical miles at the equator. The error gets smaller toward the poles. The code GJPG4250 uses the two-digit encoding of whole minutes to get within a few miles of the coordinates.
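Because the shorter codes are prefixes of each four-digit block, a full 12-character code can be reduced to a coarser one with string slicing alone. The following is a minimal sketch; shorten_georef and its digits parameter are assumptions for illustration, and it expects the four-letter, eight-digit layout described above.

```python
def shorten_georef(code, digits=2):
    """Hypothetical helper: keep 0 to 4 digits per axis of a 12-character code."""
    letters, lon_m, lat_m = code[:4], code[4:8], code[8:12]
    return letters + lon_m[:digits] + lat_m[:digits]

print(shorten_georef("GJPG42515063", 0))  # GJPG
print(shorten_georef("GJPG42515063", 2))  # GJPG4250
print(shorten_georef("GJPG42515063", 3))  # GJPG425506
```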