Now, let's turn to OGR for reading and writing a vector so that you can compare both OGR and GeoPandas functionality for performing the same kind of tasks. To follow the instructions that are mentioned as we proceed, you can download the MTBS wildfire data from: https://edcintl.cr.usgs.gov/downloads/sciweb1/shared/MTBS_Fire/data/composite_data/fod_pt_shapefile/mtbs_fod_pts_data.zip and store them on your PC. The file that will be analyzed here is the mtbs_fod_pts_20170501 shapefile's attribute table, which has 20,340 rows and 30 columns.
We'll start with the ogrinfo command which works in a terminal window and can be used for describing vector data. These are not Python commands, but we'll include them here as you can easily run them in a Jupyter Notebook with a simple prefix (adding an exclamation mark before the used command). Take, for instance, the following command, which is similar to the Fiona driver command:
In: !ogrinfo –-formats
This command lists the available formats that ogrinfo can access, by using the general option --formats. The results also tells us whether GDAL/OGR can only read/open the format, or whether it can also write new layers in that format. As you can see from the output, there are many supported file formats with OGR. Looking at Esri shapefiles in the list, the addition of (rw+v) means OGR supports read, write, update (meaning create), and virtual formats for Esri shapefiles:
In: !ogrinfo -so "pts" mtbs_fod_pts_20170501
The previous command lists the summary information about all the layers in a data source, which in this case is all the shapefiles in the folder called "pts". The addition of -so stands for summary option. You can see that this command lists similar information as we saw with GeoPandas, such as the CRS. The same line of code, but without the -so addition will print all features and attributes, and takes some time to process. This is comparable to creating a GeoDataFrame in GeoPandas, but all attribute info is printed per feature on a new line instead of preserving the tabular form:
In: !ogrinfo "pts" mtbs_fod_pts_20170501
If we want to convert this shapefile into a GeoJSON file, we will use the following command:
In: !ogr2ogr -f "GeoJSON" "C:\data\output.json"
"C:\data\mtbs_fod_pts_data\mtbs_fod_pts_20170501.shp"
The -f prefix stands for the format, followed by the output driver name, the output file name, and the location and the input file. You might receive error warnings doing file conversions, for example when a bad feature is encountered, but an output file will be written anyway.
OGR has also read and write capabilities for KML files. Download this KML sample file (https://developers.google.com/kml/documentation/KML_Samples.kml) with the following code and run the following code to read its contents:
In: !ogrinfo "C:\Users\UserName\Downloads\KML_Samples.kml" -summary
Out: Had to open data source read-only.INFO: Open of
`C:\Users\UserName\Downloads\KML_Samples.kml' using driver
`KML' successful.
1: Placemarks (3D Point)
2: Highlighted Icon (3D Point)
3: Paths (3D Line String)
4: Google Campus (3D Polygon)
5: Extruded Polygon (3D Polygon)
6: Absolute and Relative (3D Polygon)
For a more Pythonic approach for OGR, let's see some examples of how you can read and write data with OGR.
The following code lists all 30 field names of our wildfire shapefile using OGR:
In: from osgeo import ogr
source = ogr.Open(r"C:\data\mtbs_fod_pts_data\
mtbs_fod_pts_20170501.shp")
layer = source.GetLayer()
schema = []
ldefn = layer.GetLayerDefn()
for n in range(ldefn.GetFieldCount()):
fdefn = ldefn.GetFieldDefn(n)
schema.append(fdefn.name)
print(schema)
Out: ['FIRE_ID', 'FIRENAME', 'ASMNT_TYPE', 'PRE_ID', 'POST_ID', 'ND_T',
'IG_T', 'LOW_T',
'MOD_T', 'HIGH_T', 'FIRE_YEAR', 'FIRE_MON', 'FIRE_DAY', 'LAT',
'LONG', 'WRS_PATH',
'WRS_ROW', 'P_ACRES', 'R_ACRES', 'STATE', 'ADMIN', 'MTBS_ZONE',
'GACC',
'HUC4_CODE','HUC4_NAME', 'Version', 'RevCode', 'RelDate',
'Fire_Type']
As you can see from the preceding code, this is a little more involved than using GeoPandas, where you can directly load all attribute data into one GeoDataFrame using little code. With OGR, you need to iterate over the individual features which need to be referenced from a layer definition and appended to an empty list. But first, you need to use the GetLayer function— this is because OGR has its own data model that does not adapt itself automatically to the file format it reads.
Now that we have all the field names, we can iterate over the individual features, for instance, for the state field:
In: from osgeo import ogr
import os
shapefile = r"C:\data\mtbs_fod_pts_data\mtbs_fod_pts_20170501.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
print(feature.GetField("STATE"))
Judging from the output of the last cell, there are apparently many features, but exactly how many? The total feature count can be printed as follows:
In: import os
from osgeo import ogr
daShapefile = r"C:\data\mtbs_fod_pts_data\
mtbs_fod_pts_20170501.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(daShapefile, 0)
layer = dataSource.GetLayer()
featureCount = layer.GetFeatureCount()
print("Number of features in %s: %d" %
(os.path.basename(daShapefile), featureCount))
Out: Number of features in mtbs_fod_pts_20170501.shp: 20340
As we've seen previously, the CRS is essential information about your spatial data. You can print this information in two ways—from the layer and the geometry of the layer. In the following code, two spatial reference variables will print the same output, as it should be (only the output of the first option is listed here to save space):
In: from osgeo import ogr, osr
driver = ogr.GetDriverByName('ESRI Shapefile')
dataset = driver.Open(r"C:\data\mtbs_fod_pts_data\
mtbs_fod_pts_20170501.shp")
# Option 1: from Layer
layer = dataset.GetLayer()
spatialRef = layer.GetSpatialRef()
print(spatialRef)
# Option 2: from Geometry
feature = layer.GetNextFeature()
geom = feature.GetGeometryRef()
spatialRef2 = geom.GetSpatialReference()
print(spatialRef2)
Out: GEOGCS["GCS_North_American_1983",
DATUM["North_American_Datum_1983",
SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]]
We can check if we're dealing with points and print the x and y values of all individual features as well as their centroids as follows:
In: from osgeo import ogr
import os
shapefile = r"C:\data\mtbs_fod_pts_data\mtbs_fod_pts_20170501.shp"
driver = ogr.GetDriverByName("ESRI Shapefile")
dataSource = driver.Open(shapefile, 0)
layer = dataSource.GetLayer()
for feature in layer:
geom = feature.GetGeometryRef()
print(geom.Centroid().ExportToWkt())