Because we created an isolated instance of your PostgreSQL database, we have to recreate the user, database, and schema. These operations are optional, but we encourage you to follow them so that this recipe stays consistent with the rest of the book:
- Create the user me in your container:
root@d842288536c9:/# psql -U postgres
psql (10.1)
Type "help" for help.
postgres=# CREATE USER me WITH PASSWORD 'me';
CREATE ROLE
postgres=# ALTER USER me WITH SUPERUSER;
ALTER ROLE
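To confirm that the role was created with superuser rights, you can list it with the standard psql \du meta-command (this check is optional and not part of the original steps):

```sql
-- List the attributes of the role we just created;
-- "Superuser" should appear in the "List of roles" output.
\du me
```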
- Reconnect to the database, now as the user me, to create the database and schema:
root@d842288536c9:/# PGPASSWORD=me psql -U me -d postgres
postgres=# CREATE DATABASE "postgis-cookbook";
CREATE DATABASE
postgres=# \c postgis-cookbook
You are now connected to database "postgis-cookbook" as user "me".
postgis-cookbook=# CREATE SCHEMA chp10;
CREATE SCHEMA
postgis-cookbook=# CREATE EXTENSION postgis;
CREATE EXTENSION
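Optionally, you can verify that the schema and the extension are in place; postgis_full_version() is a standard PostGIS function and the schemata view is part of the SQL information schema:

```sql
-- Should report the PostGIS version installed in this database
SELECT postgis_full_version();

-- The new schema should appear in the catalog
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name = 'chp10';
```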
- Insert a layer into the database. In this case, we will use the gis.osm_buildings_a_free_1 shapefile from Colombia. Make sure these files are inside SHP_PATH before starting the container. The insertion can be run in two ways. The first is inside your Docker container:
root@d842288536c9:/# /usr/lib/postgresql/10/bin/shp2pgsql -s 3734
-W latin1 /data/gis.osm_buildings_a_free_1.shp chp10.buildings |
PGPASSWORD=me psql -U me -h localhost -p 5432 -d postgis-cookbook
The second is from your host computer. Make sure to correctly set your shapefile path and the host port that maps to the container's 5432 port. Your host must also have postgresql-client installed:
$ shp2pgsql -s 3734 -W latin1 <SHP_PATH>
/gis.osm_buildings_a_free_1.shp chp10.buildings | PGPASSWORD=me
psql -U me -h localhost -p 5433 -d postgis-cookbook
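Either way, you can check that the load succeeded by counting the imported rows and confirming the SRID (Find_SRID is a standard PostGIS function; the EXPLAIN output later in this recipe reports 571416 rows, so a count in that range is expected):

```sql
-- Row count sanity check on the freshly loaded table
SELECT count(*) FROM chp10.buildings;

-- Should return 3734, the SRID passed to shp2pgsql
SELECT Find_SRID('chp10', 'buildings', 'geom');
```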
- Execute a parallel query. Using the buildings table, we can execute a PostGIS command in parallel. To check how many workers are created, we use the EXPLAIN ANALYZE command. For example, to calculate the sum of the areas of all geometries in the table with a serial query:
postgis-cookbook=# EXPLAIN ANALYZE SELECT Sum(ST_Area(geom))
FROM chp10.buildings;
We get the following result:
Aggregate (cost=35490.10..35490.11 rows=1 width=8)
(actual time=319.299..319.299 rows=1 loops=1)
-> Seq Scan on buildings (cost=0.00..19776.16 rows=571416 width=142)
(actual time=0.017..68.961 rows=571416 loops=1)
Planning time: 0.088 ms
Execution time: 319.358 ms
(4 rows)
Now, if we increase the max_parallel_workers and max_parallel_workers_per_gather parameters, we activate the parallel query capability of PostgreSQL. For example, in the current session:
postgis-cookbook=# SET max_parallel_workers = 3;
postgis-cookbook=# SET max_parallel_workers_per_gather = 3;
Rerunning the same EXPLAIN ANALYZE now prints a parallel plan in the Terminal:
Finalize Aggregate (cost=21974.61..21974.62 rows=1 width=8)
(actual time=232.081..232.081 rows=1 loops=1)
-> Gather (cost=21974.30..21974.61 rows=3 width=8)
(actual time=232.074..232.078 rows=4 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Partial Aggregate (cost=20974.30..20974.31 rows=1 width=8)
(actual time=151.785..151.785 rows=1 loops=4)
-> Parallel Seq Scan on buildings
(cost=0.00..15905.28 rows=184328 width=142)
(actual time=0.017..58.480 rows=142854 loops=4)
Planning time: 0.086 ms
Execution time: 239.393 ms
(8 rows)
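You can inspect the session's current parallel settings at any time; these parameters are all standard GUCs in PostgreSQL 10:

```sql
-- Current values of the parameters that control parallel query
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_parallel_workers',
               'max_parallel_workers_per_gather',
               'parallel_tuple_cost',
               'parallel_setup_cost');
```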
- Execute parallel scans. For example, if we want to select the polygons whose area is larger than a given value:
postgis-cookbook=# EXPLAIN ANALYZE SELECT * FROM chp10.buildings
WHERE ST_Area(geom) > 10000;
We get the following result:
Seq Scan on buildings (cost=0.00..35490.10 rows=190472 width=190)
(actual time=270.904..270.904 rows=0 loops=1)
Filter: (st_area(geom) > '10000'::double precision)
Rows Removed by Filter: 571416
Planning time: 0.279 ms
Execution time: 270.937 ms
(5 rows)
This query is not executed in parallel. That happens because the ST_Area function is defined with a COST value of 10. In PostgreSQL, COST is a positive number giving the estimated execution cost of a function. If we increase this value to 100, we can get a parallel plan:
postgis-cookbook=# ALTER FUNCTION ST_Area(geometry) COST 100;
postgis-cookbook=# EXPLAIN ANALYZE SELECT * FROM chp10.buildings
WHERE ST_Area(geom) > 10000;
Now we have a parallel plan, and three workers execute the query:
Gather (cost=1000.00..82495.23 rows=190472 width=190)
(actual time=189.748..189.748 rows=0 loops=1)
Workers Planned: 3
Workers Launched: 3
-> Parallel Seq Scan on buildings
(cost=0.00..62448.03 rows=61443 width=190)
(actual time=130.117..130.117 rows=0 loops=4)
Filter: (st_area(geom) > '10000'::double precision)
Rows Removed by Filter: 142854
Planning time: 0.165 ms
Execution time: 190.300 ms
(8 rows)
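To confirm that the new cost took effect, you can query the system catalog; pg_proc stores the per-call cost of each function in its procost column:

```sql
-- After the ALTER FUNCTION, procost should read 100
SELECT proname, procost
FROM pg_proc
WHERE proname = 'st_area';
```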
- Execute parallel joins. First, we create a point table with 10 randomly generated points per polygon:
postgis-cookbook=# DROP TABLE IF EXISTS chp10.pts_10;
postgis-cookbook=# CREATE TABLE chp10.pts_10 AS
SELECT (ST_Dump(ST_GeneratePoints(geom, 10))).geom
::Geometry(point, 3734) AS geom,
gid, osm_id, code, fclass, name, type FROM chp10.buildings;
postgis-cookbook=# CREATE INDEX pts_10_gix
ON chp10.pts_10 USING GIST (geom);
Now, we can run a spatial join between the two tables, which does not give us a parallel plan:
Nested Loop (cost=0.41..89034428.58 rows=15293156466 width=269)
-> Seq Scan on buildings (cost=0.00..19776.16 rows=571416 width=190)
-> Index Scan using pts_10_gix on pts_10
(cost=0.41..153.88 rows=190 width=79)
Index Cond: (buildings.geom && geom)
Filter: _st_intersects(buildings.geom, geom)
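The plan above corresponds to a spatial join of roughly this shape; the aliases and selected columns are assumptions, but the ST_Intersects predicate matches the Index Cond (&&) and Filter (_st_intersects) lines in the plan:

```sql
-- Join every building polygon to the random points that fall inside it
EXPLAIN SELECT b.gid, p.geom
FROM chp10.buildings b
JOIN chp10.pts_10 p
  ON ST_Intersects(b.geom, p.geom);
```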
For this case, we need to modify the parallel_tuple_cost parameter, which sets the planner's estimate of the cost of transferring one tuple from a parallel worker process to another process:
postgis-cookbook=# SET parallel_tuple_cost = 0.001;
Lowering this value makes the planner favor parallelism; rerunning the EXPLAIN now produces a parallel plan, with a Gather node and a Parallel Seq Scan on buildings in place of the plain sequential scan shown above.