Table of Contents for
Practical GIS

Practical GIS by Gábor Farkas, published by Packt Publishing, 2017
  1. Practical GIS
  2. Title Page
  3. Copyright
  4. Credits
  5. About the Author
  6. About the Reviewer
  7. www.PacktPub.com
  8. Customer Feedback
  9. Dedication
  10. Table of Contents
  11. Preface
  12. What this book covers
  13. What you need for this book
  14. Who this book is for
  15. Conventions
  16. Reader feedback
  17. Customer support
  18. Downloading the example code
  19. Downloading the color images of this book
  20. Errata
  21. Piracy
  22. Questions
  23. Setting Up Your Environment
  24. Understanding GIS
  25. Setting up the tools
  26. Installing on Linux
  27. Installing on Windows
  28. Installing on macOS
  29. Getting familiar with the software
  30. About the software licenses
  31. Collecting some data
  32. Getting basic data
  33. Licenses
  34. Accessing satellite data
  35. Active remote sensing
  36. Passive remote sensing
  37. Licenses
  38. Using OpenStreetMap
  39. OpenStreetMap license
  40. Summary
  41. Accessing GIS Data With QGIS
  42. Accessing raster data
  43. Raster data model
  44. Rasters are boring
  45. Accessing vector data
  46. Vector data model
  47. Vector topology - the right way
  48. Opening tabular layers
  49. Understanding map scales
  50. Summary
  51. Using Vector Data Effectively
  52. Using the attribute table
  53. SQL in GIS
  54. Selecting features in QGIS
  55. Preparing our data
  56. Writing basic queries
  57. Filtering layers
  58. Spatial querying
  59. Writing advanced queries
  60. Modifying the attribute table
  61. Removing columns
  62. Joining tables
  63. Spatial joins
  64. Adding attribute data
  65. Understanding data providers
  66. Summary
  67. Creating Digital Maps
  68. Styling our data
  69. Styling raster data
  70. Styling vector data
  71. Mapping with categories
  72. Graduated mapping
  73. Understanding projections
  74. Plate Carrée - a simple example
  75. Going local with NAD83 / Conus Albers
  76. Choosing the right projection
  77. Preparing a map
  78. Rule-based styling
  79. Adding labels
  80. Creating additional thematics
  81. Creating a map
  82. Adding cartographic elements
  83. Summary
  84. Exporting Your Data
  85. Creating a printable map
  86. Clipping features
  87. Creating a background
  88. Removing dangling segments
  89. Exporting the map
  90. A good way for post-processing - SVG
  91. Sharing raw data
  92. Vector data exchange formats
  93. Shapefile
  94. WKT and WKB
  95. Markup languages
  96. GeoJSON
  97. Raster data exchange formats
  98. GeoTIFF
  99. Clipping rasters
  100. Other raster formats
  101. Summary
  102. Feeding a PostGIS Database
  103. A brief overview of databases
  104. Relational databases
  105. NoSQL databases
  106. Spatial databases
  107. Importing layers into PostGIS
  108. Importing vector data
  109. Spatial indexing
  110. Importing raster data
  111. Visualizing PostGIS layers in QGIS
  112. Basic PostGIS queries
  113. Summary
  114. A PostGIS Overview
  115. Customizing the database
  116. Securing our database
  117. Constraining tables
  118. Saving queries
  119. Optimizing queries
  120. Backing up our data
  121. Creating static backups
  122. Continuous archiving
  123. Summary
  124. Spatial Analysis in QGIS
  125. Preparing the workspace
  126. Laying down the rules
  127. Vector analysis
  128. Proximity analysis
  129. Understanding the overlay tools
  130. Towards some neighborhood analysis
  131. Building your models
  132. Using digital elevation models
  133. Filtering based on aspect
  134. Calculating walking times
  135. Summary
  136. Spatial Analysis on Steroids - Using PostGIS
  137. Delimiting quiet houses
  138. Proximity analysis in PostGIS
  139. Precision problems of buffering
  140. Querying distances effectively
  141. Saving the results
  142. Matching the rest of the criteria
  143. Counting nearby points
  144. Querying rasters
  145. Summary
  146. A Typical GIS Problem
  147. Outlining the problem
  148. Raster analysis
  149. Multi-criteria evaluation
  150. Creating the constraint mask
  151. Using fuzzy techniques in GIS
  152. Proximity analysis with rasters
  153. Fuzzifying crisp data
  154. Aggregating the results
  155. Calculating statistics
  156. Vectorizing suitable areas
  157. Using zonal statistics
  158. Accessing vector statistics
  159. Creating an atlas
  160. Summary
  161. Showcasing Your Data
  162. Spatial data on the web
  163. Understanding the basics of the web
  164. Spatial servers
  165. Using QGIS for publishing
  166. Using GeoServer
  167. General configuration
  168. GeoServer architecture
  169. Adding spatial data
  170. Tiling your maps
  171. Summary
  172. Styling Your Data in GeoServer
  173. Managing styles
  174. Writing SLD styles
  175. Styling vector layers
  176. Styling waters
  177. Styling polygons
  178. Creating labels
  179. Styling raster layers
  180. Using CSS in GeoServer
  181. Styling layers with CSS
  182. Creating complex styles
  183. Styling raster layers
  184. Summary
  185. Creating a Web Map
  186. Understanding the client side of the Web
  187. Creating a web page
  188. Writing HTML code
  189. Styling the elements
  190. Scripting your web page
  191. Creating web maps with Leaflet
  192. Creating a simple map
  193. Compositing layers
  194. Working with Leaflet plugins
  195. Loading raw vector data
  196. Styling vectors in Leaflet
  197. Annotating attributes with popups
  198. Using other projections
  199. Summary
  200. Appendix

Continuous archiving

In some setups, it is simply inconvenient to save static backups of a database at regular intervals. With continuous archiving, we can archive the changes made to our database and roll back to a previous stable state on failure or corruption. With this archiving method, PostgreSQL automatically saves logs in a binary format to a destination of our choice, and can restore the whole database from those logs if necessary. The main disadvantage of this method is that the whole cluster is saved; there is no way to specify which parts we would like to archive.

First of all, what is a cluster? In PostgreSQL terms, a cluster contains all the data stored in a single PostgreSQL installation. A cluster can contain multiple databases, each containing multiple schemas with multiple tables. Continuous archiving is crucial on production servers, where corruption or data loss is a real threat and the ability to roll back to a previous state is required.

Let's find out where our PostgreSQL cluster is located on disk. The default path differs between operating systems, and, besides that, we can specify a custom location for our cluster. For example, as I use an SSD for the OS, and PostgreSQL would store its data on the SSD by default, I created a postgres folder on an HDD partition mounted at /database and pointed the cluster there. We can see the path to our cluster by running the following query:

    SHOW data_directory;

If we open the path we got from the previous query in a file manager, we will see the files and folders our PostgreSQL cluster consists of. Of the folders located there, pg_xlog contains the WALs (Write-Ahead Logs) of our database transactions. WALs are part of PostgreSQL's ACID implementation, as the server can restore the last stable state from these logs if something bad happens. They can also be used for continuous archiving by saving them before PostgreSQL recycles them.

If you cannot access the cluster as a regular user, that is completely fine. The cluster files should be readable and writable only by the postgres user (0700 mode), while other users shouldn't have any permissions. If this is not the case, PostgreSQL won't start correctly.

To use continuous archiving, we need a base version of our cluster. This base version is the first checkpoint. From this checkpoint, logs are preserved, and we can restore previous states by restoring the first checkpoint, specifying a date, and letting PostgreSQL replay the logged transactions until the specified date. To enable WAL archiving, we have to set some system variables using a superuser role as follows:

  1. Set the wal_level variable to archive with the expression ALTER SYSTEM SET wal_level = 'archive';.
  2. Set the archive_mode variable to on with the expression ALTER SYSTEM SET archive_mode = 'on';.
  3. Create a place for your archives. Remember the absolute path to that place. I will use the /home/debian/postgres_archive path.
  4. Set the archive_command variable to the system call that PostgreSQL should archive WALs with. On Linux and macOS systems, it can be ALTER SYSTEM SET archive_command = 'test ! -f /home/debian/postgres_archive/%f && cp %p /home/debian/postgres_archive/%f';, while, on Windows, it should be something like ALTER SYSTEM SET archive_command = 'copy "%p" "C:\\postgres_archive\\%f"';. In the call, %f denotes the WAL file's name, while %p denotes its absolute path with its name.
  5. Restart the server.
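To see why the Linux archive command above never overwrites an existing archive file, here is a small stand-alone simulation of its test-and-copy pattern. The paths and the WAL file name are made up for the demonstration; PostgreSQL substitutes the real values for %p and %f itself.

```shell
# Stand-ins for %p (the WAL's path) and %f (its file name) --
# hypothetical locations, used only for this demonstration.
ARCHIVE=/tmp/postgres_archive_demo
WAL_DIR=/tmp/pg_xlog_demo
WAL=000000010000000000000001
rm -rf "$ARCHIVE" "$WAL_DIR"
mkdir -p "$ARCHIVE" "$WAL_DIR"
echo 'dummy WAL contents' > "$WAL_DIR/$WAL"

archive_wal() {
  # The same test-and-copy pattern PostgreSQL would run for each WAL.
  test ! -f "$ARCHIVE/$WAL" && cp "$WAL_DIR/$WAL" "$ARCHIVE/$WAL"
}

# First invocation: the archive copy does not exist yet, so cp runs.
if archive_wal; then echo 'first run: archived'; else echo 'first run: refused'; fi

# Second invocation: the file already exists, so the command fails,
# and PostgreSQL would keep the WAL segment and retry later.
if archive_wal; then echo 'second run: archived'; else echo 'second run: refused'; fi
```

A nonzero exit status tells PostgreSQL that archiving failed, which is exactly what we want when a file with the same name is already in the archive.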

Telling PostgreSQL what to do in the archiving process might seem tedious, but it gives an amazing amount of flexibility. We can encrypt or compress the WALs, send them through SSH, or perform virtually any other valid operation on them.
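For instance, a compressing variant of the archive command (a sketch, assuming gzip is available and using the same hypothetical archive folder as before) could look like this:

```sql
-- Compress each finished WAL segment with gzip while archiving it.
ALTER SYSTEM SET archive_command =
  'gzip < %p > /home/debian/postgres_archive/%f.gz';
```

Note that the matching restore command then has to decompress the file again, for example gunzip < /home/debian/postgres_archive/%f.gz > %p.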

Next, we have to set up the first checkpoint and create a physical copy of this base version. We can put this backup wherever we like, although it is best kept alongside the WAL files.

  1. Start creating the first checkpoint with the query SELECT pg_start_backup('backup', true);. By specifying true, we ask PostgreSQL to create the checkpoint as soon as possible. Without it, creating the checkpoint takes up about 2.5 minutes with the default settings. Wait for the query to finish.
  2. Copy out everything from the cluster to the backup folder. You can use any tool for this, although you must make sure that file permissions remain the same. On Linux and macOS, tar is a great tool for this. With my paths, the command looks like the following:
      tar -czvf /home/debian/postgres_archive/basebackup.tar.gz /database/postgres
  3. Stop the backup mode with the query SELECT pg_stop_backup();.
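Since file permissions must survive the copy, it is worth checking that your tool preserves them. A small demonstration with a stand-in directory (not a real cluster; the /tmp paths are made up) shows that tar keeps the 0700 mode PostgreSQL requires:

```shell
# Create a stand-in "cluster" directory with PostgreSQL-style permissions.
mkdir -p /tmp/cluster_demo
chmod 0700 /tmp/cluster_demo
echo 'some data' > /tmp/cluster_demo/base_file

# Archive it, remove the original, and extract it again
# (-p asks tar to preserve permissions on extraction).
tar -C /tmp -czf /tmp/basebackup_demo.tar.gz cluster_demo
rm -rf /tmp/cluster_demo
tar -C /tmp -xzpf /tmp/basebackup_demo.tar.gz

# The directory comes back with its original 0700 mode (GNU stat syntax).
stat -c '%a' /tmp/cluster_demo
```

The same check on the restored cluster directory itself is a quick way to catch permission problems before PostgreSQL refuses to start.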

There is a CLI tool called pg_basebackup, which can automatically create the first checkpoint and its backup. However, it needs PostgreSQL to be configured to accept replication connections. For further reference, you can read the official manual at https://www.postgresql.org/docs/9.4/static/app-pgbasebackup.html. You can also read a thorough guide on configuring a hot standby server at https://cloud.google.com/solutions/setup-postgres-hot-standby.

Let's say the worst has happened, and our database is corrupted. In that case, our first task is to find out the last time our database was stable. We could guess, but in this case, guessing is bad practice. We should look through our logs to see when our database went wrong. We don't want to roll back more transactions than necessary, as this can have a significant impact on a production server. Once we have a date, we can start recovering with PostgreSQL's PITR (Point-in-Time Recovery) technique:

  1. Shut down the PostgreSQL server.
  2. Make a backup copy of the corrupted cluster's pg_xlog folder, as it might contain WAL files which haven't been archived yet. It is also good practice to make a copy of the whole corrupted cluster for later analysis, if you have the required free disk space.
  3. Delete the cluster, and replace it with the base backup's content.
  4. The base backup's pg_xlog folder's content is now obsolete, as those changes were already incorporated in the backup. Replace its content with the corrupted cluster's logs. Take care to keep the correct file permissions!
  5. Create a file named recovery.conf in the cluster. In it, set the restore_command variable to the inverse of the archiving command, and the recovery_target_time variable to the date up to which the recovery should proceed:
        restore_command = 'cp /home/debian/postgres_archive/%f %p'
        recovery_target_time = '2017-02-21 13:00:00 GMT'
You can read more about the valid date formats PostgreSQL accepts at https://www.postgresql.org/docs/9.4/static/datatype-datetime.html.
  6. Start the server. When PostgreSQL is done with the recovery, it will rename recovery.conf to recovery.done.
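The file-shuffling part of these steps can be rehearsed safely before a real emergency. The following sketch simulates them with stand-in directories (/tmp/cluster stands for the corrupted cluster and /tmp/archive for the WAL archive; the server stop/start steps and the base backup extraction are only hinted at, as they depend on your installation):

```shell
set -e

# Stand-in directories; a real run would use your cluster and archive paths.
rm -rf /tmp/cluster /tmp/archive /tmp/xlog_saved
mkdir -p /tmp/cluster/pg_xlog /tmp/archive
echo 'unarchived WAL' > /tmp/cluster/pg_xlog/000000010000000000000002

# Save the corrupted cluster's pg_xlog folder -- it may hold WAL
# files that were never archived.
cp -a /tmp/cluster/pg_xlog /tmp/xlog_saved

# Replace the cluster with the base backup's content (simulated here
# by an empty directory; a real run would extract basebackup.tar.gz).
rm -rf /tmp/cluster
mkdir -p /tmp/cluster/pg_xlog

# The base backup's logs are obsolete; restore the saved, newer WALs.
cp -a /tmp/xlog_saved/. /tmp/cluster/pg_xlog/

# Write recovery.conf with the inverse of the archive command and the
# recovery target date.
cat > /tmp/cluster/recovery.conf <<'EOF'
restore_command = 'cp /tmp/archive/%f %p'
recovery_target_time = '2017-02-21 13:00:00 GMT'
EOF
```

On a real server, the last step would be starting PostgreSQL and waiting for recovery.conf to be renamed to recovery.done.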