Seven Databases in Seven Weeks, 2nd Edition

Wrap-Up

If you haven’t spent much time with relational databases, we highly recommend digging deeper into PostgreSQL, or another relational database, before deciding to scrap it for a newer variety. Relational databases have been the focus of intense academic research and industrial improvements for more than forty years, and PostgreSQL is one of the top open source relational databases to benefit from these advancements.

PostgreSQL’s Strengths

PostgreSQL’s strengths are as numerous as those of any relational database: years of research and production use across nearly every field of computing, flexible queryability, and very consistent and durable data. Most programming languages have battle-tested driver support for Postgres, and many programming models, like object-relational mapping (ORM), assume an underlying relational database.

But the real crux of the matter is the flexibility of the join. You don’t need to know in advance how you plan to query your model, because you can always add joins, filters, views, and indexes; odds are good that you will always be able to extract the data you want. In the other chapters of this book, that assumption will more or less fly out the window.
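As a small illustration of that flexibility, here is a sketch using two hypothetical tables (teams and team_players, filled with made-up data). The schema says nothing about how it will be queried; a view answering a question nobody planned for, and an index to support its join, are added after the fact without touching the tables.

```sql
-- Hypothetical schema, designed with no particular queries in mind
CREATE TABLE teams (
	team_id SERIAL PRIMARY KEY,
	name TEXT NOT NULL
);
CREATE TABLE team_players (
	player_id SERIAL PRIMARY KEY,
	name TEXT NOT NULL,
	team_id INTEGER REFERENCES teams (team_id),
	home_runs INTEGER NOT NULL DEFAULT 0
);
INSERT INTO teams (team_id, name) VALUES (1, 'Team A'), (2, 'Team B');
INSERT INTO team_players (name, team_id, home_runs) VALUES
	('player_1', 1, 10),
	('player_2', 1, 20),
	('player_3', 2, 5);

-- Later: a view that joins and aggregates, created after the fact
CREATE VIEW team_home_runs AS
	SELECT t.name AS team, SUM(p.home_runs) AS total_home_runs
	FROM teams t
	JOIN team_players p ON p.team_id = t.team_id
	GROUP BY t.name;

-- Later still: an index on the join column to keep the view fast as data grows
CREATE INDEX team_players_team_id_idx ON team_players (team_id);

SELECT * FROM team_home_runs ORDER BY team;
```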

PostgreSQL is fantastic for what we call “Stepford data” (named for The Stepford Wives, a story about a neighborhood where nearly everyone was consistent in style and substance), which is data that is fairly homogeneous and conforms well to a structured schema.

Furthermore, PostgreSQL goes beyond the typical open source RDBMS offerings with features such as powerful schema constraint mechanisms. You can write your own language extensions, customize indexes, create custom datatypes, and even rewrite incoming queries after they are parsed. And where other open source databases may have complex licensing agreements, PostgreSQL is open source in its purest form. No one owns the code. Anyone can do pretty much anything they want with the project (other than hold authors liable). The development and distribution are completely community supported. If you are a fan of free(dom) software, you have to respect the community’s general resistance to cashing in on an amazing product.
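Custom datatypes and schema constraints often work together. The following sketch (the batting_average domain, the fielding_position enum, and the roster table are all hypothetical) uses CREATE DOMAIN to attach a CHECK constraint to a base type, and CREATE TYPE ... AS ENUM to restrict a column to a fixed set of labels.

```sql
-- A domain is a base type plus constraints; every column using it inherits the CHECK
CREATE DOMAIN batting_average AS NUMERIC(4,3)
	CHECK (VALUE >= 0 AND VALUE <= 1);

-- An enumerated type accepts only the listed labels
CREATE TYPE fielding_position AS ENUM
	('P', 'C', '1B', '2B', '3B', 'SS', 'LF', 'CF', 'RF');

CREATE TABLE roster (
	name TEXT NOT NULL,
	pos fielding_position NOT NULL,
	batting_avg batting_average
);

INSERT INTO roster VALUES ('hypothetical_player', '3B', 0.328);

-- Both of these would be rejected:
-- INSERT INTO roster VALUES ('bad_avg', 'SS', 1.2);   -- violates the domain's CHECK
-- INSERT INTO roster VALUES ('bad_pos', 'QB', 0.300); -- not a valid enum label
```

Because the constraint lives on the domain, any other table that declares a batting_average column gets the same check for free.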

PostgreSQL’s Weaknesses

Although relational databases have undeniably been the most successful style of database over the years, there are cases where they may not be a great fit.

Although we won’t delve too deeply into it here, given that we cover two other databases in this book that were explicitly created to handle unstructured data, we’d be remiss not to mention that Postgres has offered a json type since version 9.2 and a jsonb type since version 9.4. Postgres offers two different formats for this: JSON and JSONB (the json and jsonb types, respectively). The crucial difference between them is that the json type stores an exact copy of the input text, which must be reparsed whenever it is processed, while jsonb stores JSON in a decomposed binary format. As a result, json is faster for data input, while jsonb is significantly faster to process.
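One way to see the text-versus-binary difference is to cast the same string to both types. In this sketch the duplicated "HR" key and the extra spaces are contrived for illustration: json hands the input back verbatim, while jsonb drops the insignificant whitespace and, as the PostgreSQL documentation describes, keeps only the last value of a duplicated key.

```sql
-- json stores the input text verbatim, duplicate keys and all;
-- jsonb parses it into a binary form, dropping insignificant whitespace
-- and keeping only the last value of a duplicated key
SELECT
	'{"HR": 118,   "HR": 119}'::json  AS as_json,
	'{"HR": 118,   "HR": 119}'::jsonb AS as_jsonb;

-- Reading the field back: the jsonb value has a single "HR" key left
SELECT '{"HR": 118,   "HR": 119}'::jsonb ->> 'HR' AS hr;
```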

With Postgres, you can perform operations like this:

	CREATE TABLE users (
		username TEXT,
		data JSON
	);
	INSERT INTO users VALUES ('wadeboggs107', '{ "AVG": 0.328, "HR": 118, "H": 3010 }');
	-- ->> extracts a field as text (-> would return it as json)
	SELECT data->>'AVG' AS lifetime_batting_average FROM users;

	lifetime_batting_average
	--------------------------
	0.328

If your use case requires a mixture of structured and unstructured (or less structured) datatypes—or even requires only unstructured datatypes—then Postgres may provide a solution.
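To make the mixed case concrete, here is a minimal sketch, assuming PostgreSQL 9.4 or later for jsonb. The player_profiles table, its signed_up column and dates, and the second player are hypothetical. Typed columns sit next to a jsonb column, a GIN index covers the jsonb data, and a single query combines an ordinary WHERE clause with the jsonb containment operator @>.

```sql
-- Structured, typed columns alongside a schemaless jsonb column
CREATE TABLE player_profiles (
	username TEXT PRIMARY KEY,
	signed_up DATE NOT NULL,
	stats JSONB
);

INSERT INTO player_profiles VALUES
	('wadeboggs107', '2017-01-15', '{"AVG": 0.328, "HR": 118, "H": 3010}'),
	('hypothetical_rookie', '2017-04-02', '{"AVG": 0.250, "HR": 3}');

-- A GIN index on the jsonb column supports containment queries with @>
CREATE INDEX player_profiles_stats_idx ON player_profiles USING GIN (stats);

-- An ordinary relational filter combined with a JSON containment test
SELECT username
FROM player_profiles
WHERE signed_up < '2017-03-01'
	AND stats @> '{"HR": 118}';
```

On a table this small the planner will likely choose a sequential scan anyway; the GIN index starts to pay off as the table grows.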

Partitioning data across multiple servers is not one of the strengths of relational databases such as PostgreSQL. If you need to scale out rather than up (multiple parallel databases rather than a single beefy machine or cluster), you may be better served looking elsewhere, although clustering capabilities have improved in recent releases. Another database might be a better fit if:

You don’t truly require the overhead of a full database (perhaps you only need a cache like Redis).
You require very high-volume reads and writes of key-value pairs.
You need to store only large BLOBs of data.

Parting Thoughts

A relational database is an excellent choice for query flexibility. While PostgreSQL requires you to design your data up front, it makes no assumptions about how you use that data. As long as your schema is designed in a fairly normalized way, without duplication or storage of computable values, you should generally be all set for any queries you might need to create. And if you include the correct modules, tune your engine, and index well, it will perform amazingly well for multiple terabytes of data with very small resource consumption. Finally, to those for whom data safety is paramount, PostgreSQL’s ACID-compliant transactions ensure your commits are completely atomic, consistent, isolated, and durable.
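Atomicity is easiest to see with the classic transfer example. In this sketch the accounts table and its rows are hypothetical: both updates in the first transaction become visible together at COMMIT, while the second transaction is abandoned with ROLLBACK and leaves no trace.

```sql
-- Hypothetical accounts table for illustrating atomic commits
CREATE TABLE accounts (
	owner TEXT PRIMARY KEY,
	balance NUMERIC(10,2) NOT NULL CHECK (balance >= 0)
);
INSERT INTO accounts VALUES ('alice', 100.00), ('bob', 50.00);

-- Both updates take effect together at COMMIT, or not at all
BEGIN;
UPDATE accounts SET balance = balance - 25.00 WHERE owner = 'alice';
UPDATE accounts SET balance = balance + 25.00 WHERE owner = 'bob';
COMMIT;

-- A transfer abandoned midway: ROLLBACK discards the partial work
BEGIN;
UPDATE accounts SET balance = balance - 10.00 WHERE owner = 'alice';
ROLLBACK;

-- alice and bob each end up with 75.00
SELECT owner, balance FROM accounts ORDER BY owner;
```

An error inside a transaction block has the same effect: PostgreSQL marks the transaction as aborted, ignores further commands until the block ends, and a subsequent COMMIT rolls it back.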
