Seven Databases in Seven Weeks, 2nd Edition

by Jim Wilson. Published by Pragmatic Bookshelf, 2018.

Wrap-Up

Neo4j is a top open source implementation of the (relatively rare) class of graph databases. Graph databases focus on the relationships between data, rather than the commonalities among values. Modeling graph data is simple. You create nodes and the relationships between them, and optionally hang key-value pairs from either. Querying is as easy as declaring how to walk the graph from a starting node.
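To make the modeling idea concrete, here is a minimal sketch in plain Python. It is not Neo4j's API or Cypher; the `Graph` class, its method names, and the sample people are all hypothetical, chosen only to show nodes, typed relationships, key-value properties, and a walk from a starting node.

```python
from collections import deque


class Graph:
    """A tiny in-memory property graph (illustrative only, not Neo4j)."""

    def __init__(self):
        self.nodes = {}      # node id -> properties dict
        self.outgoing = {}   # node id -> list of (rel_type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.outgoing.setdefault(node_id, [])

    def relate(self, source, rel_type, target, **props):
        # Relationships are directed and may carry their own key-value pairs.
        self.outgoing[source].append((rel_type, target, props))

    def walk(self, start, rel_type, max_depth):
        """Breadth-first walk from start along rel_type, yielding (node id, depth)."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth > 0:
                yield node, depth
            if depth == max_depth:
                continue
            for kind, target, _props in self.outgoing[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))


g = Graph()
g.add_node("alice", name="Alice")
g.add_node("bob", name="Bob")
g.add_node("carol", name="Carol")
g.relate("alice", "FRIENDS", "bob", since=2015)
g.relate("bob", "FRIENDS", "carol", since=2017)

# "Friends of friends": start at alice and walk exactly two FRIENDS hops.
friends_of_friends = [n for n, depth in g.walk("alice", "FRIENDS", 2) if depth == 2]
print(friends_of_friends)  # ['carol']
```

The query is expressed as a starting node, a relationship type, and a depth, which is roughly the shape of the traversals described above.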

Neo4j’s Strengths

Neo4j is one of the finest examples of open source graph databases. Graph databases are perfect for unstructured data, in many ways even more so than document databases. Not only is Neo4j typeless and schemaless, but it puts no constraints on how data is related. It is, in the best sense, a free-for-all. Currently, Neo4j can support 34.4 billion nodes and 34.4 billion relationships, which is more than enough for most use cases. (Neo4j could hold more than 15 nodes for each of Facebook’s 2.2 billion users in a single graph.)

The Neo4j distributions provide several tools: fast lookups with Lucene, the Cypher query language, and the REST interface. Beyond ease of use, Neo4j is fast. Unlike join operations in relational databases or map-reduce operations in other databases, each step of a graph traversal takes constant time. Related data is only one hop away from a node; there is no need to join values in bulk and then filter for the desired results, which is how most of the databases we’ve seen operate. It doesn’t matter how large the graph becomes; moving from node A to node B is always one step if they share a relationship. Finally, the Enterprise edition provides for highly available and high read-traffic sites by way of Neo4j HA.
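The contrast between a bulk join and a single hop can be sketched in plain Python. This is an illustration of the general idea (often described as index-free adjacency), not Neo4j's storage engine; the function names and the ring-shaped test data are hypothetical. The join side is modeled as an unindexed scan, which is the worst case; an indexed relational lookup would do better than this, though its cost would still depend on table size.

```python
def neighbors_by_join(edge_table, node):
    """Join-style lookup: scan every row of an edge table and filter."""
    return [dst for src, dst in edge_table if src == node]


def neighbors_by_adjacency(adjacency, node):
    """Graph-style lookup: each node holds direct references to its neighbors."""
    return adjacency[node]


def build(n):
    """Build a ring of n nodes in which node i points to node i + 1."""
    edge_table = [(i, (i + 1) % n) for i in range(n)]
    adjacency = {src: [dst] for src, dst in edge_table}
    return edge_table, adjacency


for size in (1_000, 100_000):
    edge_table, adjacency = build(size)
    rows_scanned = len(edge_table)          # the scan touches every row
    entries_followed = len(adjacency[0])    # the hop touches only node 0's relationships
    assert neighbors_by_join(edge_table, 0) == neighbors_by_adjacency(adjacency, 0) == [1]
    print(size, rows_scanned, entries_followed)
```

As the graph grows a hundredfold, the scan's work grows with it, while the hop from node 0 still follows a single relationship.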

Neo4j’s Weaknesses

Neo4j does have a few shortcomings. We found that its choice of nomenclature (node rather than vertex, and relationship rather than edge) adds complexity when communicating. Although HA is excellent at replication, it can only replicate a full graph to other servers. It cannot currently shard subgraphs, which still places a limit on graph size (though, to be fair, that limit measures in the tens of billions). Finally, if you are looking for a business-friendly open source license (like MIT), Neo4j may not be for you. Although the Community edition (everything we used in the first two days) is GPL, you’ll probably need to purchase a license if you want to run a production environment using the Enterprise tools (which include HA and backups).

Neo4j on CAP

The term “high availability cluster” should be enough to give away Neo4j’s strategy. Neo4j HA is available and partition tolerant (AP). Each slave will return only what it currently has, which may be out of sync with the master node temporarily. Although you can reduce the update latency by shortening a slave’s pull interval (that is, by pulling more often), it’s still technically eventually consistent. This is why Neo4j HA is recommended for read-mostly requirements.
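The effect of a pull interval on read freshness can be shown with a toy simulation in plain Python. This is a sketch under simplifying assumptions, not Neo4j HA's actual protocol: the `Master` and `Slave` classes, the `tick` method, and the time units are all hypothetical.

```python
class Master:
    """Accepts writes and keeps an ordered log of them."""

    def __init__(self):
        self.log = []   # list of (key, value) writes, in commit order

    def write(self, key, value):
        self.log.append((key, value))


class Slave:
    """Serves reads from a local copy that is refreshed only when it pulls."""

    def __init__(self, master, pull_interval):
        self.master = master
        self.pull_interval = pull_interval   # time units between pulls
        self.applied = 0                     # number of log entries applied so far
        self.data = {}
        self.last_pull = 0

    def tick(self, now):
        """Pull new log entries if at least pull_interval has elapsed."""
        if now - self.last_pull >= self.pull_interval:
            for key, value in self.master.log[self.applied:]:
                self.data[key] = value
            self.applied = len(self.master.log)
            self.last_pull = now

    def read(self, key):
        return self.data.get(key)


master = Master()
slave = Slave(master, pull_interval=10)

master.write("name", "Alice")   # committed on the master at t=0
slave.tick(now=5)               # too early: no pull happens
stale = slave.read("name")      # None, because the slave has not caught up
slave.tick(now=10)              # the interval has elapsed, so the slave pulls
fresh = slave.read("name")      # 'Alice'
print(stale, fresh)             # None Alice
```

A shorter `pull_interval` narrows the window in which a read can be stale, but as long as the slave pulls rather than being updated synchronously, some window remains.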

Parting Thoughts

Neo4j’s simplicity can be off-putting if you’re not used to modeling graph data. It provides a powerful open source API with years of production use, and yet it hasn’t gotten the same traction as other databases in this book. We chalk this up to a lack of familiarity rather than a lack of fit, because graph databases mesh so naturally with how humans tend to conceptualize data. We imagine our families as trees, or our friends as graphs; most of us don’t imagine personal relationships as self-referential datatypes. For certain classes of problems, such as social networks, Neo4j is an obvious choice. But you should give it some serious consideration for non-obvious problems as well. It just may surprise you how powerful and easy it is.