Table of Contents for
Seven Databases in Seven Weeks, 2nd Edition


Seven Databases in Seven Weeks, 2nd Edition by Luc Perkins, Eric Redmond, and Jim Wilson. Published by Pragmatic Bookshelf, 2018.
  1. Title Page
  2. Seven Databases in Seven Weeks, Second Edition
  3. Acknowledgments
  4. Preface
  5. Why a NoSQL Book
  6. Why Seven Databases
  7. What’s in This Book
  8. What This Book Is Not
  9. Code Examples and Conventions
  10. Credits
  11. Online Resources
  12. 1. Introduction
  13. It Starts with a Question
  14. The Genres
  15. Onward and Upward
  16. 2. PostgreSQL
  17. That’s Post-greS-Q-L
  18. Day 1: Relations, CRUD, and Joins
  19. Day 2: Advanced Queries, Code, and Rules
  20. Day 3: Full Text and Multidimensions
  21. Wrap-Up
  22. 3. HBase
  23. Introducing HBase
  24. Day 1: CRUD and Table Administration
  25. Day 2: Working with Big Data
  26. Day 3: Taking It to the Cloud
  27. Wrap-Up
  28. 4. MongoDB
  29. Hu(mongo)us
  30. Day 1: CRUD and Nesting
  31. Day 2: Indexing, Aggregating, MapReduce
  32. Day 3: Replica Sets, Sharding, GeoSpatial, and GridFS
  33. Wrap-Up
  34. 5. CouchDB
  35. Relaxing on the Couch
  36. Day 1: CRUD, Fauxton, and cURL Redux
  37. Day 2: Creating and Querying Views
  38. Day 3: Advanced Views, Changes API, and Replicating Data
  39. Wrap-Up
  40. 6. Neo4j
  41. Neo4j Is Whiteboard Friendly
  42. Day 1: Graphs, Cypher, and CRUD
  43. Day 2: REST, Indexes, and Algorithms
  44. Day 3: Distributed High Availability
  45. Wrap-Up
  46. 7. DynamoDB
  47. DynamoDB: The “Big Easy” of NoSQL
  48. Day 1: Let’s Go Shopping!
  49. Day 2: Building a Streaming Data Pipeline
  50. Day 3: Building an “Internet of Things” System Around DynamoDB
  51. Wrap-Up
  52. 8. Redis
  53. Data Structure Server Store
  54. Day 1: CRUD and Datatypes
  55. Day 2: Advanced Usage, Distribution
  56. Day 3: Playing with Other Databases
  57. Wrap-Up
  58. 9. Wrapping Up
  59. Genres Redux
  60. Making a Choice
  61. Where Do We Go from Here?
  62. A1. Database Overview Tables
  63. A2. The CAP Theorem
  64. Eventual Consistency
  65. CAP in the Wild
  66. The Latency Trade-Off
  67. Bibliography

Introducing HBase

HBase is a column-oriented database that prides itself on its ability to provide both consistency and scalability. It is based on Bigtable, a high-performance, proprietary database developed by Google and described in the 2006 white paper “Bigtable: A Distributed Storage System for Structured Data.”[12] Initially created for natural language processing, HBase started life as a contrib package for Apache Hadoop. Since then, it has become a top-level Apache project.

Luc says:
Hosted HBase with Google Cloud Bigtable

As you’ll see later in this chapter, HBase can be tricky to administer. Fortunately, there’s now a compelling option for those who want the power of HBase with very little operational burden: Google’s Cloud Bigtable, part of the Google Cloud Platform suite of products. Cloud Bigtable isn’t 100% compatible with HBase, but as of early 2018 it’s close enough that you may be able to migrate many existing HBase applications over.

If you find the basic value proposition of, for example, Amazon’s cloud-based DynamoDB compelling and you think HBase is a good fit for a project, then Cloud Bigtable might be worth checking out. You can at least be assured that it’s run by the same company that crafted the concepts behind HBase (and the folks at Google do seem to know a thing or two about scale).

On the architecture front, HBase is designed to be fault tolerant. Hardware failures may be uncommon in individual machines, but in large clusters node failure is the norm (as are network issues). HBase can gracefully recover from individual server failures thanks to two mechanisms: write-ahead logging, which appends every change to a durable log before applying it, so that a recovering node can replay the log to rebuild its state rather than lose acknowledged writes; and distributed configuration, which means that nodes can rely on each other for configuration rather than on a centralized source.
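
The recovery guarantee that write-ahead logging provides can be sketched in a few lines. This is a minimal, hypothetical model in Python, not HBase’s actual implementation (which logs to HDFS and serves data from regionserver memstores); the `WalStore` class and log path are made up for illustration. The invariant is simply that every mutation reaches the durable log before it is applied in memory, so a fresh process pointed at the same log can rebuild the state of a crashed one.

```python
import json
import os

class WalStore:
    """Minimal write-ahead-logged key-value store (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}      # in-memory state: fast, but lost on a crash
        self._replay()      # recover any prior state from the durable log

    def put(self, key, value):
        # 1. Append the mutation to the durable log first...
        with open(self.log_path, "a") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())   # force it to stable storage
        # 2. ...and only then apply it in memory.
        self.data[key] = value

    def _replay(self):
        # Replaying the log reconstructs every acknowledged write,
        # because nothing was ever applied without being logged first.
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["k"]] = entry["v"]

# Simulate a crash: write, abandon the in-memory store, then recover.
log_path = "/tmp/wal-demo.log"
if os.path.exists(log_path):
    os.remove(log_path)                  # start the demo from a clean log

store = WalStore(log_path)
store.put("row1", "hello")

recovered = WalStore(log_path)           # a "restarted" node, same log
print(recovered.data["row1"])            # prints: hello
```

The ordering is the whole trick: if the process dies between the `fsync` and the in-memory update, the write still survives in the log and reappears on replay.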

Additionally, HBase lives in the Hadoop ecosystem, where it benefits from its proximity to other related tools. Hadoop is a sturdy, scalable computing platform that provides a distributed file system and MapReduce capabilities. Wherever you find HBase, you’ll find Hadoop and other infrastructural components that you can use in your own applications, such as Apache Hive, a data warehousing tool, and Apache Pig, a parallel processing tool (and many others).

Finally, HBase is actively used and developed by a number of high-profile companies for their “Big Data” problems. Notably, Facebook uses HBase for a variety of purposes, including Messages, search indexing, and stream analysis. Twitter uses it to power its people-search capability, for monitoring and performance data, and more. Airbnb uses it as part of its real-time stream processing stack. Apple uses it for...something, though they won’t publicly say what. The parade of companies using HBase also includes the likes of eBay, Meetup, Ning, Yahoo!, and many others.

With all of this activity, new versions of HBase are coming out at a fairly rapid clip. At the time of this writing, the current stable version is 1.2.1, so that’s what you’ll be using. Go ahead and download HBase, and we’ll get started.