Given that SQL is the lingua franca of the database world, HBase's low-level get/put/scan API creates some adoption friction, since developers must learn a new API to interact with the database. Similarly, the lack of a type system places the burden on the application developer to keep the encoding and decoding of the bytes stored in HBase in sync. To reduce this friction, several projects now provide an SQL interface over HBase. Of these projects, the most popular and, arguably, the best developed is Apache Phoenix.
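To make the encoding burden concrete, here is a minimal sketch in plain Python, using only the standard `struct` module rather than any HBase client. The function names and the choice of a 4-byte integer are assumptions made for illustration. It shows that a reader whose decoding has drifted from the writer's encoding gets no error, only a wrong value.

```python
import struct

# HBase stores only byte arrays, so the application decides how a value
# becomes bytes. Here the writer uses a 4-byte big-endian signed integer.
def encode_age(age: int) -> bytes:
    return struct.pack(">i", age)

# A reader that agrees with the writer recovers the original value.
def decode_age(raw: bytes) -> int:
    return struct.unpack(">i", raw)[0]

# A reader that has drifted out of sync (little-endian here) raises no
# error; it silently returns a different number.
def decode_age_mismatched(raw: bytes) -> int:
    return struct.unpack("<i", raw)[0]

stored = encode_age(42)
print(decode_age(stored))
print(decode_age_mismatched(stored))
```

A typed SQL layer removes this class of bug by recording the column type once, in the table definition, instead of in every reader and writer.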
Apache Phoenix allows developers to use a subset of standard ANSI SQL to interact with their HBase tables.
The features supported include the following:
- Creating a table with well-defined types
- Performing standard select and delete operations on the table, with inserts and updates expressed through Phoenix's UPSERT statement
- Building and maintaining secondary indices
- Creating and maintaining views over tables
- Performing inner/outer joins between tables
- Invoking grouping, ordering, and standard SQL aggregate functions
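As a rough illustration of several of these features, the sketch below lists Phoenix SQL statements and runs them through a generic Python DB-API 2.0 connection. The table, column, index, and view names are hypothetical, and `run_all` is a helper invented for this example, not part of Phoenix.

```python
# Phoenix SQL statements covering several features from the list above.
# All table, column, index, and view names are hypothetical.
STATEMENTS = [
    # Table with well-defined types; the primary key maps to the HBase row key.
    "CREATE TABLE IF NOT EXISTS users ("
    " id BIGINT NOT NULL PRIMARY KEY,"
    " name VARCHAR,"
    " city VARCHAR)",
    # Phoenix uses UPSERT for both inserting and updating a row.
    "UPSERT INTO users (id, name, city) VALUES (1, 'Ada', 'London')",
    # Secondary index that Phoenix keeps up to date on writes.
    "CREATE INDEX IF NOT EXISTS users_by_city ON users (city)",
    # View defined over the table.
    "CREATE VIEW IF NOT EXISTS london_users AS"
    " SELECT * FROM users WHERE city = 'London'",
    # Grouping, ordering, and a standard aggregate function.
    "SELECT city, COUNT(*) FROM users GROUP BY city ORDER BY city",
    "DELETE FROM users WHERE id = 1",
]

def run_all(connection, statements=STATEMENTS):
    """Execute each statement on a DB-API 2.0 connection.

    Returns the fetched rows of every SELECT, in order.
    """
    results = []
    cursor = connection.cursor()
    for sql in statements:
        cursor.execute(sql)
        if sql.lstrip().upper().startswith("SELECT"):
            results.append(cursor.fetchall())
    connection.commit()
    return results
```

One way to obtain such a connection, assuming the `phoenixdb` package is installed and a Phoenix Query Server is reachable at a URL such as `http://localhost:8765/`, is `phoenixdb.connect(url, autocommit=True)`; the URL here is an assumption about the environment.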
Phoenix works by efficiently translating standard SQL queries into HBase API calls and executing them against the cluster. Phoenix relies heavily on HBase coprocessors to push as much of an operation as possible down into the RegionServers, instead of executing it on the client side.
We recommend that a user new to HBase look into Phoenix, since it embodies standard recipes and best practices for efficiently storing and querying data in HBase.
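The benefit of pushing work down can be sketched with a toy simulation. This is plain Python, not Phoenix or HBase code: each inner list stands in for the rows held by one RegionServer, and the function names are hypothetical. Both functions compute the same row count, but they differ in how many values have to cross the network to the client.

```python
# Each inner list stands in for the rows held by one RegionServer.
REGIONS = [
    [5, 12, 7],
    [3, 9],
    [20, 1, 4, 8],
]

def count_client_side(regions):
    """Ship every row to the client, then count there.

    Returns (row count, number of values sent to the client).
    """
    shipped = [row for region in regions for row in region]
    return len(shipped), len(shipped)

def count_pushed_down(regions):
    """Count inside each region (the role a coprocessor plays), then
    let the client sum one partial count per region.

    Returns (row count, number of values sent to the client).
    """
    partials = [len(region) for region in regions]
    return sum(partials), len(partials)

print(count_client_side(REGIONS))   # (9, 9)
print(count_pushed_down(REGIONS))   # (9, 3)
```

With pushdown, the amount of data returned grows with the number of regions rather than the number of rows, which is why aggregates benefit most from this approach.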