HBASE DISTRIBUTED STORAGE ARCHITECTURE
A robust HBase architecture involves a few more parts than HBase alone. At the very least, an underlying distributed coordination service for configuration and synchronization is involved. HBase deployment adheres to a master-worker pattern. Therefore, there is usually a master and a set of workers, known as region servers. When HBase starts, the master allocates a set of regions to each region server. Each region stores an ordered set of rows, where each row is identified by a unique row-key. As the number of rows stored in a region grows beyond a configured threshold, the region is split into two and the rows are divided between the two new regions.
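The ordered, range-partitioned layout is visible from the client side whenever a slice of row-keys is scanned. The following is a minimal sketch, assuming the HBase 2.x Java client and a hypothetical table named "metrics" with a column-family named "d"; rows come back in row-key order, bounded by start and stop keys much as a region is bounded by its key range.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OrderedScanSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("metrics"))) {
            // Rows come back sorted by row-key; the start and stop keys bound
            // the slice, much as a region is bounded by its key range.
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("row-0100"))
                    .withStopRow(Bytes.toBytes("row-0200"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}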
Like most column databases, HBase stores the columns of a column-family together. Therefore, each region maintains a separate store for each column-family of every table. Each store in turn maps to a physical file that is stored in the underlying distributed filesystem. For each store, HBase abstracts access to the underlying filesystem with the help of a thin wrapper that acts as the intermediary between the store and the underlying physical file.
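Column-families are declared when a table is created, and each family declared ends up with its own store in every region of the table. The snippet below is a sketch under the same assumptions as above (HBase 2.x Java client, the hypothetical "metrics" table) and declares two column-families, "d" and "meta".

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateFamiliesSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Every column-family declared here gets its own store, and hence
            // its own files in the underlying filesystem, in each region.
            admin.createTable(TableDescriptorBuilder.newBuilder(TableName.valueOf("metrics"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("meta"))
                    .build());
        }
    }
}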
Each region has an in-memory store, or cache, and a write-ahead-log (WAL). To quote Wikipedia, http://en.wikipedia.org/wiki/Write-ahead_logging, “write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems.” WAL is a common technique used across a variety of database systems, including popular relational database systems like PostgreSQL and MySQL. In HBase, a client program can choose to enable or disable the WAL for its writes. Switching it off boosts write performance but reduces reliability and the ability to recover in case of failure. When data is written to a region, it’s first written to the write-ahead-log, if enabled. Soon afterwards, it’s written to the region’s in-memory store. When the in-memory store fills up, data is flushed to disk and persisted in the underlying distributed storage.
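How a client opts out of the WAL depends on the client API version. The following is a minimal sketch, assuming the HBase 2.x Java client and the same hypothetical "metrics" table, using the Durability.SKIP_WAL setting on a single Put.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalToggleSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("metrics"))) {
            Put put = new Put(Bytes.toBytes("row-0150"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("temp"), Bytes.toBytes("21.5"));
            // Skip the write-ahead-log for this write: faster, but the edit is
            // lost if the region server fails before the in-memory store is flushed.
            put.setDurability(Durability.SKIP_WAL);
            table.put(put);
        }
    }
}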
If a distributed filesystem like the Hadoop distributed filesystem (HDFS) is used, then the master-worker pattern extends to the underlying storage scheme as well. In HDFS, a namenode and a set of datanodes form a structure analogous to the configuration of master and region servers that column databases like HBase follow. Thus, in such a situation each physical storage file for an HBase column-family store ends up residing on an HDFS datanode. HBase leverages a filesystem API to avoid strong coupling with HDFS, and this API acts as the intermediary for conversations between an HBase store and the corresponding HDFS file. The API allows HBase to work seamlessly with other types of filesystems as well. For example, HBase could be used with CloudStore, formerly known as Kosmos FileSystem (KFS), instead of HDFS.
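The indirection in question is the Hadoop FileSystem API, where the concrete implementation is picked from the URI scheme of the storage root (HBase points its hbase.rootdir setting at such a URI). The sketch below only illustrates that abstraction; the namenode host and the /hbase path are placeholders.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileSystemAbstractionSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // The concrete filesystem (HDFS, the local filesystem, or another
        // pluggable implementation) is selected from the URI scheme; the
        // namenode host below is a placeholder.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/hbase"))) {
            System.out.println(status.getPath());
        }
    }
}

Because the store only ever talks to this interface, swapping HDFS for another filesystem is a matter of configuration rather than code changes.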
In addition to having the distributed filesystem for storage, an HBase cluster also leverages an external configuration and coordination utility. In the seminal paper on Bigtable, Google named this coordination service Chubby. Hadoop, which clones much of Google’s infrastructure, created a counterpart and called it ZooKeeper. Hypertable calls the similar infrastructure piece Hyperspace. A ZooKeeper cluster typically front-ends an HBase cluster for new clients and manages configuration.
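In practice, an HBase client only needs to know the ZooKeeper quorum to bootstrap. A minimal sketch, assuming the HBase Java client and placeholder hostnames for the ensemble:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZooKeeperBootstrapSketch {
    public static void main(String[] args) {
        // The hostnames below are placeholders for a real ZooKeeper ensemble.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // This Configuration can now be passed to ConnectionFactory.createConnection(conf),
        // which contacts ZooKeeper to discover the rest of the cluster.
    }
}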
To access HBase for the first time, a client reads two catalogs, which it locates via ZooKeeper. These catalogs are named -ROOT- and .META., and they maintain state and location information for all the regions. -ROOT- keeps track of the regions of the .META. catalog, and .META. keeps records for the regions of the user-space tables, that is, the tables that hold the data. When a client wants to access a specific row, it first asks ZooKeeper for the location of the -ROOT- catalog. The -ROOT- catalog locates the .META. region relevant to the row, which in turn provides the region details needed to access the specific row. Using this information, the row is accessed. This three-step lookup is not repeated the next time the client asks for the row: column databases rely heavily on caching the information gathered during the lookup, which means clients contact the region servers directly on subsequent requests. The long loop of lookups is repeated only if the cached region information is stale or the region has been disabled and is inaccessible.
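From the client program's point of view the whole lookup is transparent: the client library resolves the region location through the catalogs on the first request and caches it for later ones. A minimal sketch, under the same assumptions as the earlier snippets (HBase 2.x Java client, hypothetical "metrics" table):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CatalogLookupSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("metrics"))) {
            // The first read triggers the catalog lookup described above; the
            // resulting region location is cached inside the client, so later
            // reads of the same region go straight to its region server.
            Result first = table.get(new Get(Bytes.toBytes("row-0150")));
            Result second = table.get(new Get(Bytes.toBytes("row-0151")));
            System.out.println(first.isEmpty() + " " + second.isEmpty());
        }
    }
}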
Each region is identified by the smallest row-key it can hold (its start key), so looking up a row usually comes down to finding the region whose start key is less than or equal to the row-key and whose end key, which is the start key of the next region, is greater than it.
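Expressed as code, the containment test looks roughly like the following sketch; it uses HBase's Bytes utility for lexicographic comparison, the key values are made up for illustration, and an empty end key marks the last region of a table.

import org.apache.hadoop.hbase.util.Bytes;

public class RegionRangeCheckSketch {
    // A row belongs to a region when its key is >= the region's start key
    // and < the region's end key (an empty end key means "no upper bound").
    static boolean contains(byte[] rowKey, byte[] startKey, byte[] endKey) {
        boolean afterStart = Bytes.compareTo(rowKey, startKey) >= 0;
        boolean beforeEnd = endKey.length == 0 || Bytes.compareTo(rowKey, endKey) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        byte[] start = Bytes.toBytes("row-0100");
        byte[] end = Bytes.toBytes("row-0200");
        System.out.println(contains(Bytes.toBytes("row-0150"), start, end)); // true
        System.out.println(contains(Bytes.toBytes("row-0250"), start, end)); // false
    }
}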
So far, the essential conceptual and physical models of column database storage have been introduced. The behind-the-scenes mechanics of writing data to and reading data from these stores have also been exposed.
Source of Information: NoSQL