How long does HBase take to recover one crashed RegionServer?

What you are looking for is HBase's Mean Time To Recovery (MTTR).
There are several articles on the subject. To answer your question based on this article:

How long does HBase take to recover from a failure?

It depends on your settings, your HBase version, and your hardware.
The recovery process has three steps:

  1. Detecting that a RegionServer is down. Each RegionServer maintains a heartbeat (session) with ZooKeeper. If the RegionServer does not answer the heartbeat before the session timeout expires, the master considers it dead.
  2. Recovering the writes in progress: before a write is applied to a region, it is appended to a write-ahead log. Because the log data is replicated, say three times, you still have two copies with the right values if a node crashes. So once the master knows a RegionServer is dead, it recovers that server's last state by replaying the log.
  3. Reassigning the regions: how this is done depends on your HBase version.
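The three steps above can be sketched as a toy simulation. This is plain Python for illustration only, not the actual HBase implementation; every name in it (`SESSION_TIMEOUT`, `replay_wal`, the server names) is made up:

```python
import time

SESSION_TIMEOUT = 3.0  # seconds; stands in for the ZooKeeper session timeout

# Step 1: failure detection. A server whose last heartbeat is older
# than the session timeout is considered dead.
def detect_dead_servers(last_heartbeat, now):
    return [s for s, t in last_heartbeat.items() if now - t > SESSION_TIMEOUT]

# Step 2: log replay. Rebuild the last state of the dead server's
# regions from its surviving write-ahead log entries.
def replay_wal(wal_entries):
    state = {}
    for row, value in wal_entries:  # entries are in write order
        state[row] = value          # later writes win
    return state

# Step 3: reassignment. Hand the recovered regions to the live servers
# (round-robin here; real assignment policies are version-dependent).
def reassign(regions, live_servers):
    return {r: live_servers[i % len(live_servers)]
            for i, r in enumerate(regions)}

now = time.time()
heartbeats = {"rs1": now, "rs2": now - 10.0, "rs3": now}  # rs2 stopped responding
dead = detect_dead_servers(heartbeats, now)                # ['rs2']
recovered = replay_wal([("row1", "a"), ("row2", "b"), ("row1", "c")])
assignment = reassign(["region-1", "region-2"], ["rs1", "rs3"])
print(dead, recovered, assignment)
```

The client-visible downtime is dominated by step 1 (waiting for the timeout) plus the time to replay and reassign in steps 2 and 3.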

Is data lost in the meantime?

The data is not lost, but the client is blocked until the recovery is done. That's why there are ways to minimize this downtime by tuning the HBase and ZooKeeper settings; see this blog post for the details.
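For example, the dominant knob for the detection step is the ZooKeeper session timeout. A minimal `hbase-site.xml` fragment might look like this (the property name is real; the value is only an illustration of the trade-off, not a recommendation, and defaults vary by version):

```xml
<!-- hbase-site.xml -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- Lower = faster failure detection, but a long GC pause can get a
       healthy RegionServer declared dead. -->
  <value>30000</value>
</property>
```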

EDIT

As FengWang pointed out, I may have implied that HBase takes a long time to recover from a failure. Compared to Cassandra, it does take more resources to recover a node. This can be explained by the CAP theorem: HBase, with its master/RegionServer architecture, favors consistency and partition tolerance over availability, whereas Cassandra, with its peer-to-peer architecture, favors availability and partition tolerance over consistency.

This is only a generalization, because in practice you can tune HBase to be available with the right configuration and schema (as FengWang has), but you will lose other things. Running 100 nodes where 10 nodes with larger storage would do is a big price difference. Likewise, having to query more nodes for a scan is not cost-efficient, though with fine tuning you can mitigate this (a good data schema avoids scanning across too many nodes). In Cassandra's case, you can set a consistency level on each query: the higher the level, the slower the query.
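In `cqlsh`, that per-query trade-off looks like this (`QUORUM` and the keyspace/table names below are just illustrative):

```sql
-- Require a majority of replicas to acknowledge before answering.
CONSISTENCY QUORUM;
SELECT value FROM my_keyspace.my_table WHERE id = 42;
-- CONSISTENCY ONE would be faster but may return stale data.
```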

In distributed systems, you can only trade one thing for another. There is no one-size-fits-all solution.


I did some tests on a 100-node HBase cluster. When one RegionServer goes down, HBase usually takes 3-5 seconds to reload the missing regions and the HLog from HDFS, i.e. the client is blocked for less than 5 seconds, not the 1 minute the post above suggests. If it really took 1 minute, I bet no one would want to use HBase.

With Cassandra, if one node goes down, it usually takes less than 1 second to reload the missing data.

Tags:

HBase