Re: Slow recovery on lost data node?

2010-12-09 Thread Ted Yu
Juhani: You can also consider https://issues.apache.org/jira/browse/HBASE-1537, which is not in hbase-0.89.20100924+28. You can apply Andrew's patch yourself. On Thu, Dec 9, 2010 at 10:29 AM, Jean-Daniel Cryans wrote: > > Regarding using a

Re: Slow recovery on lost data node?

2010-12-09 Thread Jean-Daniel Cryans
> Regarding using a lot of families... They are currently partitioned in a manner that reflects the various data groups that are likely to be read together... We're doing a lot of big scans on the regions of only one of those families, with scans of the full table being much shorter/rarer. By

Re: Slow recovery on lost data node?

2010-12-08 Thread Juhani Connolly
Thanks for the information; it should help. Regarding using a lot of families... They are currently partitioned in a manner that reflects the various data groups that are likely to be read together... We're doing a lot of big scans on the regions of only one of those families, with scans of th

Re: Slow recovery on lost data node?

2010-12-08 Thread Jean-Daniel Cryans
Hey Juhani, The current state of client retries/sleep is something that needs to be reviewed/redone. It's currently on the roadmap for 0.92, see https://issues.apache.org/jira/browse/HBASE-2445 Regarding what you can do right now, the sleeps are done using an exponential backoff, meaning that the
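To make the exponential backoff concrete: each retry sleeps for a base pause multiplied by a growing factor, so the total time a writer blocks before the client gives up is roughly the base pause times the sum of the factors. The sketch below assumes a multiplier table of 1, 1, 1, 2, 2, 4, 4, 8, 16, 32 and treats the base pause and retry count as the knobs you would tune (in HBase these are usually the `hbase.client.pause` and `hbase.client.retries.number` settings, but check the names and defaults for your version). The table and the helper names are hypothetical, for illustration only.

```python
# Hypothetical multiplier table (an assumption for illustration); your HBase
# version may use a different schedule.
BACKOFF_MULTIPLIERS = [1, 1, 1, 2, 2, 4, 4, 8, 16, 32]


def sleep_ms(pause_ms: int, attempt: int) -> int:
    """Sleep before retry number `attempt` (0-based), in milliseconds."""
    # Clamp so retries beyond the table keep using the largest multiplier.
    idx = min(attempt, len(BACKOFF_MULTIPLIERS) - 1)
    return pause_ms * BACKOFF_MULTIPLIERS[idx]


def total_wait_ms(pause_ms: int, retries: int) -> int:
    """Worst-case cumulative sleep across `retries` attempts."""
    return sum(sleep_ms(pause_ms, i) for i in range(retries))


if __name__ == "__main__":
    # Compare a 1 s base pause with a shorter 250 ms one over 10 retries.
    for pause in (1000, 250):
        schedule = [sleep_ms(pause, i) for i in range(10)]
        print(pause, schedule, total_wait_ms(pause, 10))
```

Under these assumptions, 10 retries with a 1000 ms pause add up to 71 seconds of sleeping, while a 250 ms pause cuts that to about 17.75 seconds; lowering the retry count trims the long tail of the schedule even further.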

Slow recovery on lost data node?

2010-12-08 Thread Juhani Connolly
Hi there, We're currently running a cluster under expected load, and testing various hardware failure cases. Among them is a lost regionServer/dataNode, which results in our writer process (in our case, a servlet under Tomcat) just waiting indefinitely on put flushes until the region becomes av