Read operations hanged

2014-08-10 Thread Thomas Kwan
And I have a program that does some read operations, and it hangs. I am seeing:

2014-08-10 12:22:05,359 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
2014-08-10 12:22:06,173 DEBUG [main]
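
The server identifier in the log line above follows HBase's server name layout, hostname,port,startcode, where the start code is the region server's start time in epoch milliseconds. The following is a minimal, JDK-only Java sketch for decoding it so the start time can be lined up against master and region server logs. The class and method names are hypothetical and are not part of the HBase client API.

```java
import java.time.Instant;

/** Hypothetical helper (not HBase API): decodes "host,port,startcode" server names. */
public final class ServerNameSketch {

    /** The three comma-separated fields of a server name. */
    public record ParsedServerName(String host, int port, long startCode) {
        /** Start code read as epoch milliseconds, i.e. when the region server started. */
        public Instant startedAt() {
            return Instant.ofEpochMilli(startCode);
        }
    }

    /** Parses e.g. "dn29.manage.com,60020,1407600154728". */
    public static ParsedServerName parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host,port,startcode: " + serverName);
        }
        return new ParsedServerName(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    public static void main(String[] args) {
        ParsedServerName sn = parse("dn29.manage.com,60020,1407600154728");
        // Prints host, port, and start time as an ISO-8601 UTC instant.
        System.out.println(sn.host() + " port " + sn.port() + " started at " + sn.startedAt());
    }
}
```

For dn29 this prints a start time of 2014-08-09T16:02:34.728Z. The log timestamps in this thread are in the servers' local time zone, which is not stated, so any comparison needs that offset.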

Re: Read operations hanged

2014-08-10 Thread Ted Yu
bq. if I can just rmr stuff under /hbase-unsecure/splitWAL/...

Please don't. Have you checked the region server log on dn29.manage.com? What HBase version are you using?

Cheers

Re: Read operations hanged

2014-08-10 Thread Thomas Kwan
Hi Ted,

The HBase version is 0.96.0.2.0. There is nothing interesting in the HBase log on dn29, and I confirmed that the region server is running on dn29. When I do a 'get', I see:

hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'
COLUMN CELL
ERROR:

Re: Read operations hanged

2014-08-10 Thread Ted Yu
Can you check the master log to see why 'm_data,2fd811c2b1d7476efb16499ccb2b823d' went offline?

Thanks

Re: Read operations hanged

2014-08-10 Thread Thomas Kwan
Thanks for your help, Ted. From the master's log, I see:

2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287] client.HBaseAdmin: Trying to compact {ENCODED => 12c9a609765ad0bbd6468d93368f860a, NAME => 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.',
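
The NAME field in this log line packs several identifiers together. Assuming the region-name layout table,startKey,regionId.encodedName. (consistent with the line above, where the ENCODED value reappears between the final dots), the following JDK-only Java sketch splits it apart. The class and method names are hypothetical, and reading the region id as a creation timestamp in epoch milliseconds is an assumption that holds for normally created regions.

```java
import java.time.Instant;

/** Hypothetical helper (not HBase API): splits "table,startKey,regionId.encodedName." */
public final class RegionNameSketch {

    public record ParsedRegionName(String table, String startKey, long regionId, String encodedName) {
        /** Region id read as epoch milliseconds (assumed to be the creation time). */
        public Instant createdAt() {
            return Instant.ofEpochMilli(regionId);
        }
    }

    public static ParsedRegionName parse(String regionName) {
        int firstComma = regionName.indexOf(',');
        int lastComma = regionName.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma) {
            throw new IllegalArgumentException("Expected table,startKey,id: " + regionName);
        }
        String table = regionName.substring(0, firstComma);
        // A start key may itself contain commas, so take everything between the first and last comma.
        String startKey = regionName.substring(firstComma + 1, lastComma);
        // The tail looks like "<regionId>.<encodedName>." with a trailing dot.
        String tail = regionName.substring(lastComma + 1);
        int dot = tail.indexOf('.');
        if (dot < 0 || !tail.endsWith(".")) {
            throw new IllegalArgumentException("No encoded-name suffix: " + regionName);
        }
        long regionId = Long.parseLong(tail.substring(0, dot));
        String encodedName = tail.substring(dot + 1, tail.length() - 1);
        return new ParsedRegionName(table, startKey, regionId, encodedName);
    }

    public static void main(String[] args) {
        ParsedRegionName r = parse(
            "m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.");
        System.out.println(r.table() + " | " + r.startKey() + " | " + r.encodedName()
            + " | created " + r.createdAt());
    }
}
```

For the region in this thread it yields table m_data, start key 2fd811c2b1d7476efb16499ccb2b823d, and encoded name 12c9a609765ad0bbd6468d93368f860a (matching ENCODED). Its region id 1406512331699 reads as 2014-07-28T01:52:11.699Z.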

Re: Read operations hanged

2014-08-10 Thread Ted Yu
bq. it's host dn29.manage.com,60020,1407600154728 is dead but not processed yet

Can you look back (from 22:50:51) in the master log to see what happened to dn29?

Thanks

Re: Read operations hanged

2014-08-10 Thread Qiang Tian
Did you set hbase.status.published to true? If you enable it, the master publishes the dead server list to clients every 10 seconds by default, and each client then removes its cached region locations for those servers. So something must be wrong on dn29; please find the first occurrence of the related failure. You could also pastebin the

Re: Read operations hanged

2014-08-10 Thread Ted Yu
bq. there was a compaction

There was a request for compaction.

bq. if hbase hbck --repairHoles can fix this kind of thing?

You can try the above command. As Qiang said, tracing back to the earlier failure would help determine the root cause.

Cheers