And I have a program that does some read operations, and it hangs. I am seeing:
2014-08-10 12:22:05,359 DEBUG [main] client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to dn29.manage.com,60020,1407600154728
2014-08-10 12:22:06,173 DEBUG [main]
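The server name in that log line follows HBase's `host,port,startcode` convention. The start code is the region server's start time in epoch milliseconds. Decoding it shows which incarnation of the dn29 region server the client stopped trusting, and that helps when lining up client, master and region server logs. Below is a minimal standalone sketch. `ServerNameParts` is a hypothetical helper written for illustration. It is not HBase's own `ServerName` class.

```java
import java.time.Instant;

/**
 * Hypothetical helper (not part of the HBase client API) that splits an
 * HBase server name of the form "host,port,startcode" into its parts.
 */
final class ServerNameParts {
    final String host;
    final int port;
    final long startCode; // region server start time, epoch milliseconds

    ServerNameParts(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerNameParts parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected host,port,startcode: " + serverName);
        }
        return new ServerNameParts(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    /** Start time of this region server instance, as a UTC instant. */
    Instant startTime() {
        return Instant.ofEpochMilli(startCode);
    }

    public static void main(String[] args) {
        ServerNameParts sn = parse("dn29.manage.com,60020,1407600154728");
        System.out.println(sn.host + " port " + sn.port + " started at " + sn.startTime());
    }
}
```

Here the start code 1407600154728 decodes to 2014-08-09T16:02:34.728Z (UTC). The log timestamps in this thread are in the cluster's local time zone, and the thread does not state which zone that is.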
bq. if I can just rmr stuff under /hbase-unsecure/splitWAL/...
Please don't.
Have you checked the region server log on dn29.manage.com?
What HBase version are you using?
Cheers
On Sun, Aug 10, 2014 at 10:27 AM, Thomas Kwan thomas.k...@manage.com wrote:
Hi Ted,
The HBase version is 0.96.0.2.0.
There is nothing interesting in the HBase log on dn29, and I confirmed that the region
server is running on dn29.
When I do a 'get', I see:
hbase(main):001:0> get 'm_data','2fd811c2b1d7476efb16499ccb2b823d'
COLUMN CELL
ERROR:
Can you check the master log to see why 'm_data,2fd811c2b1d7476efb16499ccb2b823d'
went offline?
Thanks
On Sun, Aug 10, 2014 at 12:13 PM, Thomas Kwan thomas.k...@manage.com wrote:
Thanks for your help, Ted.
From the master's log, I see:
2014-08-09 22:50:51,176 DEBUG [827019302@qtp-63557232-287] client.HBaseAdmin: Trying to compact {ENCODED => 12c9a609765ad0bbd6468d93368f860a, NAME => 'm_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.',
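The region name in that log line packs several fields into one string: `table,startKey,regionId.encodedName.`. The encoded name (12c9a609765ad0bbd6468d93368f860a) is also the region's directory name under the table directory in HDFS, and it is a convenient string to grep the master and dn29 logs for. Below is a minimal standalone sketch that splits the name apart. `RegionNameParts` is hypothetical and is not HBase's `HRegionInfo` API. It assumes the table name contains no comma, and it reads the regionId from after the last comma, because a start key may itself contain commas.

```java
import java.time.Instant;

/**
 * Hypothetical helper (not HBase's HRegionInfo API) that splits a region name
 * of the form "table,startKey,regionId.encodedName." into its parts.
 * Assumes the table name has no comma; the start key may contain commas,
 * so the regionId is read from after the LAST comma.
 */
final class RegionNameParts {
    final String table;
    final String startKey;
    final long regionId;      // usually the region's creation time, epoch ms
    final String encodedName; // also the region's directory name in HDFS

    private RegionNameParts(String table, String startKey, long regionId, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.encodedName = encodedName;
    }

    static RegionNameParts parse(String regionName) {
        // Drop the trailing '.' that closes the encoded-name suffix.
        String name = regionName.endsWith(".")
                ? regionName.substring(0, regionName.length() - 1)
                : regionName;
        int firstComma = name.indexOf(',');
        int lastComma = name.lastIndexOf(',');
        if (firstComma < 0 || firstComma == lastComma) {
            throw new IllegalArgumentException("Not a region name: " + regionName);
        }
        String tail = name.substring(lastComma + 1); // "regionId.encodedName"
        int dot = tail.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("Missing encoded name: " + regionName);
        }
        return new RegionNameParts(
                name.substring(0, firstComma),
                name.substring(firstComma + 1, lastComma),
                Long.parseLong(tail.substring(0, dot)),
                tail.substring(dot + 1));
    }

    public static void main(String[] args) {
        RegionNameParts r = parse(
                "m_data,2fd811c2b1d7476efb16499ccb2b823d,1406512331699.12c9a609765ad0bbd6468d93368f860a.");
        System.out.println("table=" + r.table + " startKey=" + r.startKey
                + " regionId=" + Instant.ofEpochMilli(r.regionId)
                + " encoded=" + r.encodedName);
    }
}
```

The row key used in the earlier 'get' is exactly this region's start key. The failing read therefore targets the first row of this region.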
bq. it's host dn29.manage.com,60020,1407600154728 is dead but not processed yet
Can you look back in the master log (from 22:50:51) to see what happened to
dn29?
Thanks
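Ted's suggestion to read the master log backwards from 22:50:51 can be done mechanically: keep only the lines that mention dn29 and are timestamped at or before the cutoff, then read the earliest ones first. This is essentially a grep plus a date check. The sketch below is a hypothetical helper (`DeadServerTrace` and `linesUpTo` are illustrative names). It assumes that log lines begin with the `yyyy-MM-dd HH:mm:ss,SSS` timestamp seen in the excerpts above and that a plain substring match on the host name is good enough. The log path is passed on the command line because its location depends on the installation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: collect log lines that mention a host and are
 * timestamped at or before a cutoff, in file order (oldest first), so the
 * earliest failure involving that host is easy to spot.
 */
final class DeadServerTrace {
    // Timestamp prefix as seen in the excerpts, e.g. "2014-08-09 22:50:51,176".
    private static final String PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern(PATTERN);

    static List<String> linesUpTo(BufferedReader log, String host, LocalDateTime cutoff)
            throws IOException {
        List<String> hits = new ArrayList<>();
        String line;
        while ((line = log.readLine()) != null) {
            if (line.length() < PATTERN.length() || !line.contains(host)) {
                continue;
            }
            LocalDateTime ts;
            try {
                ts = LocalDateTime.parse(line.substring(0, PATTERN.length()), TS);
            } catch (DateTimeParseException e) {
                // No timestamp prefix (e.g. a stack-trace line); this sketch skips it.
                continue;
            }
            if (!ts.isAfter(cutoff)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: DeadServerTrace <master-log-file>");
            return;
        }
        // Cutoff taken from the compaction log line quoted in this thread.
        LocalDateTime cutoff = LocalDateTime.of(2014, 8, 9, 22, 50, 51, 176_000_000);
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            for (String hit : linesUpTo(in, "dn29.manage.com", cutoff)) {
                System.out.println(hit);
            }
        }
    }
}
```

Stack-trace continuation lines have no timestamp, so this sketch drops them. Once the first suspicious entry is found, open the raw log around it to see the full trace.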
On Sun, Aug 10, 2014 at 2:51 PM, Thomas Kwan thomas.k...@manage.com wrote:
Did you set hbase.status.published to true? If you enable it, the master
publishes the dead server list to clients every 10s by default, and the client
then removes the cached regions on those servers. So something must be wrong on
dn29; please find the first related failure. You could also
pastebin the
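The mechanism Qiang describes can be pictured with a small in-memory model. This is a simplified sketch and not HBase's client code. `RegionLocationCacheModel`, its methods, and the region names in `main` are hypothetical. The model shows what the client's "Removed all cached region locations that map to dn29.manage.com,60020,1407600154728" message amounts to: once a server appears in the published dead-server list, every cached location on that server is dropped, and the next read for those regions misses the cache and has to locate them again.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Simplified model (not HBase's implementation): a client-side cache of
 * region -> server locations, purged whenever a dead-server list arrives.
 */
final class RegionLocationCacheModel {
    private final Map<String, String> regionToServer = new HashMap<>();

    void cache(String region, String serverName) {
        regionToServer.put(region, serverName);
    }

    /** Handles one published dead-server list; returns how many cached locations were dropped. */
    int onDeadServers(Collection<String> deadServers) {
        Set<String> dead = new HashSet<>(deadServers);
        int before = regionToServer.size();
        regionToServer.values().removeIf(dead::contains);
        return before - regionToServer.size();
    }

    /** Empty means a cache miss: a real client would have to look the region up again. */
    Optional<String> locate(String region) {
        return Optional.ofNullable(regionToServer.get(region));
    }

    public static void main(String[] args) {
        RegionLocationCacheModel cache = new RegionLocationCacheModel();
        String dn29 = "dn29.manage.com,60020,1407600154728";
        cache.cache("region-a", dn29);                  // hypothetical region on dn29
        cache.cache("region-b", "other-host,60020,1");  // hypothetical live server
        int dropped = cache.onDeadServers(Collections.singletonList(dn29));
        System.out.println("dropped " + dropped
                + "; region-a cached? " + cache.locate("region-a").isPresent());
    }
}
```

The model leaves out the multicast transport and the lookup a real client performs after a cache miss. It only illustrates the purge step.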
bq. there was a compaction
There was a request for compaction.
bq. if hbase hbck --repairHoles can fix this kind of thing?
You can try the above command.
As Qiang said, tracing back to the earlier failure would help determine the
root cause.
Cheers
On Sun, Aug 10, 2014 at 7:21 PM, Thomas Kwan