Hi, 

When we use MapReduce to dump data from a fairly large HBase table, one 
region server crashes and then another. MapReduce is deployed on the same 
nodes as HBase.

1) The region server log shows both "next" and "multi" operations in 
progress. Is the scanner timeout caused by a read/write conflict?
2) The region server has 24 cores, and the maximum number of map tasks is 
also 24. The table has about 30 regions (0.5 GB each) on that region server. 
Is MapReduce using all the CPU, so that the region server slows down and 
then times out?
3) hbase.regionserver.handler.count is currently at its default of 10. 
Should it be increased?

Any advice would be appreciated.

Thanks,
Wei


Log information: 


[Regionserver rs21:]

2013-03-11 18:36:28,148 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
Roll 
/hbase/.logs/adcbg21.machine.wisdom.com,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
 entries=22417, filesize=127539793.  for 
/hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
28183ms instead of 3000ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":29830,"call":"next(1656517918313948447, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56058","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31195,"call":"next(-8353194140406556404, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56529","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":30965,"call":"next(2623756537510669130, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56146","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31023,"call":"next(5293572780165196795, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56069","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): {"processingtimems":31160,"call":"next(-4285417329791344278, 
1000), rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.127.21:56586","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.109.21:35342","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":38813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030273,"queuetimems":4663,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":38893,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.109.20:51698","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":40037,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57078","starttimems":1363027030601,"queuetimems":4818,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":10880,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027060645,"queuetimems":34763,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:37:51,776 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":41327,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@354baf25),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.11:57076","starttimems":1363027030411,"queuetimems":4680,"class":"HRegionServer","responsesize":0,"method":"multi"}
2013-03-11 18:38:32,361 WARN org.apache.hadoop.ipc.HBaseServer: 
(responseTooSlow): 
{"processingtimems":10204,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6d86b477),
 rpc version=1, client version=29, 
methodsFingerPrint=54742778","client":"10.20.125.10:36950","starttimems":1363027102044,"queuetimems":11027,"class":"HRegionServer","responsesize":0,"method":"multi"}
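
To compare the "next" and "multi" calls in the responseTooSlow warnings above, the JSON payloads can be summarized per method. The following is only a minimal sketch: the function name summarize_slow_calls is hypothetical, and it assumes the log text is pasted as shown here, with lines wrapped by the mail client and no braces inside the "call" string.

```python
import json
import re
from collections import defaultdict

# Matches the JSON object that follows a "(responseTooSlow):" marker.
# Assumes the payload contains no nested braces, as in the excerpt above.
SLOW_RE = re.compile(r"\(responseTooSlow\):\s*(\{.*?\})")


def summarize_slow_calls(log_text):
    """Return per-method count, max processing time and max queue time."""
    # The mail client wrapped long lines; removing the newlines rejoins
    # each JSON payload into a single parseable string.
    joined = log_text.replace("\n", "")
    stats = defaultdict(
        lambda: {"count": 0, "max_processing_ms": 0, "max_queue_ms": 0}
    )
    for match in SLOW_RE.finditer(joined):
        entry = json.loads(match.group(1))
        s = stats[entry["method"]]
        s["count"] += 1
        s["max_processing_ms"] = max(s["max_processing_ms"],
                                     entry["processingtimems"])
        s["max_queue_ms"] = max(s["max_queue_ms"], entry["queuetimems"])
    return dict(stats)
```

If both methods show similar processing times in the same window, that would point to a server-wide stall rather than one operation type blocking the other, though this summary alone does not prove it.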

----------------------------------------------------------
[master:]
2013-03-11 18:35:39,386 WARN org.apache.hadoop.conf.Configuration: 
fs.default.name is deprecated. Instead, use fs.defaultFS
2013-03-11 18:38:25,892 INFO org.apache.hadoop.hbase.master.LoadBalancer: 
Skipping load balancing because balanced cluster; servers=10 regions=477 
average=47.7 mostloaded=52 leastloaded=45
2013-03-11 18:39:42,002 INFO 
org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral 
node deleted, processing expiration [rs21,60020,1363010589837]
2013-03-11 18:39:42,007 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for rs21,60020,1363010589837
2013-03-11 18:39:42,024 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
dead splitlog workers [rs21,60020,1363010589837]
2013-03-11 18:39:42,033 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
started splitting logs in 
[hdfs://rs26/hbase/.logs/rs21,60020,1363010589837-splitting]
2013-03-11 18:39:42,179 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Frs26%3A8020%2Fhbase%2F.logs%2Frs21%2C60020%2C1363010589837-splitting%2Frs21%252C60020%252C1363010589837.1363010594599
 acquired by rs19,1363010590987


----------------------------------------------------------
[Regionserver rs21:]
2013-03-11 18:40:06,326 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder, call next(8419833035992682478, 1000), rpc version=1, client 
version=29, methodsFingerPrint=54742778 from 10.20.127.21:33592: output error
2013-03-11 18:40:06,326 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-1069130278416755239 lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-7554305902624086957 lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-1817452922125791171 lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-7601125239682768076 lease expired
2013-03-11 18:40:06,327 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 
82186ms instead of 3000ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5506385887933665130 
lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8405483593065293761 
lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8270919548717867130 
lease expired
2013-03-11 18:40:06,327 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-5350053253744349360 lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-2774223416111392810 lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 5293572780165196795 
lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2904518513855545553 
lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 121972859714825295 
lease expired
2013-03-11 18:40:06,328 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder, call next(1316499555392112856, 1000), rpc version=1, client 
version=29, methodsFingerPrint=54742778 from 10.20.127.21:33552: output error
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 751003440851341708 
lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 3456313588148401866 
lease expired
2013-03-11 18:40:06,328 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-1893617732870830965 lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -677968643251998870 
lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2623756537510669130 
lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 
-4453586756904422814 lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 4513044921501336208 
lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 8419833035992682478 
lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 6790476379016482048 
lease expired
2013-03-11 18:40:06,329 INFO 
org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 1316499555392112856 
lease expired
2013-03-11 18:40:06,337 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder, call next(6790476379016482048, 1000), rpc version=1, client 
version=29, methodsFingerPrint=54742778 from 10.20.127.21:33603: output error
2013-03-11 18:40:06,485 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 86375ms for sessionid 
0x13c9789d2289cd1, closing socket connection and attempting reconnect
2013-03-11 18:40:06,493 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 5 on 60020 caught: java.nio.channels.ClosedChannelException
        at 
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)

2013-03-11 18:40:06,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server 
Responder: doAsyncWrite threw exception java.io.IOException: Broken pipe
2013-03-11 18:40:06,517 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
handler 8 on 60020 caught: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
        at sun.nio.ch.IOUtil.write(IOUtil.java:40)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)
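
The Sleeper warnings in the rs21 log report how long the process stalled, so one way to check whether a stall alone could explain the expired scanner leases is to extract those durations and compare them with a timeout. This is a rough sketch: the function name long_pauses is hypothetical, and the limit passed in is an assumption that should be replaced with the values actually configured on the cluster (for example hbase.regionserver.lease.period and the ZooKeeper session timeout).

```python
import re

# Matches the Sleeper warning even when the mail client wrapped the line
# between "We slept" and the number.
SLEEPER_RE = re.compile(r"We slept\s+(\d+)ms instead of\s+(\d+)ms")


def long_pauses(log_text, limit_ms):
    """List each Sleeper pause and whether its overshoot reaches limit_ms."""
    pauses = []
    for match in SLEEPER_RE.finditer(log_text):
        slept_ms = int(match.group(1))
        expected_ms = int(match.group(2))
        # Overshoot is the time the thread was stalled beyond its normal sleep.
        overshoot_ms = slept_ms - expected_ms
        pauses.append({
            "slept_ms": slept_ms,
            "overshoot_ms": overshoot_ms,
            "exceeds_limit": overshoot_ms >= limit_ms,
        })
    return pauses
```

For example, long_pauses(log_text, limit_ms=60000) would flag any stall of at least 60 seconds, where 60000 is only an assumed lease period used for illustration.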


