How did you use the scanner? Please paste some code here.
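For example, if the dump job drives the scan through TableInputFormat, the setup usually looks something like the sketch below. The table name, mapper, and caching value are placeholders, not your code; the next(..., 1000) calls in your region server log do suggest a scanner caching of 1000, though.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class TableDumpJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "table-dump");        // Hadoop 1.x / HBase 0.94-era API
    job.setJarByClass(TableDumpJob.class);

    Scan scan = new Scan();
    scan.setCaching(1000);        // rows returned per next() RPC
    scan.setCacheBlocks(false);   // usually recommended for full-table MR scans

    // "mytable" and IdentityTableMapper are placeholders for your table and mapper.
    TableMapReduceUtil.initTableMapperJob("mytable", scan,
        IdentityTableMapper.class, ImmutableBytesWritable.class, Result.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In particular, the caching value and anything that keeps the mapper busy for a long time between next() calls are relevant to the lease expirations you are seeing.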
On Mar 12, 2013 4:13 PM, "Lu, Wei" <w...@microstrategy.com> wrote:

>
> We turned the block cache off and tried again; the region servers still
> crash one after another.
> There are a lot of scanner lease timeouts, and then the master logs:
>         RegionServer ephemeral node deleted, processing expiration
> [rs21,60020,1363010589837]
> So the problem does not seem to be caused by the block cache.
>
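Just to rule out ambiguity about what was turned off: disabling the cache per scan is the client-side call sketched below, while setting hfile.block.cache.size to 0 in hbase-site.xml turns it off for the whole region server. This snippet is illustrative only, not taken from your job.

import org.apache.hadoop.hbase.client.Scan;

// Per-scan: this scan's reads bypass the block cache.
Scan scan = new Scan();
scan.setCacheBlocks(false);

// Cluster-wide alternative: hfile.block.cache.size = 0 in hbase-site.xml
// (requires a region server restart to take effect).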
>
> Thanks
>
> -----Original Message-----
> From: Azuryy Yu [mailto:azury...@gmail.com]
> Sent: Tuesday, March 12, 2013 1:41 PM
> To: user@hbase.apache.org
> Subject: Re: region server down when scanning using mapreduce
>
> Please read http://hbase.apache.org/book.html (11.8.5. Block Cache) to get
> some background on the block cache.
>
>
> On Tue, Mar 12, 2013 at 1:31 PM, Lu, Wei <w...@microstrategy.com> wrote:
>
> > No. Does the block cache matter? By the way, the MR dump is a MapReduce
> > program we implemented ourselves rather than the HBase tool.
> >
> > Thanks
> >
> > -----Original Message-----
> > From: Azuryy Yu [mailto:azury...@gmail.com]
> > Sent: Tuesday, March 12, 2013 1:18 PM
> > To: user@hbase.apache.org
> > Subject: Re: region server down when scanning using mapreduce
> >
> > Did you disable the block cache when you ran the MR dump?
> > On Mar 12, 2013 1:06 PM, "Lu, Wei" <w...@microstrategy.com> wrote:
> >
> > > Hi,
> > >
> > > When we use MapReduce to dump data from a pretty large table on HBase,
> > > one region server crashes and then another. MapReduce is deployed on the
> > > same nodes as HBase.
> > >
> > > 1) From the region server log, both "next" and "multi" operations are in
> > > progress at the same time. Could a write/read conflict be causing the
> > > scanner timeouts?
> > > 2) The region server has 24 cores and the maximum number of map tasks is
> > > also 24; the table has about 30 regions (each about 0.5 GB) on that
> > > region server. Could MapReduce be using all the CPU, making the region
> > > server so slow that the scanners time out?
> > > 3) hbase.regionserver.handler.count is currently at its default of 10;
> > > should it be enlarged?
> > >
> > > Please give us some advice.
> > >
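On 1) and 3): the responseTooSlow entries below show next(..., 1000), i.e. 1000 rows per scanner fetch, and the later "We slept 82186ms" warning points to a very long GC pause. If more than the scanner lease period (hbase.regionserver.lease.period, 60 s by default if unchanged) passes between two next() calls, the leases expire exactly as in the log; a larger handler count may reduce the queuetimems values but will not prevent lease expirations caused by GC pauses. On 2), capping mapred.tasktracker.map.tasks.maximum below 24 on nodes that also run a region server leaves CPU for the RS. One client-side change worth trying is a smaller caching value, roughly as sketched here (illustrative value only, not your actual code):

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
// Fewer rows per next() RPC keeps each fetch fast and shortens the time the
// mapper spends between fetches, so the scanner lease is renewed well within
// hbase.regionserver.lease.period.
scan.setCaching(100);   // 100 is only an example; tune it to your row size

// hbase.regionserver.handler.count and hbase.regionserver.lease.period are
// server-side settings and belong in hbase-site.xml, not in the job code.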
> > > Thanks,
> > > Wei
> > >
> > >
> > > Log information:
> > >
> > >
> > > [Regionserver rs21:]
> > >
> > > 2013-03-11 18:36:28,148 INFO
> > > org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/
> > > adcbg21.machine.wisdom.com
> > ,60020,1363010589837/rs21%2C60020%2C1363010589837.1363025554488,
> > > entries=22417, filesize=127539793.  for
> > >
> >
> /hbase/.logs/rs21,60020,1363010589837/rs21%2C60020%2C1363010589837.1363026988052
> > > 2013-03-11 18:37:39,481 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > > slept 28183ms instead of 3000ms, this is likely due to a long garbage
> > > collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":29830,"call":"next(1656517918313948447, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56058
> > >
> >
> ","starttimems":1363027030280,"queuetimems":4602,"class":"HRegionServer","responsesize":2774484,"method":"next"}
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31195,"call":"next(-8353194140406556404, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56529
> > >
> >
> ","starttimems":1363027028804,"queuetimems":3634,"class":"HRegionServer","responsesize":2270919,"method":"next"}
> > > 2013-03-11 18:37:40,163 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":30965,"call":"next(2623756537510669130, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56146
> > >
> >
> ","starttimems":1363027028807,"queuetimems":3484,"class":"HRegionServer","responsesize":2753299,"method":"next"}
> > > 2013-03-11 18:37:40,236 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31023,"call":"next(5293572780165196795, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56069
> > >
> >
> ","starttimems":1363027029086,"queuetimems":3589,"class":"HRegionServer","responsesize":2722543,"method":"next"}
> > > 2013-03-11 18:37:40,368 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > > {"processingtimems":31160,"call":"next(-4285417329791344278, 1000), rpc
> > > version=1, client version=29, methodsFingerPrint=54742778","client":"
> > > 10.20.127.21:56586
> > >
> >
> ","starttimems":1363027029204,"queuetimems":3707,"class":"HRegionServer","responsesize":2938870,"method":"next"}
> > > 2013-03-11 18:37:43,652 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":31249,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@2d19985a
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.109.21:35342
> > >
> >
> ","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:37:49,108 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":38813,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@19c59a2e
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.125.11:57078
> > >
> >
> ","starttimems":1363027030273,"queuetimems":4663,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:37:50,410 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":38893,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@40022ddb
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.109.20:51698
> > >
> >
> ","starttimems":1363027031505,"queuetimems":5720,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:37:50,642 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":40037,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6b8bc8cf
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.125.11:57078
> > >
> >
> ","starttimems":1363027030601,"queuetimems":4818,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:37:51,529 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":10880,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6928d7b
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.125.11:57076
> > >
> >
> ","starttimems":1363027060645,"queuetimems":34763,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:37:51,776 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":41327,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@354baf25
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.125.11:57076
> > >
> >
> ","starttimems":1363027030411,"queuetimems":4680,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > > 2013-03-11 18:38:32,361 WARN org.apache.hadoop.ipc.HBaseServer:
> > > (responseTooSlow):
> > >
> >
> {"processingtimems":10204,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6d86b477
> > ),
> > > rpc version=1, client version=29,
> methodsFingerPrint=54742778","client":"
> > > 10.20.125.10:36950
> > >
> >
> ","starttimems":1363027102044,"queuetimems":11027,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > >
> > > ----------------------------------------------------------
> > > [master:]
> > > 2013-03-11 18:35:39,386 WARN org.apache.hadoop.conf.Configuration:
> > > fs.default.name is deprecated. Instead, use fs.defaultFS
> > > 2013-03-11 18:38:25,892 INFO
> org.apache.hadoop.hbase.master.LoadBalancer:
> > > Skipping load balancing because balanced cluster; servers=10
> regions=477
> > > average=47.7 mostloaded=52 leastloaded=45
> > > 2013-03-11 18:39:42,002 INFO
> > > org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer
> > > ephemeral node deleted, processing expiration
> [rs21,60020,1363010589837]
> > > 2013-03-11 18:39:42,007 INFO
> > > org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting
> > > logs for rs21,60020,1363010589837
> > > 2013-03-11 18:39:42,024 INFO
> > > org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers
> > > [rs21,60020,1363010589837]
> > > 2013-03-11 18:39:42,033 INFO
> > > org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs
> in
> > > [hdfs://rs26/hbase/.logs/rs21,60020,1363010589837-splitting]
> > > 2013-03-11 18:39:42,179 INFO
> > > org.apache.hadoop.hbase.master.SplitLogManager: task
> > >
> >
> /hbase/splitlog/hdfs%3A%2F%2Frs26%3A8020%2Fhbase%2F.logs%2Frs21%2C60020%2C1363010589837-splitting%2Frs21%252C60020%252C1363010589837.1363010594599
> > > acquired by rs19,1363010590987
> > >
> > >
> > > ----------------------------------------------------------
> > > [Regionserver rs21:]
> > > 2013-03-11 18:40:06,326 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > Responder, call next(8419833035992682478, 1000), rpc version=1, client
> > > version=29, methodsFingerPrint=54742778 from 10.20.127.21:33592:
> output
> > > error
> > > 2013-03-11 18:40:06,326 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -1069130278416755239 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -7554305902624086957 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -1817452922125791171 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -7601125239682768076 lease expired
> > > 2013-03-11 18:40:06,327 WARN org.apache.hadoop.hbase.util.Sleeper: We
> > > slept 82186ms instead of 3000ms, this is likely due to a long garbage
> > > collecting pause and it's usually bad, see
> > > http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 5506385887933665130 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 8405483593065293761 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 8270919548717867130 lease expired
> > > 2013-03-11 18:40:06,327 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -5350053253744349360 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -2774223416111392810 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 5293572780165196795 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 2904518513855545553 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 121972859714825295 lease expired
> > > 2013-03-11 18:40:06,328 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > Responder, call next(1316499555392112856, 1000), rpc version=1, client
> > > version=29, methodsFingerPrint=54742778 from 10.20.127.21:33552:
> output
> > > error
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 751003440851341708 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 3456313588148401866 lease expired
> > > 2013-03-11 18:40:06,328 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -1893617732870830965 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -677968643251998870 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 2623756537510669130 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > -4453586756904422814 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 4513044921501336208 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 8419833035992682478 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 6790476379016482048 lease expired
> > > 2013-03-11 18:40:06,329 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> > > 1316499555392112856 lease expired
> > > 2013-03-11 18:40:06,337 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > Responder, call next(6790476379016482048, 1000), rpc version=1, client
> > > version=29, methodsFingerPrint=54742778 from 10.20.127.21:33603:
> output
> > > error
> > > 2013-03-11 18:40:06,485 INFO org.apache.zookeeper.ClientCnxn: Client
> > > session timed out, have not heard from server in 86375ms for sessionid
> > > 0x13c9789d2289cd1, closing socket connection and attempting reconnect
> > > 2013-03-11 18:40:06,493 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > handler 5 on 60020 caught: java.nio.channels.ClosedChannelException
> > >         at
> > >
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
> > >         at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
> > >         at
> > >
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)
> > >
> > > 2013-03-11 18:40:06,489 INFO org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > Responder: doAsyncWrite threw exception java.io.IOException: Broken
> pipe
> > > 2013-03-11 18:40:06,517 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> > Server
> > > handler 8 on 60020 caught: java.io.IOException: Broken pipe
> > >         at sun.nio.ch.FileDispatcher.write0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
> > >         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
> > >         at sun.nio.ch.IOUtil.write(IOUtil.java:40)
> > >         at
> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
> > >         at
> > >
> org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1732)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1675)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:940)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:1019)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:425)
> > >         at
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1365)
> > >
> > >
> > >
> > >
> >
>
