(late answer, sorry)

60 seconds is a long time to be left hanging. My first guess would be to look for signs of long GC pauses in that region server and to make sure the machine doesn't swap at all. You could go further and jstack the RS while it's hanging; maybe you'll see that all the handlers are busy, but even then, if they take that long to process a call, it might be due to the GC or swapping I mentioned above.
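If it helps, here is a rough sketch of how you could summarize a jstack dump instead of reading it by eye. It counts the RPC handler threads per thread state. The class name HandlerStates is made up for this example, and the "IPC Server handler" thread-name prefix is an assumption on my part, so check what the handler threads are actually called in your own dump and adjust the constant.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: summarizes handler thread states
// found in the text output of jstack.
class HandlerStates {
    // Assumed thread-name prefix for the RPC handlers; verify against your dump.
    static final String HANDLER_PREFIX = "IPC Server handler";

    // A thread entry starts with its quoted name at the beginning of the line.
    private static final Pattern THREAD_HEADER = Pattern.compile("^\"([^\"]*)\"");
    // The state line follows the header, e.g. "java.lang.Thread.State: RUNNABLE".
    private static final Pattern THREAD_STATE =
            Pattern.compile("^\\s*java\\.lang\\.Thread\\.State: (\\w+)");

    /** Counts handler threads per Thread.State in the text of a jstack dump. */
    static Map<String, Integer> countByState(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        boolean inHandler = false;
        for (String line : dump.split("\\R")) {
            Matcher header = THREAD_HEADER.matcher(line);
            if (header.find()) {
                // Remember whether the entry we just entered is a handler thread.
                inHandler = header.group(1).startsWith(HANDLER_PREFIX);
                continue;
            }
            Matcher state = THREAD_STATE.matcher(line);
            if (inHandler && state.find()) {
                counts.merge(state.group(1), 1, Integer::sum);
                inHandler = false; // count each handler thread once
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java HandlerStates rs.dump
        String dump = new String(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(countByState(dump));
    }
}
```

Save the output of jstack <pid> for the region server to a file while the client is hanging and pass that file as the only argument. If most handlers show up as RUNNABLE or BLOCKED rather than WAITING, that would be consistent with the handlers all being busy, but look at the actual stack frames before drawing conclusions.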
J-D

On Tue, Sep 20, 2011 at 11:19 AM, Rohit Nigam <[email protected]> wrote:
> Hi Guys
>
> I am getting this exception while running the job in the reducer phase.
> The reducer retrieves a data structure for a key from hbase and then
> populates it and put it back again. The exception says like this :--
>
> java.net.SocketTimeoutException: Call to x server /xx.xx.xx.xxx:60020
> failed on socket timeout exception: java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch
> : java.nio.channels.SocketChannel[connected local=/yy.yy.yy.yyy:44751
> remote= x server /xx.xx.xx.xxx:60020]
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=doop18.dt.sv4.decarta.com/10.241.8.238:60020]
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=doop18.dt.sv4.decarta.com/10.241.8.238:60020]
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=doop18.dt.sv4.decarta.com/10.241.8.238:60020]
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=doop18.dt.sv4.decarta.com/10.241.8.238:60020]
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for
> channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=doop18.dt.sv4.decarta.com/10.241.8.238:60020]
>
> java.net.SocketTimeoutException
>
> There is a lot of IO involved in the reducer phase as to talking with
> HBASE. Would really appreciate if somebody can shed some light as to
> when can this exception happen. Is it related to regionservers being too
> busy to cater the request?
>
> Thanks
>
> Rohit
