The server 10.2.130.1 has been rebooted. Yes, it is a production system with a lot of reads and writes.
On 10/22/15, dlmarion <dlmar...@comcast.net> wrote:
>
> Are you trying to shut the whole system down, or just a couple of tablet
> servers? Is your application reading and writing from/to Accumulo during this
> time?
>
> -------- Original message --------
> From: Denis <de...@camfex.cz>
> Date: 10/22/2015 6:03 PM (GMT-05:00)
> To: user@accumulo.apache.org
> Subject: Re: Tserver's strange state.
>
> Both servers have errors like these in their logs:
>
> ========
> 2015-10-22 03:28:00,599 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:36148 remote=/10.2.130.1:9997]
> 2015-10-22 03:28:04,283 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:37047 remote=/10.2.130.1:9997]
> 2015-10-22 03:28:06,116 ERROR
> org.apache.accumulo.core.client.impl.Writer: error sending update to
> 10.2.130.1:9997: org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: 120000 millis timeout while waiting
> for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> local=/10.2.142.1:37167 remote=/10.2.130.1:9997]
> ========
>
> On 10/22/15, Denis <de...@camfex.cz> wrote:
>> Hi
>>
>> Sometimes my tablet servers go into a strange state: they have some
>> very old scans (see picture: http://i.imgur.com/2sOUM99.png), and while
>> in this state they cannot be decommissioned gracefully using "accumulo
>> stop": the number of their tablets decreases to some fixed number
>> (say, from 6K tablets to 2K) but never reaches zero.
>> It is difficult to reproduce.
>> Right now I have a live system with 2 tablet servers in this state.
>> Any suggestions on how to catch the bug?
>>
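All three errors in the quoted log point at the same remote tablet server (10.2.130.1:9997), so one quick check is whether these 120000 ms write timeouts cluster on particular servers. The sketch below tallies them per remote address. It assumes each log entry sits on a single line (as in the raw log file, not the wrapped email) and that the message text matches the excerpt above; the regex, `count_timeouts`, and `SAMPLE` are hypothetical names for illustration, not part of Accumulo.

```python
import re
from collections import Counter

# Hypothetical pattern for one client-side Writer timeout entry, assuming the
# entry is on a single line and worded exactly like the excerpt in this thread.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"org\.apache\.accumulo\.core\.client\.impl\.Writer: error sending update to "
    r"(?P<server>[\d.]+:\d+): .*SocketTimeoutException: (?P<millis>\d+) millis timeout"
)

# The three entries from the excerpt, each joined back onto one line.
SAMPLE = [
    "2015-10-22 03:28:00,599 ERROR org.apache.accumulo.core.client.impl.Writer: "
    "error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: "
    "java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel "
    "to be ready for read. ch : java.nio.channels.SocketChannel[connected "
    "local=/10.2.142.1:36148 remote=/10.2.130.1:9997]",
    "2015-10-22 03:28:04,283 ERROR org.apache.accumulo.core.client.impl.Writer: "
    "error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: "
    "java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel "
    "to be ready for read. ch : java.nio.channels.SocketChannel[connected "
    "local=/10.2.142.1:37047 remote=/10.2.130.1:9997]",
    "2015-10-22 03:28:06,116 ERROR org.apache.accumulo.core.client.impl.Writer: "
    "error sending update to 10.2.130.1:9997: org.apache.thrift.transport.TTransportException: "
    "java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel "
    "to be ready for read. ch : java.nio.channels.SocketChannel[connected "
    "local=/10.2.142.1:37167 remote=/10.2.130.1:9997]",
]


def count_timeouts(lines):
    """Return a Counter mapping remote tablet server address -> timeout count.

    Lines that do not match TIMEOUT_RE are ignored.
    """
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[m.group("server")] += 1
    return counts


if __name__ == "__main__":
    # Print servers ordered by how many timeouts were logged against them.
    for server, n in count_timeouts(SAMPLE).most_common():
        print(f"{server}\t{n} timeouts")
```

Pointed at the client logs (for example, by passing an open file object instead of `SAMPLE`), a heavy skew toward the two stuck servers would suggest the problem is on those tablet servers rather than in the network or the client.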