Sorry for not being clearer - the TTransportException was in the link.
For some reason, Thrift hits an unexpected end of stream, so the problem
looks to be at the transport level. I'll try to get a better stack trace
tomorrow.
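
For pulling the underlying exception out of a wrapped stack trace, here is a
minimal sketch in plain Java. It assumes nothing about the Accumulo client
beyond the exception class names already mentioned in this thread; the
CauseChain class and its methods are hypothetical helpers, not part of
Accumulo or Thrift. It walks the getCause() chain and matches on the simple
class name, so it needs no Thrift dependency to compile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not an Accumulo or Thrift API): inspects the chain of
// causes behind an exception reported by a client.
final class CauseChain {
    private CauseChain() {}

    // Returns the throwable followed by each nested cause, outermost first.
    // The identity set guards against a cyclic cause chain.
    static List<Throwable> chain(Throwable t) {
        List<Throwable> out = new ArrayList<>();
        Set<Throwable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            out.add(cur);
        }
        return out;
    }

    // Returns the innermost cause, or null if the input is null.
    static Throwable rootCause(Throwable t) {
        List<Throwable> c = chain(t);
        return c.isEmpty() ? null : c.get(c.size() - 1);
    }

    // Rough classification by simple class name. Assumption: a
    // TTransportException anywhere in the chain points at the network layer,
    // and a TApplicationException points at a server-side failure.
    static String layer(Throwable t) {
        for (Throwable cur : chain(t)) {
            String name = cur.getClass().getSimpleName();
            if (name.equals("TTransportException")) {
                return "transport";
            }
            if (name.equals("TApplicationException")) {
                return "application";
            }
        }
        return "unknown";
    }
}
```

Calling CauseChain.layer(e) in the catch block around the BatchWriter
flush/close, and logging each element of CauseChain.chain(e), should show
whether the failure is a wrapped server-side error or a genuine transport
problem.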

On Mon, 15 Jul 2019 at 17:06, Josh Elser <[email protected]> wrote:
>
> You didn't mention anything about a TTransportException earlier.
>
> I don't remember the difference between Thrift exceptions (e.g.
> TTransportException, TApplicationException). I think one is supposed to
> be network-focused (e.g. socket timeout) and the other is
> application-focused (e.g. TabletServer got an error).
>
> If the Thrift exception is just wrapping an exception thrown by the
> TabletServer, it's likely just necessary wrapping (which should be
> unwrapped by the client impl, fwiw) to support serialization over the wire.
>
> On 7/12/19 4:38 PM, James Srinivasan wrote:
> > Thanks, I'll check it out. There weren't any obvious errors pointing
> > to hardware issues.
> >
> > Is it likely that the TTransportException and commits held are related?
> >
> >
> > On Fri, 12 Jul 2019, 18:56 Josh Elser, <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     "Commits are held" can be for a couple of different reasons, some from
> >     within Accumulo and some from outside.
> >
> >     In general, there is an expected ordering of mutations that a
> >     TabletServer has to apply. A "commit" here is the application of some
> >     mutations by a TabletServer to the memory map and the WAL.
> >
> >     This could be completely normal: you may simply have some clients
> >     writing "faster" than your TabletServers can keep up with. This could
> >     be indicative of slow flushes from memory maps to HDFS. This could be
> >     GC pressure causing slowness in the TServer.
> >
> >     I'd suggest taking a step back:
> >
> >     * Look at other messages in the DEBUG log for the tabletserver to see
> >     if Accumulo is telling you what it's waiting on (before and after you
> >     see the message about commits being held)
> >     * Check that you're using the Accumulo native memory maps
> >     * Sanity-check performance of HDFS
> >     * Get a thread dump from a TabletServer in this state.
> >
> >     If the problem truly only happens on two servers, it might indicate
> >     some bad hardware on those machines (memory with errors, a disk that
> >     flips to r/o).
> >
> >     - Josh
> >
> >     On 7/12/19 10:57 AM, James Srinivasan wrote:
> >      > Hi all,
> >      >
> >      > We have a Kerberized Accumulo 1.7.0 (HDP3) cluster with 25 tservers.
> >      > Recently, a couple of clients were reporting errors writing data
> >      > (fat fingered from cluster, apologies for typos):
> >      >
> >      >
> >     
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures
> >      > ...
> >      > Caused by:
> >     
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer
> >      >
> >      > Digging into the logs on the problematic tservers, I think the
> >      > following was firing, but don't know why:
> >      >
> >      >
> >     
> > https://github.com/apache/thrift/blob/0.9.1/lib/java/src/org/apache/thrift/transport/TIOStreamTransport.java#L132
> >      >
> >      > Also, the tserver logs report:
> >      >
> >      > Internal error processing closeUpdate....TException: Commits are held
> >      >
> >      > For now, I have stopped the two problematic tservers but any help
> >      > debugging would be much appreciated.
> >      >
> >      > James
> >      >
> >
