Thanks, I'll check it out. There weren't any obvious errors pointing to
hardware issues.

Is it likely that the TTransportException and the "Commits are held" error
are related?


On Fri, 12 Jul 2019, 18:56 Josh Elser, <[email protected]> wrote:

> "Commits are held" can be for a couple of different reasons, some from
> within Accumulo and some from outside.
>
> In general, there is an expected ordering of mutations that a
> TabletServer has to apply. A "commit" here is the application of some
> mutations by a TabletServer to the memory map and the WAL.
>
> This could be completely normal and you have some clients which are just
> writing "faster" than your TabletServers can keep up with. This could be
> indicative of slow flushes from memory maps to HDFS. This could be GC
> pressure causing slowness in the TServer.
>
> I'd suggest taking a step back:
>
> * Look at other messages in the DEBUG log for the tabletserver to see if
> Accumulo is telling you what it's waiting on (before and after you
> see the message about commits being held)
> * Check that you're using the Accumulo native memory maps
> * Sanity-check performance of HDFS
> * Get a thread dump from a TabletServer in this state.
>
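
For the first suggestion in the list above (reading the DEBUG messages just before and after the "Commits are held" line), here is a minimal sketch of pulling that context out of a tserver log. The function name `context_around`, the window sizes, and the log path passed on the command line are all illustrative assumptions, not anything taken from Accumulo itself.

```python
import sys
from collections import deque


def context_around(lines, needle="Commits are held", before=5, after=5):
    """Yield (line_number, text) for each match plus its surrounding lines."""
    recent = deque(maxlen=before)  # rolling buffer of lines preceding a match
    remaining_after = 0            # trailing lines still owed after a match
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if needle in line:
            # Emit the buffered leading context, then the matching line.
            yield from recent
            recent.clear()
            yield (number, line)
            remaining_after = after
        elif remaining_after > 0:
            yield (number, line)
            remaining_after -= 1
        else:
            recent.append((number, line))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python commits_held.py /path/to/tserver.debug.log
    with open(sys.argv[1], errors="replace") as log:
        for number, text in context_around(log, before=20, after=20):
            print(f"{number}: {text}")
```

Run against the DEBUG log of one of the affected tservers, this prints each "Commits are held" line with the numbered lines around it, which should make it easier to spot whatever the server reported waiting on.
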
> If the problem truly only happens on two servers, it might indicate some
> bad hardware on those machines (memory with errors, a disk that flips to r/o).
>
> - Josh
>
> On 7/12/19 10:57 AM, James Srinivasan wrote:
> > Hi all,
> >
> > We have a Kerberized Accumulo 1.7.0 (HDP3) cluster with 25 tservers.
> > Recently, a couple of clients were reporting errors writing data (fat
> > fingered from cluster, apologies for typos):
> >
> >
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures
> > ...
> > Caused by:
> > org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer
> >
> > Digging into the logs on the problematic tservers, I think the
> > following was firing, but don't know why:
> >
> >
> > https://github.com/apache/thrift/blob/0.9.1/lib/java/src/org/apache/thrift/transport/TIOStreamTransport.java#L132
> >
> > Also, the tserver logs report:
> >
> > Internal error processing closeUpdate....TException: Commits are held
> >
> > For now, I have stopped the two problematic tservers but any help
> > debugging would be much appreciated.
> >
> > James
> >
>
