A tablet server is given the rights to manage a tablet.

To maintain consistency, it is critical that no other server uses that tablet.

To keep the right to access a tablet, the tablet server must maintain a
ZooKeeper session. The two ends of the session periodically exchange
keep-alive messages. If either party fails to get a keep-alive, ZooKeeper
will close the connection. The client can attempt to reconnect, but if it
fails to do so in time, the session will time out.
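
To make the timing concrete, here is a toy model of the rule described above:
a session expires once the gap between two keep-alives exceeds the session
timeout. This is only an illustration, not ZooKeeper's actual implementation;
the function name, the timestamps, and the 30-second timeout are all
assumptions chosen for the example.

```python
def first_expiry(keepalive_times, session_timeout):
    """Return the time at which the session would expire, or None.

    keepalive_times: sorted times (in seconds) at which a keep-alive arrived.
    session_timeout: how many seconds of silence are tolerated (assumed value).
    """
    last_seen = keepalive_times[0]
    for t in keepalive_times[1:]:
        if t - last_seen > session_timeout:
            # The gap was too long: the session expired before this keep-alive.
            return last_seen + session_timeout
        last_seen = t
    return None


# Hypothetical timeline: the process stalls between t=20 and t=65.
print(first_expiry([0, 10, 20, 65, 75], 30))
```

In this made-up timeline the 45-second stall is longer than the 30-second
timeout, so the session is gone by the time the process wakes up, no matter
how promptly it sends keep-alives afterwards.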

If the tablet server loses its session with ZooKeeper, the rest of the
system can take over its tablets.

When a tablet server detects that it has lost its ZooKeeper session, it kills
itself to avoid doing anything with the tablets it no longer has the right to
host.

What you are seeing here is the first step in that process, and it is
probably due to the tablet server not sending a keep-alive message to
ZooKeeper in time.

There are many reasons for a tablet server to be delayed in sending a
keep-alive message. By far the most common is that your system is
over-subscribed for memory, and part of the tablet server's memory was
swapped out. When a Java garbage collection cycle then touched that memory,
it had to be swapped back in, which caused a considerable delay.

However, there can be other things going on. This is just a best guess.
As a first diagnostic step, monitor swap usage.
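
One way to check swap usage on a Linux host is to read /proc/meminfo. The
sketch below assumes the usual "Key:   value kB" layout with SwapTotal and
SwapFree entries; the function names are made up for this example, and the
sample numbers are invented.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_swap_used_kb(path="/proc/meminfo"):
    """Read the live value on a Linux host (path is the Linux default)."""
    with open(path) as f:
        return swap_used_kb(parse_meminfo(f.read()))


# Invented sample, in the same layout as /proc/meminfo.
sample = (
    "MemTotal:       16384000 kB\n"
    "SwapTotal:       2097148 kB\n"
    "SwapFree:        1048574 kB\n"
)
print(swap_used_kb(parse_meminfo(sample)))
```

Calling read_swap_used_kb() periodically on each tablet server host and
comparing the readings with the times of the ZooKeeper warnings should show
whether swapping lines up with the lost sessions.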

-Eric



On Tue, Dec 22, 2015 at 8:30 AM, mohit.kaushik <mohit.kaus...@orkash.com>
wrote:

> Dear All,
>
> The mutations rejected exception can be seen at client side with server
> error 1.
> *org.apache.accumulo.core.client.MutationsRejectedException: # constraint
> violations : 0  security codes: {}  # server errors 1 # exceptions 1\n\tat
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:537)\n\tat
> org.apache.accumulo.core.client.impl.TabletServerBatchWriter.addMutation(TabletServerBatchWriter.java:249)\n\tat
> org.apache.accumulo.core.client.impl.MultiTableBatchWriterImpl$TableBatchWriter.addMutation(MultiTableBatchWriterImpl.java:64)\n\tat
> com.orkash.accumulo.IngestionWithoutServiceOnCondition.main(IngestionWithoutServiceOnCondition.java:235)\n\tat
> com.orkash.db.DBQuery.insertLookUpDB(DBQuery.java:570)\n\tat
> com.orkash.Crawling.CrawlerThread.run(CrawlerThread.java:145)\n\tat
> java.lang.Thread.run(Thread.java:745)\nCaused by:
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on
> server orkash1:9997\n\tat *
>
> I also found exceptions in Monitor related to Tracing.
>
> *Tracing spans are being dropped because there are already 5000 spans queued
> for delivery.
> This does not affect performance, security or data integrity, but distributed
> tracing information is being lost.*
>
> and, 6458 times:
>
> *Got an IOException in internalRead!
>       java.io.IOException: Connection reset by peer
>               at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>               at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>               at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>               at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>               at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>               at 
> org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>               at 
> org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
>               at 
> org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
>               at 
> org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
>               at 
> org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
>               at 
> org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run*
>
>
>
> I am facing the following exceptions in tserver logs and one tserver goes
> dead.
>
> *2015-12-22 09:37:27,173 [zookeeper.ZooCache] WARN : Saw (possibly)
> transient exception communicating with ZooKeeper, will retry*
> *org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for
> /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/tables/16/conf/table.split.threshold*
> *        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)*
> *        at
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)*
> *        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)*
> *        at
> org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)*
> *        at
> org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)*
> *        at
> org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)*
> *        at
> org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)*
> *        at
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:117)*
> *        at
> org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:103)*
> *        at
> org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:99)*
> *        at
> org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:197)*
> *        at
> org.apache.accumulo.tserver.tablet.Tablet.findSplitRow(Tablet.java:1604)*
> *        at
> org.apache.accumulo.tserver.tablet.Tablet.needsSplit(Tablet.java:1772)*
> *        at
> org.apache.accumulo.tserver.TabletServer$MajorCompactor.run(TabletServer.java:1853)*
> *        at
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)*
> *        at java.lang.Thread.run(Thread.java:745)*
>
> These are creating problems in continuously ingesting data and I also
> experienced some delay in queries and table create commands.
> Please comment what could be the cause of these exceptions?
>
> Thanks
> Mohit Kaushik
>
>
