Re: Mutation Rejected exception with server Error 1

mohit.kaushik Tue, 22 Dec 2015 20:32:05 -0800

I have 3 tablet servers hosting around 1.4K tablets. If a tablet server loses its session with ZooKeeper and kills itself, the system takes some time to move all of its hosted tablets to other servers.

In this case, if an ingest is in progress, what should happen to the mutations going to tablets hosted by that tablet server? Is this the reason for the first exception? Should they not be redirected to other servers? And I had set the system swappiness to 1. Should I set it to 0 in this case? I will check further.


Thanks for the reply

-Mohit Kaushik

On 12/22/2015 08:17 PM, Eric Newton wrote:

A tablet server is given the rights to manage a tablet.

To maintain consistency, it is critical that no other server uses the tablet.

To maintain the right to access a tablet, it must maintain a ZooKeeper session. The ZooKeeper session periodically exchanges keep-alive messages. If either party fails to get a keep-alive, ZooKeeper will close the connection. The client can attempt to reconnect, but if it fails to do so, the session will time out.

If the tablet server loses its session with ZooKeeper, the rest of the system can take over its tablets.

When a tablet server detects that it has lost its ZooKeeper session, it kills itself to avoid doing anything with the tablets it no longer has the right to host.
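The timing rule described above can be pictured with a toy model. This is only a sketch, not ZooKeeper's implementation: the function name `first_expiry` is hypothetical, and the 30-second timeout and 10-second keep-alive interval are illustrative values, not settings taken from this cluster. It treats a session as expired once the gap between two consecutive keep-alives exceeds the session timeout.

```python
def first_expiry(keepalive_times, session_timeout):
    """Return the time at which the session would expire, or None.

    Toy model (an assumption, not ZooKeeper's real logic): the session
    expires when more than `session_timeout` seconds pass between two
    consecutive keep-alive messages.
    """
    for earlier, later in zip(keepalive_times, keepalive_times[1:]):
        if later - earlier > session_timeout:
            # The timeout elapsed before the next keep-alive arrived.
            return earlier + session_timeout
    return None


# Keep-alives every 10 s with a 30 s timeout: the session stays alive.
healthy = first_expiry([0, 10, 20, 30, 40], session_timeout=30)

# A 45 s stall (for example, memory being swapped back in) between
# t=20 and t=65 exceeds the timeout, so the session expires at t=50.
stalled = first_expiry([0, 10, 20, 65, 75], session_timeout=30)

print(healthy)  # None
print(stalled)  # 50
```

In this model the tablet server only learns of the expiry after the stall ends, which is why it shuts itself down rather than continuing to serve tablets.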

What you are seeing here is the first step in that process, and it is probably due to the tablet server not sending a keep-alive message to ZooKeeper in time.

There are many reasons for a tablet server to be delayed in sending a keep-alive message. By far the most common is that your system is over-subscribed for memory, and part of the tablet server's memory was swapped out. When the Java garbage collection cycle then had to swap it back in, there was a considerable delay.

However, there can be other things going on. This is just a best guess. Monitor swap usage as a first diagnostic step.
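One way to check swap usage on a Linux host is to read /proc/meminfo and /proc/sys/vm/swappiness. The sketch below assumes the standard Linux format for those files (the `SwapTotal` and `SwapFree` fields, in kB); the helper names `parse_meminfo`, `swap_used_kb` and `report` are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB (SwapTotal minus SwapFree)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def report(meminfo_path="/proc/meminfo",
           swappiness_path="/proc/sys/vm/swappiness"):
    """Return a one-line summary of swap usage on this (Linux) host."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    with open(swappiness_path) as f:
        swappiness = int(f.read().strip())
    return "swap used: %d kB of %d kB, vm.swappiness=%d" % (
        swap_used_kb(info), info.get("SwapTotal", 0), swappiness)
```

Calling `print(report())` periodically on each tablet server host, and comparing the readings with the times of the ZooKeeper connection-loss warnings, would show whether swap was in use when the session was lost.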


-Eric

On Tue, Dec 22, 2015 at 8:30 AM, mohit.kaushik <mohit.kaus...@orkash.com> wrote:


    Dear All,

    The MutationsRejectedException can be seen on the client side with
    server error 1:

    org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: {}  # server errors 1 # exceptions 1
            at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:537)
            at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.addMutation(TabletServerBatchWriter.java:249)
            at org.apache.accumulo.core.client.impl.MultiTableBatchWriterImpl$TableBatchWriter.addMutation(MultiTableBatchWriterImpl.java:64)
            at com.orkash.accumulo.IngestionWithoutServiceOnCondition.main(IngestionWithoutServiceOnCondition.java:235)
            at com.orkash.db.DBQuery.insertLookUpDB(DBQuery.java:570)
            at com.orkash.Crawling.CrawlerThread.run(CrawlerThread.java:145)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server orkash1:9997
            at

    I also found exceptions in the Monitor related to tracing:

    Tracing spans are being dropped because there are already 5000 spans queued for delivery.
    This does not affect performance, security or data integrity, but distributed tracing information is being lost.

    and 6458 times:

    Got an IOException in internalRead!
    java.io.IOException: Connection reset by peer
            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
            at sun.nio.ch.IOUtil.read(IOUtil.java:197)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
            at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:537)
            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
            at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
            at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.select(CustomNonBlockingServer.java:228)
            at org.apache.accumulo.server.rpc.CustomNonBlockingServer$SelectAcceptThread.run



    I am facing the following exceptions in tserver logs and one
    tserver goes dead.

    2015-12-22 09:37:27,173 [zookeeper.ZooCache] WARN : Saw (possibly) transient exception communicating with ZooKeeper, will retry
    org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /accumulo/f8708e0d-9238-41f5-b948-8f435fd01207/tables/16/conf/table.split.threshold
            at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
            at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
            at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
            at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:264)
            at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
            at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
            at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
            at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:117)
            at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCachePropertyAccessor.java:103)
            at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfiguration.java:99)
            at org.apache.accumulo.core.conf.AccumuloConfiguration.getMemoryInBytes(AccumuloConfiguration.java:197)
            at org.apache.accumulo.tserver.tablet.Tablet.findSplitRow(Tablet.java:1604)
            at org.apache.accumulo.tserver.tablet.Tablet.needsSplit(Tablet.java:1772)
            at org.apache.accumulo.tserver.TabletServer$MajorCompactor.run(TabletServer.java:1853)
            at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
            at java.lang.Thread.run(Thread.java:745)

    These are creating problems in continuously ingesting data, and I
    also experienced some delay in queries and table create commands.
    What could be the cause of these exceptions?

    Thanks
    Mohit Kaushik

