Re: Dead Tablet Server

Keith Turner Tue, 17 Sep 2013 13:26:01 -0700

On Tue, Sep 17, 2013 at 3:32 PM, Ott, Charles H. <[email protected]> wrote:


> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Keith Turner
> *Sent:* Tuesday, September 17, 2013 3:20 PM
> *To:* [email protected]
> *Subject:* Re: Dead Tablet Server
>
> On Tue, Sep 17, 2013 at 10:23 AM, Ott, Charles H. <[email protected]> wrote:
>
> Forgive my ignorance with this, but I have not yet had a tablet server
> failure that I have been able to recover from without restarting the entire
> Accumulo cluster.
>
> I have 3 tablet servers: 2 online, 1 dead.  I am using Accumulo 1.4.3.
>
> The tablet error reports:
>
> Uncaught exception in TabletServer.main, exiting
>          java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
>                  at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
>                  at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
>                  at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
>                  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>                  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>                  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>                  at java.lang.reflect.Method.invoke(Method.java:597)
>                  at org.apache.accumulo.start.Main$1.run(Main.java:89)
>                  at java.lang.Thread.run(Thread.java:662)
>          Caused by: java.lang.RuntimeException: Too many retries, exiting.
>                  at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
>                  ... 8 more
>
> It would be nice to add this stack trace as a comment on ACCUMULO-1277 to
> make it easier to find via Google.  Would you like to do this?  If not, I
> can.
>
>                 I just added it to the comments:
> https://issues.apache.org/jira/browse/ACCUMULO-1277
>


Thanks.  I edited the comment to add JIRA {noformat} markup around the stack
trace.


> The recovery portion of the Admin guide says that recovery is performed by
> asking the loggers to copy their write-ahead logs into HDFS.  The logs are
> copied and sorted, and then tablets can find missing updates.  Once recovery
> is complete, the tablets involved should return to an ‘online’ state.
>
> I am not sure how to ask the loggers to copy their write-ahead logs into
> HDFS.  Is this the same as using the flush shell command?  If so, the flush
> command needs a pattern of tables or a table name.  Would I want to perform
> something like ‘accumulo flush -p .+’ to flush all of the table data to
> HDFS?
>
> Another concern is that the tablet server process was no longer running on
> the server.  I logged into that server and ran “start-here.sh”.  The tablet
> server is now running, but it is still reported as ‘dead’ by the monitor.
>
> Thanks in advance,
>
> Charles
