On Tue, Sep 17, 2013 at 3:32 PM, Ott, Charles H. <[email protected]> wrote:
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Keith Turner
> Sent: Tuesday, September 17, 2013 3:20 PM
> To: [email protected]
> Subject: Re: Dead Tablet Server
>
> On Tue, Sep 17, 2013 at 10:23 AM, Ott, Charles H. <[email protected]> wrote:
>
> Forgive my ignorance with this, but I have not yet had a tablet server
> failure that I have been able to recover from without restarting the
> entire Accumulo cluster.
>
> I have 3 tablet servers: 2 online, 1 dead. I am using Accumulo 1.4.3.
>
> The dead tablet server reports this error:
>
> Uncaught exception in TabletServer.main, exiting
> java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
> 	at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
> 	at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
> 	at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.accumulo.start.Main$1.run(Main.java:89)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: Too many retries, exiting.
> 	at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
> 	... 8 more
>
> It would be nice to add this stack trace as a comment on ACCUMULO-1277 to
> make it easier to find via Google. Would you like to do this?
> If not, I can.
>
> I just added it to the comments:
> https://issues.apache.org/jira/browse/ACCUMULO-1277
>

thanks. I edited the comment to add JIRA {noformat} markup around the stack trace.

> The recovery portion of the Admin guide says that recovery is performed by
> asking the loggers to copy their write-ahead logs into HDFS. The logs are
> copied and sorted, and then tablets can find missing updates. Once this is
> complete, the tablets involved should return to an 'online' state.
>
> I am not sure how to ask the loggers to copy their write-ahead logs into
> HDFS. Is this the same as using the flush shell command? If so, the flush
> command needs a table-name pattern or a table name. Would I want to run
> something like 'accumulo flush -p .+' to flush all of the table data to
> HDFS?
>
> Another concern is that the tablet server process was no longer running on
> that machine. I logged into it and ran "start-here.sh". The tablet server
> is now running, but the monitor still reports it as 'dead'.
>
> Thanks in advance,
> Charles
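The copy/sort/replay recovery described in the quoted admin guide passage can be illustrated with a toy sketch. This is not Accumulo code. The `LogEntry` class, the tablet names `t1`/`t2`, the sequence numbers, and the `lastPersistedSeq` parameter are all hypothetical and exist only for illustration. The sketch shows why the sort step matters: once the copied log is ordered by tablet and then by sequence number, each recovering tablet's entries form one contiguous run, and the tablet only needs to replay entries newer than what it had already persisted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WalRecoverySketch {

    // Hypothetical write-ahead log entry: owning tablet, a per-log
    // sequence number, and an opaque mutation payload.
    static final class LogEntry {
        final String tablet;
        final long seq;
        final String mutation;

        LogEntry(String tablet, long seq, String mutation) {
            this.tablet = tablet;
            this.seq = seq;
            this.mutation = mutation;
        }
    }

    // "Sort" step: order the copied log by tablet, then by sequence number,
    // so that each tablet's entries become contiguous and in order.
    static List<LogEntry> sortLog(List<LogEntry> copiedLog) {
        List<LogEntry> sorted = new ArrayList<>(copiedLog);
        sorted.sort(Comparator.comparing((LogEntry e) -> e.tablet)
                .thenComparingLong(e -> e.seq));
        return sorted;
    }

    // "Find missing updates" step: collect this tablet's entries whose
    // sequence number is above the last one it had already persisted.
    static List<String> missingUpdates(List<LogEntry> sortedLog, String tablet, long lastPersistedSeq) {
        List<String> replay = new ArrayList<>();
        boolean inRun = false;
        for (LogEntry e : sortedLog) {
            if (e.tablet.equals(tablet)) {
                inRun = true;
                if (e.seq > lastPersistedSeq) {
                    replay.add(e.mutation);
                }
            } else if (inRun) {
                break; // sorted order: this tablet's run has ended
            }
        }
        return replay;
    }

    public static void main(String[] args) {
        // "Copy" step assumed done: the dead server's log, with entries
        // for several tablets interleaved in arrival order.
        List<LogEntry> copiedLog = List.of(
                new LogEntry("t2", 1, "put b"),
                new LogEntry("t1", 2, "put x"),
                new LogEntry("t1", 1, "put w"),
                new LogEntry("t2", 2, "put c"),
                new LogEntry("t1", 3, "put y"));

        List<LogEntry> sorted = sortLog(copiedLog);
        // Suppose tablet t1 had already persisted everything up to seq 1.
        System.out.println(missingUpdates(sorted, "t1", 1)); // prints [put x, put y]
    }
}
```

A real recovering tablet would seek directly to its own run rather than scan from the start. The linear scan here just keeps the sketch short.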
