On Tue, Sep 17, 2013 at 10:23 AM, Ott, Charles H. <[email protected]> wrote:

> Forgive my ignorance with this, but I have not yet had a tablet failure
> that I have been able to recover without restarting the entire accumulo
> cluster.
>
> I have 3 Tablets, 2 Online, 1 dead. Using Accumulo 1.4.3
>
> The tablet error reports:
>
> Uncaught exception in TabletServer.main, exiting
> java.lang.RuntimeException: java.lang.RuntimeException: Too many retries, exiting.
>     at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2684)
>     at org.apache.accumulo.server.tabletserver.TabletServer.run(TabletServer.java:2703)
>     at org.apache.accumulo.server.tabletserver.TabletServer.main(TabletServer.java:3168)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.accumulo.start.Main$1.run(Main.java:89)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.RuntimeException: Too many retries, exiting.
>     at org.apache.accumulo.server.tabletserver.TabletServer.announceExistence(TabletServer.java:2681)
>     ... 8 more

It would be nice to add this stack trace as a comment on ACCUMULO-1277 to
make it easier to find via Google. Would you like to do this? If not, I can.

> The recovery portion of the Admin guide says that recovery is performed by
> asking the loggers to copy their write-ahead logs into HDFS. The logs are
> copied, sorted and then tablets can find missing updates. Once complete
> the tablets involved should return to an ‘online’ state.
>
> I am not sure how to ask the loggers to copy their write-ahead logs into
> hdfs. Is this the same as using the flush shell command? If so, the flush
> command needs a pattern of tables or a table name. Would I want to perform
> something like, ‘accumulo flush -p .+’ to flush all of the table data to
> HDFS?
>
> Another concern is that the Tablet Server process was no longer running on
> the server. I logged into that server and ran “start-here.sh”. The tablet
> server is now running, but it is still reported as ‘dead’ to the monitor.
>
> Thanks in advance,
> Charles
