Looking at the log, I see that the last two entries are a COMPACTION_START for one RFile immediately followed by a COMPACTION_START for a separate RFile, which (I believe) would lead to the error. Would this necessarily be an issue if the compactions are for separate RFiles?
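For context, here is a sketch of how that ordering can be checked with the LogReader command Keith mentions below. The WAL path is a placeholder for one of ours, and the grep assumes LogReader prints the event names (COMPACTION_START, COMPACTION_FINISH) verbatim in its output:

  # Dump the walog and filter down to the compaction events
  accumulo org.apache.accumulo.tserver.logger.LogReader \
      hdfs://<namenode>/accumulo/wal/<tserver+port>/<wal-uuid> \
    | grep -E 'COMPACTION_(START|FINISH)'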
This is a dev cluster and I don't necessarily care about it, but is there a (good) means to do WAL log surgery? I imagine I can just chop off bytes until the log is parseable and missing the info about the compactions. (A rough sketch of the safer empty-file swap follows the quoted thread below.)

On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner <ke...@deenlo.com> wrote:
> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook <adamjsh...@gmail.com>
> wrote:
> > Yes, that is the error. I'll inspect the logs and report back.
>
> Ok. The LogReader command has a mechanism to filter which tablet is
> displayed. If the walog has a lot of data in it, you may need to use
> this.
>
> Also, be aware that only 5 mutations are shown for a "many mutations"
> object in the walog. The -m option changes this. You may want to see
> more when deciding if the info in the log is important.
>
> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner <ke...@deenlo.com> wrote:
> >>
> >> Is the message you are seeing "COMPACTION_FINISH (without preceding
> >> COMPACTION_START)"? That message indicates that the WALs are
> >> incomplete, probably as a result of the NN problems. Could do the
> >> following:
> >>
> >> 1) Run the following command to see what's in the log. Need to see
> >> what is there for the root tablet.
> >>
> >> accumulo org.apache.accumulo.tserver.logger.LogReader
> >>
> >> 2) Replace the log file with an empty file after seeing if there is
> >> anything important in it.
> >>
> >> I think the list of WALs for the root tablet is stored in ZK at
> >> /accumulo/<id>/walogs
> >>
> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook <adamjsh...@gmail.com>
> >> wrote:
> >> > Hey all,
> >> >
> >> > The root tablet on one of our dev systems isn't loading due to an
> >> > illegal state exception -- COMPACTION_FINISH preceding
> >> > COMPACTION_START. What'd be the best way to mitigate this issue?
> >> > This was likely caused by both of our NameNodes failing.
> >> >
> >> > Thank you,
> >> > --Adam
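Regarding the surgery sketch mentioned above: rather than byte-chopping, the lower-risk route seems to be Keith's step 2 -- back up the damaged walog and swap in an empty file. This is untested and all paths are placeholders; the ZK path is the one Keith mentions, with <id> being the instance id:

  # Back up the damaged walog, then replace it with an empty file
  hdfs dfs -cp /accumulo/wal/<tserver+port>/<wal-uuid> /tmp/<wal-uuid>.bak
  hdfs dfs -rm /accumulo/wal/<tserver+port>/<wal-uuid>
  hdfs dfs -touchz /accumulo/wal/<tserver+port>/<wal-uuid>

  # Confirm which walogs the root tablet references in ZooKeeper
  zkCli.sh -server <zookeeper:2181> ls /accumulo/<id>/walogs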