The code referenced in the PR works to detect and move a WAL, replacing it with an empty one, but isn't fully wrapped up/merged. Some priorities were shifted and this got pushed back, though I do plan on addressing the comments in the code review Soon™.
I'd suggest upgrading to 1.9.2 once you resolve the issue. We've been running it for a while and have not had any WAL-related errors.

--Adam

On Tue, Aug 21, 2018 at 6:58 PM Ed Coleman <d...@etcoleman.com> wrote:

> There has been work done in https://github.com/apache/accumulo/pull/574.
> I'm not certain of the state of the code, but the description may provide
> you with things that you could look at manually.
>
> -----Original Message-----
> From: tech.s...@gmail.com [mailto:tech.s...@gmail.com]
> Sent: Tuesday, August 21, 2018 5:45 PM
> To: user@accumulo.apache.org
> Subject: Re: Corrupt WAL
>
> Was there any success with this workaround strategy? I am also
> experiencing this issue.
>
> On 2018/06/13 16:30:22, "Adam J. Shook" <adamjsh...@gmail.com> wrote:
> > Sorry, I had the error backwards. There is an OPEN for the WAL and
> > then immediately a COMPACTION_FINISH entry. This would cause the error.
> >
> > On Wed, Jun 13, 2018 at 11:34 AM, Adam J. Shook <adamjsh...@gmail.com>
> > wrote:
> >
> > > Looking at the log I see that the last two entries are
> > > COMPACTION_START of one RFile immediately followed by a
> > > COMPACTION_START of a separate RFile which (I believe) would lead to
> > > the error. Would this necessarily be an issue if the compactions are
> > > for separate RFiles?
> > >
> > > This is a dev cluster and I don't necessarily care about it, but is
> > > there a (good) means to do WAL log surgery? I imagine I can just
> > > chop off bytes until the log is parseable and missing the info about
> > > the compactions.
> > >
> > > On Tue, Jun 12, 2018 at 2:32 PM, Keith Turner <ke...@deenlo.com>
> > > wrote:
> > >
> > >> On Tue, Jun 12, 2018 at 12:10 PM, Adam J. Shook
> > >> <adamjsh...@gmail.com> wrote:
> > >> > Yes, that is the error. I'll inspect the logs and report back.
> > >>
> > >> Ok. The LogReader command has a mechanism to filter which tablet
> > >> is displayed. If the walog has a lot of data in it, may need to
> > >> use this.
> > >>
> > >> Also, be aware that only 5 mutations are shown for a "many mutations"
> > >> object in the walog. The -m option changes this. May want to see
> > >> more when deciding if the info in the log is important.
> > >>
> > >> >
> > >> > On Tue, Jun 12, 2018 at 10:14 AM, Keith Turner <ke...@deenlo.com>
> > >> > wrote:
> > >> >>
> > >> >> Is the message you are seeing "COMPACTION_FINISH (without
> > >> >> preceding COMPACTION_START)"? That message indicates that the
> > >> >> WALs are incomplete, probably as a result of the NN problems.
> > >> >> Could do the following:
> > >> >>
> > >> >> 1) Run the following command to see what's in the log. Need to
> > >> >> see what is there for the root tablet.
> > >> >>
> > >> >> accumulo org.apache.accumulo.tserver.logger.LogReader
> > >> >>
> > >> >> 2) Replace the log file with an empty file after seeing if there
> > >> >> is anything important in it.
> > >> >>
> > >> >> I think the list of WALs for the root tablet is stored in ZK at
> > >> >> /accumulo/<id>/walogs
> > >> >>
> > >> >> On Mon, Jun 11, 2018 at 5:26 PM, Adam J. Shook
> > >> >> <adamjsh...@gmail.com> wrote:
> > >> >> > Hey all,
> > >> >> >
> > >> >> > The root tablet on one of our dev systems isn't loading due to
> > >> >> > an illegal state exception -- COMPACTION_FINISH preceding
> > >> >> > COMPACTION_START. What'd be the best way to mitigate this
> > >> >> > issue? This was likely caused due to both of our NameNodes
> > >> >> > failing.
> > >> >> >
> > >> >> > Thank you,
> > >> >> > --Adam
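
For anyone following along, the condition discussed in this thread (an OPEN followed directly by a COMPACTION_FINISH, with no COMPACTION_START before it) can be illustrated with a toy check over a list of event names. This is only a sketch: the function name and the plain-string event list are made up for illustration, it does not parse real WAL files or LogReader output, and it is not Accumulo's actual recovery logic (it ignores tablets, RFiles, and start/finish pairing).

```python
from typing import List, Optional


def first_unmatched_finish(events: List[str]) -> Optional[int]:
    """Return the index of the first COMPACTION_FINISH that has no
    COMPACTION_START anywhere before it, or None if there is none.

    Hypothetical helper for illustration only; event names are plain
    strings typed in by hand, not records read from a real WAL.
    """
    seen_start = False
    for index, event in enumerate(events):
        if event == "COMPACTION_START":
            seen_start = True
        elif event == "COMPACTION_FINISH" and not seen_start:
            return index
    return None


# The sequence described in the thread: OPEN, then COMPACTION_FINISH.
print(first_unmatched_finish(["OPEN", "COMPACTION_FINISH"]))  # prints 1

# A sequence where the finish is preceded by a start is not flagged.
print(first_unmatched_finish(["OPEN", "COMPACTION_START", "COMPACTION_FINISH"]))  # prints None
```

A flagged index only says where to start looking; whether the data in the log matters still has to be decided by inspecting it as described above.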