Sijie,
The command you asked us to run doesn't work. Pavan (added to this thread) also
tried it with "--ledgeridformat", but that didn't work either.
./bookkeeper shell ledger -ledgeridformat long -m 36
08:31:31,924 ERROR Error parsing command line arguments :
org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -ledgeridformat
at org.apache.commons.cli.Parser.processOption(Parser.java:363)
at org.apache.commons.cli.Parser.parse(Parser.java:199)
at org.apache.commons.cli.Parser.parse(Parser.java:85)
at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:231)
at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2816)
at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2907)
ledger: Dump ledger index entries into readable format.
usage: ledger [-m] <ledger_id>
________________________________
From: Belgundi, Prajakta <[email protected]>
Sent: Tuesday, December 3, 2019 2:03 PM
To: [email protected] <[email protected]>
Cc: Enrico Olivelli <[email protected]>; Flavio Junqueira <[email protected]>;
Sharda, Ravi <[email protected]>
Subject: RE: Bookeeper exception on pods restart
Sorry to digress a little from the existing conversation here.
This issue is almost always noticed on bookie restart, so I wanted to
understand whether it could be the result of an unclean bookie shutdown.
If so, how can we ensure a graceful termination of bookies, so that
we don’t lose or corrupt any data?
-Thanks,
Prajakta
From: Sijie Guo <[email protected]>
Sent: Tuesday, December 3, 2019 1:53 PM
To: Sharda, Ravi
Cc: Belgundi, Prajakta; Enrico Olivelli; user; Flavio Junqueira
Subject: Re: Bookeeper exception on pods restart
[EXTERNAL EMAIL]
I mean the error "ERROR: invalid ledger id 56" is raised because the wrong
ledger id formatter is being used. I was suggesting that you rerun the command
to collect more information so that we can debug.
- Sijie
On Tue, Dec 3, 2019 at 12:17 AM Sharda, Ravi <[email protected]> wrote:
Did you mean we should run this on a running environment to recover from the
failure?
"bin/bookkeeper shell ledger -ledgeridformat long -m [ledger-id]"
________________________________
From: Sijie Guo <[email protected]>
Sent: Tuesday, December 3, 2019 1:10 PM
To: Sharda, Ravi <[email protected]>
Cc: Enrico Olivelli <[email protected]>; user <[email protected]>; Flavio
Junqueira <[email protected]>
Subject: Re: Bookeeper exception on pods restart
I think 4.7.2 uses UUID as the default ledger id formatter (that was a
mistake, and it was reverted in subsequent releases).
So you might have to run "bin/bookkeeper shell ledger -ledgeridformat long -m
[ledger-id]".
Can you rerun the command that way?
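To illustrate why a plain "56" could be rejected when a UUID formatter is in effect, here is a minimal Python sketch. It is not BookKeeper code: `parse_long_id`, `parse_uuid_id`, `format_uuid_id` and `try_parse` are hypothetical stand-ins, and storing the numeric id in the low 64 bits of the UUID is an assumption made only for illustration.

```python
import uuid


def parse_long_id(text: str) -> int:
    """Hypothetical 'long' formatter: the id is a plain decimal number."""
    return int(text.strip())


def parse_uuid_id(text: str) -> int:
    """Hypothetical 'UUID' formatter: the id must be a UUID string.

    Assumption for illustration: the numeric ledger id is stored in the
    low 64 bits of the UUID.
    """
    return uuid.UUID(text.strip()).int & 0xFFFFFFFFFFFFFFFF


def format_uuid_id(ledger_id: int) -> str:
    """Render a numeric id as a UUID string (same assumption as above)."""
    return str(uuid.UUID(int=ledger_id))


def try_parse(parser, text):
    """Return the parsed id, or None if the formatter rejects the text."""
    try:
        return parser(text)
    except ValueError:
        return None


if __name__ == "__main__":
    print("long formatter:", try_parse(parse_long_id, "56"))
    print("uuid formatter:", try_parse(parse_uuid_id, "56"))
    print("56 rendered as a UUID:", format_uuid_id(56))
```

Under these assumptions, "56" parses as a long id but is rejected by the UUID parser, which has the same shape as the "invalid ledger id 56" error.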
---
> I saw this occurring for several ledgers in this environment.
The IOException might be related to disk issues, although I don't have enough
information to tell.
Thanks,
Sijie
On Mon, Dec 2, 2019 at 7:24 AM Sharda, Ravi <[email protected]> wrote:
Enrico,
I saw this occurring for several ledgers in this environment.
-----------
ERROR - [BookieReadThreadPool-OrderedExecutor-3-0:ReadEntryProcessorV3@235] - IOException while reading entry: 5 from ledger 1243
[BookieReadThreadPool-OrderedExecutor-7-0:ReadEntryProcessorV3@235] - IOException while reading entry: 15 from ledger 1239
ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 102 from ledger 64
ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 7 from ledger 728
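To gauge how many ledgers are affected, the failing (ledger, entry) pairs can be pulled out of the bookie log. The following is a rough Python sketch; the regular expression is derived only from the log lines quoted above, and `failed_reads` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches the message text quoted above; \s+ tolerates lines wrapped by mail clients.
READ_ERROR = re.compile(
    r"IOException\s+while\s+reading\s+entry:\s+(\d+)\s+from\s+ledger\s+(\d+)"
)


def failed_reads(log_text: str) -> dict:
    """Map each ledger id to the sorted list of entry ids that failed to read."""
    by_ledger = defaultdict(set)
    for entry_id, ledger_id in READ_ERROR.findall(log_text):
        by_ledger[int(ledger_id)].add(int(entry_id))
    return {ledger: sorted(entries) for ledger, entries in by_ledger.items()}


sample = """\
ERROR - [BookieReadThreadPool-OrderedExecutor-3-0:ReadEntryProcessorV3@235] -
IOException while reading entry: 5 from ledger 1243
ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] -
IOException while reading entry: 102 from ledger 64
"""

if __name__ == "__main__":
    for ledger, entries in sorted(failed_reads(sample).items()):
        print(f"ledger {ledger}: entries {entries}")
```

Grouping by ledger makes it easier to see whether the failures cluster around a few ledgers or are spread across many.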
________________________________
From: Enrico Olivelli <[email protected]>
Sent: Monday, December 2, 2019 7:01 PM
To: user <[email protected]>
Cc: Sijie Guo <[email protected]>; Flavio Junqueira <[email protected]>
Subject: Re: Bookeeper exception on pods restart
sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger [-m] <ledger_id>
-m,--meta Print meta information
Does it work for other ledgers?
Enrico
On Mon, Dec 2, 2019 at 10:06, Sharda, Ravi <[email protected]> wrote:
Hello Sijie,
Any luck with this? Please let us know what could be going wrong.
Thanks & best regards,
Ravi
________________________________
From: Sharda, Ravi <[email protected]>
Sent: Friday, November 29, 2019 3:28 PM
To: Sijie Guo <[email protected]>
Cc: user <[email protected]>; Flavio Junqueira <[email protected]>
Subject: Re: Bookeeper exception on pods restart
Thanks. Here's the output of the command:
sh-4.2# ./bookkeeper shell ledger -m 56
ERROR: invalid ledger id 56
ledger: Dump ledger index entries into readable format.
usage: ledger [-m] <ledger_id>
-m,--meta Print meta information
________________________________
From: Sijie Guo <[email protected]>
Sent: Friday, November 29, 2019 3:15 PM
To: Sharda, Ravi <[email protected]>
Cc: user <[email protected]>; Flavio Junqueira <[email protected]>
Subject: Re: Bookeeper exception on pods restart
Sorry, my bad. The command for reading the ledger index should be "bookkeeper
shell ledger".
From the `ls` output, I didn't find the entry log 1.log under the ledgers
directory, so I guess the log file doesn't exist. If you can provide the output
of `bookkeeper shell ledger`, we can take a look at the index file to understand
more.
On Fri, Nov 29, 2019 at 1:20 AM Sharda, Ravi <[email protected]> wrote:
For the following error,
OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
at org.apache.bookkeeper.bookie.L
-----------
sh-4.2# ./bookkeeper shell readledger --ledgerid 56
ERROR: invalid value for option ledgerid : 56
Must specify a ledger id
-----------
I didn't know how to check that the log file exists. Attaching the output of
"ls -R -L", instead.
________________________________
From: Sijie Guo <[email protected]>
Sent: Friday, November 29, 2019 2:31 PM
To: Sharda, Ravi <[email protected]>
Cc: user <[email protected]>; Flavio Junqueira <[email protected]>
Subject: Re: Bookeeper exception on pods restart
If it is a permanent error,
- check if the log file (indicated in the error message) exists or not.
- use `bookkeeper shell readledger` to dump the index of the given ledger.
- see if the index points to the right entry log file or not
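The first check in the list above can be scripted. This is a minimal Python sketch, assuming entry log files are named with the log id in hexadecimal followed by ".log" (consistent with the "1.log" name used elsewhere in this thread) and sit somewhere under the bookie's ledger directories. `find_entry_log` and the `/bk/ledgers` path are hypothetical; use the ledger directories from your own bookie configuration.

```python
import os


def entry_log_name(log_id: int) -> str:
    """Assumed naming scheme: the log id in lowercase hex plus '.log'."""
    return f"{log_id:x}.log"


def find_entry_log(ledger_dirs, log_id: int):
    """Return every path under ledger_dirs whose file name matches the entry log."""
    wanted = entry_log_name(log_id)
    matches = []
    for root_dir in ledger_dirs:
        # followlinks=True mirrors `ls -L`, in case the directories are symlinked.
        for dirpath, _dirnames, filenames in os.walk(root_dir, followlinks=True):
            if wanted in filenames:
                matches.append(os.path.join(dirpath, wanted))
    return matches


if __name__ == "__main__":
    # Hypothetical ledger directory; substitute your own.
    dirs = ["/bk/ledgers"]
    found = find_entry_log(dirs, 1)
    print(found if found else "entry log 1 not found under " + ", ".join(dirs))
```

An empty result for the log id named in the exception would support the theory that the index points at a file that is no longer on disk.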
- Sijie
On Fri, Nov 29, 2019 at 12:46 AM Sharda, Ravi <[email protected]> wrote:
The latest instance we have seen is a permanent error. The bookies haven't
recovered in the environment (last 2 days). In some previous instances,
developers had reported that the bookies had recovered, but it is also possible
that the error was slightly different from what we are seeing now.
Thanks & best regards,
Ravi
________________________________
From: Sijie Guo <[email protected]>
Sent: Friday, November 29, 2019 2:10 PM
To: user <[email protected]>
Cc: Flavio Junqueira <[email protected]>; Sharda, Ravi <[email protected]>
Subject: Re: Bookeeper exception on pods restart
Sorry for jumping into the discussion, but the error message indicates that
entry log file 1 was not found.
It seems to me that the entry log file was removed but the entry index still
points to the old location. Is this a transient error or a permanent one?
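As a worked example of how "log 1" relates to "location 4744138143" in the exception, here is a small Python sketch. It assumes the location is a 64-bit value with the entry log id in the upper 32 bits and the byte position within that log in the lower 32 bits. That layout is an assumption, although it is consistent with the message, since 4744138143 >> 32 equals 1. `decode_location` and `encode_location` are hypothetical helper names.

```python
def decode_location(location: int):
    """Split a 64-bit entry location into (entry_log_id, position_in_log).

    Assumption: upper 32 bits hold the entry log id, lower 32 bits hold the
    byte position inside that log file.
    """
    log_id = location >> 32
    position = location & 0xFFFFFFFF
    return log_id, position


def encode_location(log_id: int, position: int) -> int:
    """Inverse of decode_location, under the same assumed layout."""
    return (log_id << 32) | position


if __name__ == "__main__":
    log_id, position = decode_location(4744138143)
    print(f"log id {log_id}, position {position}")
```

Under this layout, the index entry for ledger 56, entry 25 points into entry log 1, which is the file the bookie could not find.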
- Sijie
On Fri, Nov 29, 2019 at 12:11 AM <[email protected]> wrote:
+ Ravi, who will be looking into this.
From: Enrico Olivelli - Diennea <[email protected]>
Sent: Thursday, November 28, 2019 7:00 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Bookeeper exception on pods restart
From the error it looks like one client is trying to read an entry from the
bookie but the entry is not there.
I see two possible reasons:
1) The write never reached the bookie
2) The bookie is missing some file
For 1)
Do you have logs on the writer? Something that could tell us that a write did
not succeed?
How old is the entry supposed to be?
Do you have logs on the reader that is trying to read the entry?
For 2)
Do you have other errors in the logs, for example about failed writes?
If you were on 4.9 we could use the ‘localconsistency checker’ to check for
inconsistencies on the bookie; it scans the bookie looking for every entry that
should reside on the bookie itself.
If you were writing your ledgers with writequorum >= 2 maybe you can recover
your data.
In order to debug the problem we should compare the logs of:
* The bookie
* The writer
* The reader
Enrico
On 28/11/19, 13:59, "[email protected]" <[email protected]> wrote:
I understand that auto recovery would replicate data for under-replicated
ledgers.
But it is scheduled to run only once in a while and may not have run before a
reader tries to read this data from a certain bookie.
In general, what does the exception below indicate about the state of BK?
Does it indicate that the entry is missing on the specific bookie, and so we
don’t find it?
Or that something in the ledger metadata or the ledgers could have been
corrupted?
I found the same issue in another product, where you seem to have provided a
custom fix:
https://github.com/diennea/herddb/issues/194
All in all, I want to understand whether this can be the result of BK
misconfiguration or is just a temporary unavailability problem that will
resolve itself when auto-replication runs.
-Thanks,
Prajakta
From: Enrico Olivelli - Diennea <[email protected]>
Sent: Thursday, November 28, 2019 6:11 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Bookeeper exception on pods restart
I don’t think there is a good value.
You can use WriteQuorumSize = AckQuorumSize; this way you will see an error on
the writing client if a write to any of the bookies fails.
Usually you enable the Autorecovery feature to fill in the gaps of
under-replicated ledgers:
http://bookkeeper.apache.org/docs/4.10.0/admin/autorecovery/
Hope that helps
Enrico
On 28/11/19, 13:30, "[email protected]" <[email protected]> wrote:
What EnsembleSize, WriteQuorumSize and AckQuorumSize would you recommend, so we
never see this?
What other ledger creation parameters do you need information about?
-Thanks,
Prajakta
From: Enrico Olivelli - Diennea <[email protected]>
Sent: Thursday, November 28, 2019 5:19 PM
To: [email protected]
Cc: [email protected]
Subject: Re: Bookeeper exception on pods restart
Hi Prajakta,
What ledger creation parameters are you using? Ensemble size, write quorum
size, ack quorum size?
If ackQuorumSize < writeQuorumSize, it is possible that a write to one bookie
failed, so an entry that is supposed to be on that bookie never reached it, yet
the overall write still succeeded because an ack quorum of bookies
acknowledged it.
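The scenario described above can be sketched as a toy model in Python. This is not the BookKeeper client API: `simulate_write` and the bookie names are hypothetical, and the model only counts acknowledgements for a single entry.

```python
def simulate_write(write_set, failed_bookies, ack_quorum_size):
    """Toy model of one entry write.

    write_set: bookies the entry is sent to (its size is the write quorum).
    failed_bookies: bookies whose write is lost in this scenario.
    Returns (client_sees_success, bookies_that_stored_the_entry).
    """
    stored = [b for b in write_set if b not in failed_bookies]
    # The client treats the write as successful once ack_quorum_size bookies ack.
    return len(stored) >= ack_quorum_size, stored


if __name__ == "__main__":
    bookies = ["bookie-1", "bookie-2", "bookie-3"]  # hypothetical names
    lost = {"bookie-3"}

    # ackQuorumSize (2) < writeQuorumSize (3): success despite the missing copy.
    print(simulate_write(bookies, lost, ack_quorum_size=2))
    # ackQuorumSize == writeQuorumSize (3): the client sees the failure.
    print(simulate_write(bookies, lost, ack_quorum_size=3))
```

In the first case the writer is told the entry is durable even though one bookie never stored it, so a later read directed at that bookie finds nothing there.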
Enrico
On 28/11/19, 12:44, "[email protected]" <[email protected]> wrote:
Hello Team,
We have a question about an issue we are running into with BookKeeper.
We use BookKeeper version 4.7.3.
This issue occurs occasionally when BookKeeper servers are restarted.
We see the following error in the logs for some time, which blocks Pravega's
operations for the same duration. Without knowing the internals of BookKeeper,
and based on the exception alone, it seems that BookKeeper might temporarily be
unable to locate the files. What could be causing this?
2019-11-28 03:52:26,491 - ERROR - [BookieReadThreadPool-OrderedExecutor-0-0:ReadEntryProcessorV3@235] - IOException while reading entry: 25 from ledger 56
java.io.FileNotFoundException: No file for log 1 for 56 with location 4744138143
at org.apache.bookkeeper.bookie.EntryLogger.findFile(EntryLogger.java:1165)
at org.apache.bookkeeper.bookie.EntryLogger.getChannelForLogId(EntryLogger.java:1100)
at org.apache.bookkeeper.bookie.EntryLogger.internalReadEntry(EntryLogger.java:1002)
at org.apache.bookkeeper.bookie.EntryLogger.readEntry(EntryLogger.java:1051)
at org.apache.bookkeeper.bookie.InterleavedLedgerStorage.getEntry(InterleavedLedgerStorage.java:305)
at org.apache.bookkeeper.bookie.SortedLedgerStorage.getEntry(SortedLedgerStorage.java:153)
at org.apache.bookkeeper.bookie.LedgerDescriptorImpl.readEntry(LedgerDescriptorImpl.java:153)
at org.apache.bookkeeper.bookie.Bookie.readEntry(Bookie.java:1305)
at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:175)
at org.apache.bookkeeper.proto.ReadEntryProcessorV3.readEntry(ReadEntryProcessorV3.java:155)
at org.apache.bookkeeper.proto.ReadEntryProcessorV3.getReadResponse(ReadEntryProcessorV3.java:218)
at org.apache.bookkeeper.proto.ReadEntryProcessorV3.executeOp(ReadEntryProcessorV3.java:264)
at org.apache.bookkeeper.proto.ReadEntryProcessorV3.safeRun(ReadEntryProcessorV3.java:260)
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
-Thanks,
Prajakta
________________________________
CONFIDENTIALITY & PRIVACY NOTICE
This e-mail (including any attachments) is strictly confidential and may also
contain privileged information. If you are not the intended recipient you are
not authorised to read, print, save, process or disclose this message. If you
have received this message by mistake, please inform the sender immediately and
destroy this e-mail, its attachments and any copies. Any use, distribution,
reproduction or disclosure by any person other than the intended recipient is
strictly prohibited and the person responsible may incur in penalties.
The use of this e-mail is only for professional purposes; there is no guarantee
that the correspondence towards this e-mail will be read only by the recipient,
because, under certain circumstances, there may be a need to access this email
by third subjects belonging to the Company.