Re: Unbounded memory usage for WQ > AQ ?

2021-01-26 Thread Andrey Yegorov
I remember issues with bookies OOMing/slowing down due to memory pressure under load. https://github.com/apache/bookkeeper/issues/1409 https://github.com/apache/bookkeeper/pull/1410 IIRC, there were a couple of problems: - Slow bookie kept on accepting data that it could not process (netty kept on

Re: Unbounded memory usage for WQ > AQ ?

2021-01-19 Thread Flavio Junqueira
Thanks for the feedback, JV, see comments interspersed: > On 18 Jan 2021, at 22:54, Venkateswara Rao Jujjuri wrote: > > On Mon, Jan 18, 2021 at 10:53 AM Sijie Guo > wrote: > >>> One concern for me in this thread is case (3). I'd expect a client that >> doesn't crash

Re: Unbounded memory usage for WQ > AQ ?

2021-01-19 Thread Flavio Junqueira
> Based on my understanding, Jack wants the behavior on recovering an entry > that does not have enough replicas to be deterministic. i.e. If the entry does > not have enough replicas, we can always exclude the entry. Jack, did I get > you right? I see, if that's the case, then part of the problem here

Re: Unbounded memory usage for WQ > AQ ?

2021-01-19 Thread Flavio Junqueira
Thanks for the feedback, Sijie: > On 18 Jan 2021, at 19:53, Sijie Guo wrote: > >> One concern for me in this thread is case (3). I'd expect a client that >> doesn't crash to not give up, and eventually replace the bookie if it is >> unresponsive. >> > The current implementation doesn't retry re

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Venkateswara Rao Jujjuri
On Mon, Jan 18, 2021 at 10:53 AM Sijie Guo wrote: > > One concern for me in this thread is case (3). I'd expect a client that > doesn't crash to not give up, and eventually replace the bookie if it is > unresponsive. > > The current implementation doesn't retry replacing a bookie if an entry is >

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Sijie Guo
On Mon, Jan 18, 2021 at 10:18 AM Flavio Junqueira wrote: > >>> Regarding recovery reads, recovery read doesn't need to be > deterministic. > >>> For the entry with (b1->success, b2->NoSuchLedger, b3->NoSuchLedger), > >>> either including it or excluding it in the sealed ledger is correct > >>> be

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Sijie Guo
> One concern for me in this thread is case (3). I'd expect a client that doesn't crash to not give up, and eventually replace the bookie if it is unresponsive. The current implementation doesn't retry replacing a bookie if an entry is already acknowledged (receiving AQ responses). It relies on in

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Flavio Junqueira
>>> Regarding recovery reads, recovery read doesn't need to be deterministic. >>> For the entry with (b1->success, b2->NoSuchLedger, b3->NoSuchLedger), >>> either including it or excluding it in the sealed ledger is correct >>> behavior. The bookkeeper client guarantees that once a ledger is sealed

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Sijie Guo
Jack, Thank you for your replies! That's good as there are no violations of the BookKeeper protocol. Comments inline. On Mon, Jan 18, 2021 at 3:20 AM Jack Vanlightly wrote: > > Did you guys see any issues with the ledger auditor? > > > The active writer can't guarantee it writing entries to WQ be

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Flavio Junqueira
In the scenario that WQ > AQ, a client acknowledges the add of an entry e to the application once it receives AQ bookie acks. Say now that the client is not able to write a copy of e to at least one bookie b, it could be because: 1- The client crashed before it is able to do it 2- Bookie b crash

Re: Unbounded memory usage for WQ > AQ ?

2021-01-18 Thread Jack Vanlightly
> Did you guys see any issues with the ledger auditor? > The active writer can't guarantee it writing entries to WQ because it can > crash during retrying adding entries to (WQ - AQ) bookies. The need to repair AQ replicated entries is clear and the auditor is one such strategy. Ivan has also wor

Re: Unbounded memory usage for WQ > AQ ?

2021-01-17 Thread Sijie Guo
Sorry for being late in this thread. If I understand this correctly, the main topic is about the "hole" when WQ > AQ. > This leaves a "hole" as the entry is now replicated only to 2 bookies, We do have one hole when ensemble change is enabled and WQ > AQ. That was a known behavior. But the hole

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Jack Vanlightly
Let's set up a call and create any issues from that. I have already created the patches in our (Splunk) fork and it might be easiest or not to wait until we re-sync up with the open source repo. We can include the fixes in the discussion. Jack On Fri, Jan 15, 2021 at 4:33 PM Flavio Junqueira wro

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Flavio Junqueira
Hi Jack, Thanks for getting back. > What's the best way to share the TLA+ findings? Would you be able to share the spec? I'm ok with reading TLA+. As for sharing your specific findings, I'd suggest one of the following: 1- Create an email thread describing the scenarios that trigger a bug. 2-

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Jack Vanlightly
Hi Flavio, >> This is an example of a scenario corresponding to what we suspect is a bug introduced earlier, but Enrico is arguing that this is not the intended behavior, and at this point, I agree. >> By the time a successful callback is received, the client might only have replicated AQ ways, s

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Flavio Junqueira
> Let's say we have WQ 3 and AQ 2. An add (e100) has reached AQ and the > confirm callback to the client is called and the LAC is set to 100. Now the > 3rd bookie times out. Ensemble change is executed and all pending adds that > are above the LAC of 100 are replayed to another bookie, meaning that

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Jack Vanlightly
> No you cannot miss data, if the client is not able to find a bookie that is > able to answer with the entry it receives an error. Let's say we have WQ 3 and AQ 2. An add (e100) has reached AQ and the confirm callback to the client is called and the LAC is set to 100. Now the 3rd bookie times out
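
The scenario described in this message (WQ 3, AQ 2, entry e100 confirmed, then the third bookie timing out) can be traced with a small model. This is only a sketch under stated assumptions, not BookKeeper client code: the Writer class, its methods, the bookie names b1..b4, and the starting LAC of 99 are all hypothetical and exist only to illustrate how an entry at or below the LAC can end up with AQ copies after an ensemble change.

```python
from dataclasses import dataclass, field

WQ, AQ = 3, 2  # write quorum and ack quorum, as in the example above


@dataclass
class Writer:
    """Hypothetical toy model of a ledger writer; not the real client."""
    ensemble: list
    lac: int = -1  # last add confirmed
    replicas: dict = field(default_factory=dict)  # entry id -> bookies holding it

    def ack(self, entry, bookie):
        """Record a bookie ack; confirm the entry once AQ acks arrived in order."""
        self.replicas.setdefault(entry, set()).add(bookie)
        if len(self.replicas[entry]) >= AQ and entry == self.lac + 1:
            self.lac = entry

    def ensemble_change(self, failed, replacement, pending):
        """Swap the failed bookie and replay only pending adds above the LAC."""
        self.ensemble[self.ensemble.index(failed)] = replacement
        for entry in pending:
            if entry > self.lac:  # entries at or below the LAC are skipped
                self.ack(entry, replacement)


def run_scenario():
    w = Writer(ensemble=["b1", "b2", "b3"], lac=99)  # assumed starting LAC
    w.ack(100, "b1")
    w.ack(100, "b2")          # AQ reached: e100 confirmed, LAC becomes 100
    w.ack(101, "b1")          # e101 still pending
    # b3 times out: ensemble change, pending adds above the LAC are replayed
    w.ensemble_change("b3", "b4", pending=[100, 101])
    w.ack(101, "b2")
    return w


w = run_scenario()
for entry in sorted(w.replicas):
    print(entry, sorted(w.replicas[entry]), "hole" if len(w.replicas[entry]) < WQ else "ok")
```

In this model e100 stays on two bookies while e101 reaches all three, which is the "hole" the thread discusses; whether and how the real client or auditor repairs it is the subject of the surrounding messages.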

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Enrico Olivelli
Jonathan, On Thu, 14 Jan 2021 at 20:57, Jonathan Ellis wrote: > On 2021/01/11 08:31:03, Jack Vanlightly wrote: > > Hi, > > > > I've recently modelled the BookKeeper protocol in TLA+ and can confirm > that > > once confirmed, that an entry is not replayed to another bookie. This >

Re: Unbounded memory usage for WQ > AQ ?

2021-01-15 Thread Flavio Junqueira
Right, good catch, Enrico. The issue (#1063) description says: > PendingAddOp:maybeRecycle()->recycle() keeps the buffer until writeComplete() > is called for each bookie write. We need to keep this buffer only until it is > successfully > transferred by netty. In the current code, the write is

Re: Unbounded memory usage for WQ > AQ ?

2021-01-14 Thread Jonathan Ellis
On 2021/01/11 08:31:03, Jack Vanlightly wrote: > Hi, > > I've recently modelled the BookKeeper protocol in TLA+ and can confirm that > once confirmed, that an entry is not replayed to another bookie. This > leaves a "hole" as the entry is now replicated only to 2 bookies, however, > the new data

Re: Unbounded memory usage for WQ > AQ ?

2021-01-14 Thread Enrico Olivelli
Flavio, On Thu, 14 Jan 2021 at 17:56, Flavio Junqueira wrote: > Using your example, the PendingAddOp should remain active until there are > 3 copies of the add entry. The client can ack back once it receives two > positive acks from bookies, but it shouldn't declare the add entry

Re: Unbounded memory usage for WQ > AQ ?

2021-01-14 Thread Flavio Junqueira
Using your example, the PendingAddOp should remain active until there are 3 copies of the add entry. The client can ack back once it receives two positive acks from bookies, but it shouldn't declare the add entry done at that point. There is the case that the third bookie is slow, but it could

Re: Unbounded memory usage for WQ > AQ ?

2021-01-13 Thread Enrico Olivelli
On Wed, 13 Jan 2021 at 17:05, Flavio Junqueira wrote: > We should work on some kind of back-pressure mechanism for the client, but > I am not sure about which kind of support we should provide at BK level > > > Is there an issue for this? If there isn't, then perhaps we can start

Re: Unbounded memory usage for WQ > AQ ?

2021-01-13 Thread Flavio Junqueira
> We should work on some kind of back-pressure mechanism for the client, but I > am not sure about which kind of support we should provide at BK level Is there an issue for this? If there isn't, then perhaps we can start that way. > And as soon as the application is notified of the result of the

Re: Unbounded memory usage for WQ > AQ ?

2021-01-13 Thread Enrico Olivelli
Flavio, On Tue, 12 Jan 2021 at 17:26, Flavio Junqueira wrote: > I have observed the issue that Matteo describes and I also attributed the > problem to the absence of a back pressure mechanism in the client. Issue > #2497 was not about that, though. There was some corruption going

Re: Unbounded memory usage for WQ > AQ ?

2021-01-12 Thread Flavio Junqueira
I have observed the issue that Matteo describes and I also attributed the problem to the absence of a back pressure mechanism in the client. Issue #2497 was not about that, though. There was some corruption going on that was leading to the server receiving garbage. -Flavio > On 8 Jan 2021, at

Re: Unbounded memory usage for WQ > AQ ?

2021-01-12 Thread Flavio Junqueira
Hi Jack, > I've recently modelled the BookKeeper protocol in TLA+ and can confirm that > once confirmed, that an entry is not replayed to another bookie. Should I assume that you modeled it after the code? Otherwise, what did you use as a reference? Is the TLA+ spec available anywhere? It sounds

Re: Unbounded memory usage for WQ > AQ ?

2021-01-12 Thread Enrico Olivelli
On Mon, 11 Jan 2021 at 18:14, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote: > > new data integrity check that Ivan worked on > The current auditor should take care of this if > "auditorLedgerVerificationPercentage" is set to 100%. > I don't think this is the most efficient w

Re: Unbounded memory usage for WQ > AQ ?

2021-01-11 Thread Venkateswara Rao Jujjuri
> new data integrity check that Ivan worked on The current auditor should take care of this if "auditorLedgerVerificationPercentage" is set to 100%. I don't think this is the most efficient way, but I believe it does take care of filling holes. On Mon, Jan 11, 2021 at 12:31 AM Jack Vanlightly wro

Re: Unbounded memory usage for WQ > AQ ?

2021-01-11 Thread Jack Vanlightly
Hi, I've recently modelled the BookKeeper protocol in TLA+ and can confirm that once confirmed, that an entry is not replayed to another bookie. This leaves a "hole" as the entry is now replicated only to 2 bookies, however, the new data integrity check that Ivan worked on, when run periodically w

Re: Unbounded memory usage for WQ > AQ ?

2021-01-08 Thread Venkateswara Rao Jujjuri
On Fri, Jan 8, 2021 at 2:29 PM Matteo Merli wrote: > On Fri, Jan 8, 2021 at 2:15 PM Venkateswara Rao Jujjuri > wrote: > > > > > otherwise the write will time out internally and it will get replayed > to a > > new bookie. > > If Qa is met and the writes of Qw-Qa fail after we send the success to >

Re: Unbounded memory usage for WQ > AQ ?

2021-01-08 Thread Matteo Merli
On Fri, Jan 8, 2021 at 2:15 PM Venkateswara Rao Jujjuri wrote: > > > otherwise the write will time out internally and it will get replayed to a > new bookie. > If Qa is met and the writes of Qw-Qa fail after we send the success to the > client, why would the write be replayed on a new bookie? I think

Re: Unbounded memory usage for WQ > AQ ?

2021-01-08 Thread Venkateswara Rao Jujjuri
> otherwise the write will time out internally and it will get replayed to a new bookie. If Qa is met and the writes of Qw-Qa fail after we send the success to the client, why would the write be replayed on a new bookie? On Fri, Jan 8, 2021 at 1:47 PM Matteo Merli wrote: > On Fri, Jan 8, 2021 at 8:2

Re: Unbounded memory usage for WQ > AQ ?

2021-01-08 Thread Matteo Merli
On Fri, Jan 8, 2021 at 8:27 AM Enrico Olivelli wrote: > > Hi Matteo, > in this comment you are talking about an issue you saw when WQ is greater > than AQ > https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246 > > IIUC you are saying that if one bookie is slow the client contin

Unbounded memory usage for WQ > AQ ?

2021-01-08 Thread Enrico Olivelli
Hi Matteo, in this comment you are talking about an issue you saw when WQ is greater than AQ https://github.com/apache/bookkeeper/issues/2497#issuecomment-734423246 IIUC you are saying that if one bookie is slow the client continues to accumulate references to the entries that still have not recei