Re: Help with bad errors on 4.6.1

2018-03-12 Thread Enrico Olivelli
On Mon, Mar 12, 2018, 20:40 Ivan Kelly wrote:
> > It is interesting that the problem is on 'readers' and it seems that the PCBC is corrupted and even writes (if the broker is promoted to 'leader') are able to go on after the reads broke the client.
> Are writes coming from the same

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
> It is interesting that the problem is on 'readers' and it seems that the PCBC is corrupted and even writes (if the broker is promoted to 'leader') are able to go on after the reads broke the client.
Are writes coming from the same clients? Or clients in the same process?
-Ivan

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Enrico Olivelli
On Mon, Mar 12, 2018, 19:37 Sijie Guo wrote:
> Thanks Enrico!
>
> On Mon, Mar 12, 2018 at 4:21 AM, Enrico Olivelli wrote:
> > Summary of my findings:
> >
> > The problem is about clients which get messed up and are not able to read and write to bookies after rolling restarts of an app

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Sijie Guo
Thanks Enrico!
On Mon, Mar 12, 2018 at 4:21 AM, Enrico Olivelli wrote:
> Summary of my findings:
>
> The problem is about clients which get messed up and are not able to read and write to bookies after rolling restarts of an application, the problem appears only on a cluster of 6 machines (r

Re: Release 4.7.0

2018-03-12 Thread Jia Zhai
👍
On Mon, Mar 12, 2018 at 3:19 PM, Enrico Olivelli wrote:
> On Mon, Mar 12, 2018, 07:50 Sijie Guo wrote:
> > Updates on release 4.7:
> >
> > Since Pulsar is switching from using the Yahoo release to the Apache release, I am delaying release 4.7.0 for about one month, so we have enough time t

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
> - when I "restart" bookies I issue a kill -9 (I think this could be the reason why I can't reproduce the issue in test cases)
With a clean shutdown of bookies we close the channels, and it should do the TCP shutdown handshake. -9 will kill the process before it gets to do any of that, but the ke
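To make the clean-shutdown versus kill -9 distinction above concrete, here is a minimal standalone sketch in Python (not BookKeeper code). It starts a throwaway TCP server in a child process, connects to it over loopback, kills the child with SIGKILL, and reports what the connected peer observes on its next read. The server script and the function name peer_view_after_sigkill are made up for this illustration, and behaviour on a real network, or with unread data in flight, may differ.

```python
import socket
import subprocess
import sys

# Hypothetical throwaway server: accepts one connection, then idles
# without ever closing the socket itself.
SERVER = r"""
import socket, time
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
print(srv.getsockname()[1], flush=True)   # tell the parent which port we got
conn, _ = srv.accept()
print("accepted", flush=True)
time.sleep(60)
"""


def peer_view_after_sigkill():
    """Return what a connected client sees after the server is SIGKILLed.

    "eof"   -> recv() returned b"" (orderly end of stream)
    "reset" -> recv() raised ConnectionResetError
    "data"  -> unexpected payload (the server never sends any)
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER], stdout=subprocess.PIPE, text=True
    )
    try:
        port = int(proc.stdout.readline())
        client = socket.create_connection(("127.0.0.1", port), timeout=5)
        try:
            proc.stdout.readline()  # wait until the server has accepted
            proc.kill()             # SIGKILL on POSIX: no application-level cleanup runs
            proc.wait()
            try:
                data = client.recv(1)
            except ConnectionResetError:
                return "reset"
            return "eof" if data == b"" else "data"
        finally:
            client.close()
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.stdout.close()


if __name__ == "__main__":
    print(peer_view_after_sigkill())
```

Running the script prints one of the three labels, which shows whether the peer was notified at all even though the killed process never ran its own close logic.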

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Enrico Olivelli
Summary of my findings:
The problem is that clients get into a bad state and can no longer read from or write to bookies after rolling restarts of an application. It appears only on a cluster of 6 machines (reduced to 3 in order to narrow down the search) belonging to my colleagues, who are performi

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Enrico Olivelli
I will send a report soon.
With the new debug output I have some findings; I am looking into problems during restarts of bookies.
Maybe there is some problem in the error handling in PCBC.
Thank you
Enrico
2018-03-12 10:58 GMT+01:00 Ivan Kelly:
> Enrico, could you summarize what the state of things is now? Wha

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
Enrico, could you summarize what the state of things is now? What are you running, what problems are you seeing, and how are the problems manifesting themselves?
Regards,
Ivan
On Mon, Mar 12, 2018 at 10:15 AM, Enrico Olivelli wrote:
> Applied Sijie's fixes and added some debug:
>
> Problem is tri

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Enrico Olivelli
Applied Sijie's fixes and added some debug:
The problem is triggered when you restart a bookie (I have a cluster of 3 bookies, WQ = 2 and AQ = 2).
Below is a new error on the client side ("tailing" reader).
Enrico
this is a new error on client side:
18-03-12-09-11-45 Unexpected exception caught by bookie

Re: Release 4.7.0

2018-03-12 Thread Enrico Olivelli
On Mon, Mar 12, 2018, 07:50 Sijie Guo wrote:
> Updates on release 4.7:
>
> Since Pulsar is switching from using the Yahoo release to the Apache release, I am delaying release 4.7.0 for about one month, so we have enough time to test Pulsar to make sure Yahoo changes are ported correctly back in 4.