Re: Improve the process of removing bookies from a cluster

2021-09-09 Thread Ivan Kelly
> Do you think it will be different enough from the > autorecovery process to put it on the bookie being drained or should it > still reside within the autorecovery process? I think there's enough commonality that a single solution can be applied to both. At root, both have to copy entries and upda

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Ivan Kelly
Hi Yang, > Besides the auditor, I think the external operator (whether a human > operator or an automation program) also cares about the "draining" state of > a bookie. This isn't a question of the internal model, but of how it is exposed. API-wise, it would not be a problem to expose draining as

Re: Improve the process of removing bookies from a cluster

2021-09-08 Thread Ivan Kelly
> I am not very familiar with bookkeeper and auditor history (so please let > me know if this understanding doesn't work), but it seems to me that the > process responsible for draining the bookie could be local to the bookie > itself to limit network hops. This is a very good point. One of the re

Re: Improve the process of removing bookies from a cluster

2021-09-07 Thread Ivan Kelly
Hi Yang, > Autoscaling is exactly one motivation for me to bring this topic up. I > understand that the auto-recovery is not perfect at the moment, but it's an > important component that maintains the core invariants of a bookkeeper > cluster, so I think we may keep improving it until we find a be

Re: Decommission command without stopping bookkeeper server

2021-09-07 Thread Ivan Kelly
This seems to be part of another thread "Improve the process of removing bookies from a cluster". Can you please respond there to keep the conversation in one place? -Ivan On Tue, Sep 7, 2021 at 12:57 PM zhangao wrote: > > Hello everyone, Now, the decommission command needs the bookie stopped, and onl

Re: Improve the process of removing bookies from a cluster

2021-09-06 Thread Ivan Kelly
Hi Yang, This is something we've been thinking about internally. It's especially important if we want to implement auto scaling for bookies. I'm not sure we need a "draining" state as such. Or at least, the draining state doesn't need to be at the same level as "read-only". "draining" is only int

Re: Skip writes to the journal part 2 - BP-44

2021-08-19 Thread Ivan Kelly
Hi Enrico, I meant to reply yesterday but forgot. There's still one change we need to make before pushing this code up. The change is fairly small, so it shouldn't take too long. Cheers, Ivan On Wed, Aug 18, 2021 at 1:13 PM Enrico Olivelli wrote: > > Ivan and Jack, > we have recently committed

Re: Docker images and security vunerabilities

2021-08-12 Thread Ivan Kelly
How did it end up on CentOS in the first place? +1 for moving to Ubuntu. -Ivan On Thu, Aug 12, 2021 at 1:03 PM Enrico Olivelli wrote: > > Hello folks, > I have found a PR [1] that is about upgrading the base image to Centos 8 > because the Centos 7 image has some reported vulnerabilities. > > I

Re: Contributing Splunk changes back to OSS

2020-11-11 Thread Ivan Kelly
> There might be slight overlap with BP-40 which I have in the works for > Audit logging for BKShell and Bkctl. Where is BP-40? I don't see it on the dev list. -Ivan

Re: Contributing Splunk changes back to OSS

2020-11-11 Thread Ivan Kelly
> Based on feedback on this, I'd also like to later start a similar > Gradle proposal for Pulsar builds too. Yes, but let's wait and see how it goes here first. I'll start putting together a BP today. -Ivan

Re: Contributing Splunk changes back to OSS

2020-11-10 Thread Ivan Kelly
> > What are people's opinions on moving BookKeeper to gradle (assuming > > I/splunk do the legwork)? > > If people are open to it, I'll submit a BP. > > > > +1. My only question is how do you do an Apache release. I'd like to see BP > covering that question. Yes, this will need a BP to cover all t

Contributing Splunk changes back to OSS

2020-11-10 Thread Ivan Kelly
Hi folks, It's been about a year since Streamlio joined Splunk and since then we've had a bit of forking with our BK branch. It has gotten to a stage where it's starting to be a problem for us, so we'd like to start to get things back in sync. There are a couple of big chunks of work to come back

Re: Bypassing writes to the Journal - everybody wants this feature !

2020-11-10 Thread Ivan Kelly
> The limit of the given patches is that it is simply skipping all of the > writes to the journal, and this in turn is a big problem: > - if you restart the bookie it is likely that you lose your data, and > especially the 'fenced' flag > - clients cannot rely on most of the guarantees that BK pro

Pushed branch to apache/bookkeeper by mistake

2020-03-04 Thread Ivan Kelly
Hi folks, I pushed a branch to the wrong repo by mistake. I meant to push to my fork. Deleted now. Please ignore, and sorry for the noise. -Ivan

Re: Log truncation and sync up when bookie fails and rejoins

2020-01-28 Thread Ivan Kelly
> Thanks for the detailed response. Just one question, if writer doesn't > fail, but bookie write fails (Say a soft failure because of network problem > or GC pause), the writer will create a new fragment within a ledger. So the > same sequence of operations that happen while closing the ledger nee

Re: Log truncation and sync up when bookie fails and rejoins

2020-01-28 Thread Ivan Kelly
> So.. log truncation, the way it's needed in leader based systems like RAFT > and Kafka, where leader may have entries appended to its log which are not > replicated. If leader crashes before replicating entries, which will elect > other node as leader. Once the previous leader rejoins the cluster

Re: Log truncation and sync up when bookie fails and rejoins

2020-01-28 Thread Ivan Kelly
> From the bookie perspective, if a bookie of a ledger ensemble crashes while a > ledger is being written to, then it is replaced and the history of the ledger > is updated in the ledger metadata according to the last add confirmed by the > crashed bookie. If the bookie crashes after the ledger

Re: Log truncation and sync up when bookie fails and rejoins

2020-01-28 Thread Ivan Kelly
> But who takes care of updating a particular Bookie in case it crashes (or > temporarily partitioned) and rejoins the cluster? Autorecovery takes care of this. The metadata describes the entries that should exist on a bookie. If this doesn't match what actually exists on the bookie, autorecovery
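The comparison described above (the metadata says which entries a bookie should hold; autorecovery acts when the bookie's actual contents differ) can be pictured as a set difference. This is a minimal illustration, not the autorecovery code: it assumes round-robin striping, where entry `e` of a fragment with ensemble size `n` and write quorum `qw` lives on ensemble positions `(e + i) mod n` for `0 <= i < qw`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which entries of a fragment should live on a given bookie. */
public class ExpectedEntries {

    /** True if entry `entryId` is striped onto ensemble position `bookieIndex` (round-robin assumption). */
    static boolean storedOn(long entryId, int bookieIndex, int ensembleSize, int writeQuorum) {
        for (int i = 0; i < writeQuorum; i++) {
            if ((entryId + i) % ensembleSize == bookieIndex) {
                return true;
            }
        }
        return false;
    }

    /** Entries in [firstEntry, lastEntry] that the metadata says the bookie must hold. */
    static List<Long> expected(long firstEntry, long lastEntry, int bookieIndex,
                               int ensembleSize, int writeQuorum) {
        List<Long> result = new ArrayList<>();
        for (long e = firstEntry; e <= lastEntry; e++) {
            if (storedOn(e, bookieIndex, ensembleSize, writeQuorum)) {
                result.add(e);
            }
        }
        return result;
    }

    /** Entries the bookie should have but does not: candidates for re-replication. */
    static Set<Long> missing(List<Long> expected, Set<Long> actuallyPresent) {
        Set<Long> gaps = new TreeSet<>(expected);
        gaps.removeAll(actuallyPresent);
        return gaps;
    }

    public static void main(String[] args) {
        // Ensemble of 3, write quorum 2: position 0 should hold entries 0, 2, 3 and 5 of 0..5.
        List<Long> exp = expected(0, 5, 0, 3, 2);
        System.out.println(exp);                          // [0, 2, 3, 5]
        System.out.println(missing(exp, Set.of(0L, 3L))); // [2, 5]
    }
}
```

Only the diff needs copying, which is why a bookie that rejoins after a short partition does not have to be rebuilt from scratch.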

Re: Still Failing: apache/bookkeeper#4573 (master - f89e3fb)

2019-06-03 Thread Ivan Kelly
Actually, I can't merge master into it because it's sijie's branch. Sijie will have to do so. On Mon, Jun 3, 2019 at 1:51 PM Ivan Kelly wrote: > > Master has been failing for a long time. A failing PR was merged in > https://github.com/apache/bookkeeper/pull/2066 >

Re: Still Failing: apache/bookkeeper#4573 (master - f89e3fb)

2019-06-03 Thread Ivan Kelly
> pass in any long offset and the file handler should return EOF immediately > when trying to read it. However it doesn't seem to be working as expected. > > ### Changes > > Updated `Journal#setLastLogMark()` method to accept an `scanOffset` instead > of constant `LONG.MAX_VALUE

Re: Changing ledger metadata to binary format

2019-06-03 Thread Ivan Kelly
On Thu, May 30, 2019 at 12:15 AM Venkateswara Rao Jujjuri wrote: > > > "Let's decide this when we need it" > > Right. This is the time as we are trying to add fault domain info to the > cookie. When you say cookie, you mean under /ledgers/cookies? That falls outside the scope of binary metadata.

Re: Changing ledger metadata to binary format

2019-05-29 Thread Ivan Kelly
> What is our plan to move forward with binary format? I have no plans regarding it. Moving forward with it will happen when someone comes up with a metadata change which will break text metadata users (i.e. almost any metadata change). > Anyone using binary format in production? even for new cluster

Re: [VOTE] Release 4.9.1, release candidate #0

2019-04-04 Thread Ivan Kelly
Sorry about the delay on getting to this, Enrico. +1 (binding) ✓ tag matches sha ✓ check sha512 ✓ check gpg ✓ test licenses ✓ rat check ✓ compile -src ✓ spotbugs ✓ build tag ✓ compiled tag matches -src ✓ maven artifacts sigs are correct ✓ unit tests ✓ integration tests A few minor no

[ANNOUNCE] Apache BookKeeper 4.8.2 released

2019-04-02 Thread Ivan Kelly
The Apache BookKeeper team is proud to announce Apache BookKeeper version 4.8.2. Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads. It has been used as a fundamental service to build reliable services. It is also the log segment sto

Re: Long poll LAC returning immediately on fenced

2019-03-29 Thread Ivan Kelly
> Does the LAC change on that bookie if ledger is fenced? It can do. The recovering client may write entries to that bookie which hadn't originally arrived, but had arrived at another bookie. -Ivan

Long poll LAC returning immediately on fenced

2019-03-29 Thread Ivan Kelly
Hi folks, I'm seeing a problem where a bookie is getting hammered by long poll requests, ~8000 rps. This seems to be happening because the long poll logic doesn't wait if the ledger is in fenced state [1], but returns immediately. So the client ends up in a tight loop if the ledger has entered fenced

Re: Tags, version numbers and docker

2019-03-28 Thread Ivan Kelly
. This rule would not be used after that point. -Ivan [1] https://github.com/ivankelly/bookkeeper/commit/e247ef705f055706604ba2f862c1006a8cf817e9 On Wed, Mar 27, 2019 at 10:52 AM Ivan Kelly wrote: > > > That’s a known issue. The auto build is controlled by ASF. We have > > discussed

Re: Tags, version numbers and docker

2019-03-27 Thread Ivan Kelly
> That’s a known issue. The auto build is controlled by ASF. We have > discussed that before and came up the conclusion of current approach. There > is a BP to move dockerfile to a different repo. It just need someone to > complete the BP. This was a known issue a year ago. Nothing has moved on it

Tags, version numbers and docker

2019-03-26 Thread Ivan Kelly
Hi folks, Looking at doing the final tasks for the 4.8.2 release and stuck on the docker bit. It's not that I don't see what has been done before, but more that what is there is so so wrong. Take 4.8.1 release for example. The tarball for that release was cut from b4a2b1, yet the tag for that rel

Re: [VOTE] Release 4.8.2, release candidate #0

2019-03-14 Thread Ivan Kelly
With 3 +1 (binding), [Sijie, Enrico and me], and no -1 the vote passes. I'll follow up the rest of the process later today. Thanks folks, Ivan On Thu, Mar 14, 2019 at 11:59 AM Ivan Kelly wrote: > > +1 (binding from me too) > > - Licenses are good > - rat, unit tests pa

Re: [VOTE] Release 4.8.2, release candidate #0

2019-03-14 Thread Ivan Kelly
wrote: > > > > +1 (binding) > > > > - verified source & binary package > > - asc & sha512 are good > > - artifacts are good > > - tag is good > > > > On Sat, Mar 9, 2019 at 2:28 AM Ivan Kelly wrote: > > > > > Hi everyone,

Re: Cutting 4.9.1

2019-03-13 Thread Ivan Kelly
+1 from me. On Wed, Mar 13, 2019 at 10:16 AM Enrico Olivelli wrote: > > Hi guys, > I need to cut 4.9.1 because we are seeing very often this issue > https://github.com/apache/bookkeeper/commit/25c7506c0513351c533db643cb10c953d1e6d0b7 > > Please tag any issue you want to merge into 4.9 branch. > >

[VOTE] Release 4.8.2, release candidate #0

2019-03-08 Thread Ivan Kelly
Hi everyone, Please review and vote on the release candidate #0 for the version 4.8.2, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * Release notes [1] * The off

Re: [VOTE] Release 4.9.0, release candidate #1

2019-01-30 Thread Ivan Kelly
+1 (binding) (Ubuntu 16.06) - Sigs: GOOD - Licenses: GOOD - Binary packages boot?: Tested starting bookie with -server & -all packages. running bkctl simpletest against them. GOOD - checkstyle: GOOD - spotbugs: GOOD - rat: GOOD - test: GOOD - integration test: GOOD - Tested against pulsar master (

Re: EnsemblePlacementPolicy exposes third party API "Pair" from commons-lang3 in a public API

2019-01-23 Thread Ivan Kelly
> > ``` > > class PlacementResult<T> { > > T result(); > > boolean strictlyConformsToPolicy(); > > } > > That was my first proposal and I like it. It is clearer and auto-documenting. > > Given that we are changing EnsemblePlacementPolicy at every major > version, we can defer this refactor to

Re: EnsemblePlacementPolicy exposes third party API "Pair" from commons-lang3 in a public API

2019-01-23 Thread Ivan Kelly
There's no harm in having our own tuple implementation in common, but in this instance we should encode more meaning into the returned value. As it is, it doesn't even have javadoc. But in both cases, it looks like the boolean is whether the placement strictly conforms to the placement policy, so
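A minimal Java sketch of the self-describing return type discussed in this thread, with named accessors in place of `Pair<T, Boolean>` from commons-lang3. The class and accessor names follow the proposal quoted above; the `of` factory, the null check and the sample bookie addresses are illustrative assumptions, not the shipped BookKeeper API.

```java
import java.util.List;
import java.util.Objects;

/** Sketch: a placement result that documents what its boolean means. */
public final class PlacementResult<T> {
    private final T result;
    private final boolean strictlyConformsToPolicy;

    private PlacementResult(T result, boolean strictlyConformsToPolicy) {
        this.result = Objects.requireNonNull(result, "result");
        this.strictlyConformsToPolicy = strictlyConformsToPolicy;
    }

    /** Illustrative factory; the real API may construct results differently. */
    public static <T> PlacementResult<T> of(T result, boolean strictlyConformsToPolicy) {
        return new PlacementResult<>(result, strictlyConformsToPolicy);
    }

    /** The chosen placement, e.g. a new ensemble or a replacement bookie. */
    public T result() {
        return result;
    }

    /** Whether the placement strictly conforms to the placement policy. */
    public boolean strictlyConformsToPolicy() {
        return strictlyConformsToPolicy;
    }

    public static void main(String[] args) {
        PlacementResult<List<String>> ensemble =
                PlacementResult.of(List.of("bookie1:3181", "bookie2:3181"), false);
        // The call site now reads as intent rather than pair.getRight().
        if (!ensemble.strictlyConformsToPolicy()) {
            System.out.println("placement does not strictly conform: " + ensemble.result());
        }
    }
}
```

Compared with a generic tuple in common, the name of the accessor carries the meaning, so callers do not need javadoc to know what the boolean signals.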

Re: Clusterwide vs Client configuration for metadata format version

2018-12-19 Thread Ivan Kelly
> If it is client level configuration, in theory it is possible to have latest > client create v3 ledger while bookies are still running in the older version > right? Yes, autorecovery would likely just break in this case. > If we go with cluster level, I think using it part of LAYOUT_ZNODE is

Re: Clusterwide vs Client configuration for metadata format version

2018-12-18 Thread Ivan Kelly
JV, Sam, Charan, Andrey, could one of you chime in on this? It's holding up the 4.9 release. -Ivan On Thu, Dec 13, 2018 at 5:38 PM Ivan Kelly wrote: > > I'd be interested to see the opinion of the Salesforce folks on this. > On Thu, Dec 13, 2018 at 5:35 PM Ivan Kelly wrote: >

Re: Clusterwide vs Client configuration for metadata format version

2018-12-13 Thread Ivan Kelly
I'd be interested to see the opinion of the Salesforce folks on this. On Thu, Dec 13, 2018 at 5:35 PM Ivan Kelly wrote: > > > I am not sure about this. If clients don't react the changes of ledger > > layout, > > the information in ledger layout is just

Re: Clusterwide vs Client configuration for metadata format version

2018-12-13 Thread Ivan Kelly
> I am not sure about this. If clients don't react the changes of ledger > layout, > the information in ledger layout is just informative, you still need to > coordinate > both readers and writers. so IMO the version in ledger layout is not really > useful. The clients react the next time they ini

Re: Clusterwide vs Client configuration for metadata format version

2018-12-13 Thread Ivan Kelly
> I don't fully understand how the cluster-wide version work here, specially > how do clients react when people use the tool to bump the version in ledger > layout. Clients don't have to react immediately. The cluster-wide setting is the max _allowable_ format version. When it gets bumped, for exa

Clusterwide vs Client configuration for metadata format version

2018-12-12 Thread Ivan Kelly
Hi folks, A discussion has arisen on [1] about the ledger layout changes I've made recently [2]. The change [2] adds a field maxLedgerMetadataFormat to the cluster-wide managed ledger layout. When a new ledger is created, this is the maximum format version which will be used to write it. Cur
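Going by the description in this thread, the cluster-wide `maxLedgerMetadataFormat` is the maximum allowable version, so a client creating a ledger would write with the newest format it supports, capped by that value. A minimal sketch of that rule; the class, constant and method names are hypothetical, and the version numbers are placeholders rather than BookKeeper's actual format constants.

```java
/** Hypothetical sketch: pick the metadata format version for a newly created ledger. */
public final class MetadataFormatChooser {

    /** Newest format this (hypothetical) client build knows how to write. */
    static final int CLIENT_MAX_FORMAT = 3;

    /**
     * The cluster-wide value is the max allowable version, so the client never writes
     * anything newer than it, even if the client itself supports newer formats.
     */
    static int formatForNewLedger(int clusterMaxAllowedFormat) {
        return Math.min(CLIENT_MAX_FORMAT, clusterMaxAllowedFormat);
    }

    public static void main(String[] args) {
        // Cluster not yet bumped: keep writing the older format that older readers can parse.
        System.out.println(formatForNewLedger(2)); // 2
        // After an operator bumps the layout, new ledgers use the newer format.
        System.out.println(formatForNewLedger(3)); // 3
    }
}
```

Because the value is only consulted when a ledger is created, clients do not need to react to a bump immediately; ledgers that already exist keep the format they were written in.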

Re: [VOTE] Release 4.7.3, release candidate #0

2018-12-04 Thread Ivan Kelly
+1 (binding) * LICENSE & NOTICE look good. * Rat good * spotbugs good * checkstyle had some issues, finding configs, but I'm not worried about it * sha512 and gpg good * tests ran cleanly * ran pulsar master integration tests against it. all passed Good work Sijie! -Ivan On Mon, Dec 3, 2018 at 8:

Re: Bug with blockAddCompletions decrement

2018-12-04 Thread Ivan Kelly
> > Not an issue with master as blockAddCompletions has been replaced with > > a simple boolean and failure handling changed to only deal with one > > failure at a time. However, looking at it again, I think I did spot a > > similar bug. will dig in after I send this. > > Great; looking forward. I

Re: Bug with blockAddCompletions decrement

2018-12-03 Thread Ivan Kelly
> This may not be an issue with the master as we moved to immutable metadata, Not an issue with master as blockAddCompletions has been replaced with a simple boolean and failure handling changed to only deal with one failure at a time. However, looking at it again, I think I did spot a similar bug

Re: Release 4.7.3

2018-11-29 Thread Ivan Kelly
What's the driving bugfix for cutting 4.7.3? On Wed, Nov 21, 2018 at 11:54 PM Sijie Guo wrote: > > Hi all, > > I would like to cut a 4.7.3 release late this week. If you have any fixes > to include in 4.7.3, please label those issues as 4.7.3, then I will make > sure they are included. > > - Sijie

Re: [VOTE] Release 4.8.1, release candidate #1

2018-11-21 Thread Ivan Kelly
+1(binding) Ubuntu 16.04.5 LTS - SHA512 & GPG good - rat, spotbugs, tests good - integration tests run cleanly - licenses good Good work Enrico! -Ivan On Wed, Nov 21, 2018 at 8:15 AM Jia Zhai wrote: > > +1(binding). > > MacOS 10.14.1 > > - SHA512 & GPG signatures good > - local build, mvn test

Re: Dropping 'stream' profile

2018-11-20 Thread Ivan Kelly
> Yes. That is a problem but it is not related stream profile. Let's separate > unrelated issues into a different thread. > Also I would suggest creating an issue in github when a problem is > considered a bug, so the discussion can be more organized. I'll create a bug. > > I would suggest we als

Re: Dropping 'stream' profile

2018-11-20 Thread Ivan Kelly
> There is no decision made. However I am -1 to drop stream profile, as I > have explained in many different threads that I have been mentioned. Where were these threads? I did a search in github and the list, but couldn't see anything. I didn't even try Slack; search there is awful. > Here is the

Re: Dropping 'stream' profile

2018-11-20 Thread Ivan Kelly
It is possible that the precommit stuff will need a follow up patch, to > > add a new precommit "subtask" > > > > Please check it out > > https://github.com/apache/bookkeeper/pull/1680 > > > > Enrico > > > > On Mon, Aug 13, 2018 at

Re: Missing Apache BookKeeper 4.8.0 artifacts from Maven Central

2018-11-12 Thread Ivan Kelly
Oops, had searched for 4.8.0 and it looked like it was fresh in my mailbox. My bad. On Fri, Nov 9, 2018 at 5:46 PM, Sijie Guo wrote: > That was an email from Oct 4 > > On Fri, Nov 9, 2018 at 5:01 AM Ivan Kelly wrote: >> >> Wow. Surprised we're only hearing about it no

Re: [VOTE] Release 4.8.1, release candidate #0

2018-11-09 Thread Ivan Kelly
-1 (binding) So in general the release looks fine, but there's still the error I flagged with 4.8.0. 2018-11-09 16:18:43,271 - ERROR - [main:DockerUtils@188] - DOCKER.exec(bookkeeper1_6e7e4ddf-717d-42bb-8701-4717d5027c92:/opt/bookkeeper/4.8.1/bin/bkctl ledger simpletest --ensemble-size 3 --write-

Re: Missing Apache BookKeeper 4.8.0 artifacts from Maven Central

2018-11-09 Thread Ivan Kelly
Wow. Surprised we're only hearing about it now -Ivan On Thu, Oct 4, 2018 at 1:14 PM, Enrico Olivelli wrote: > Now it is okay > https://search.maven.org/search?q=g:org.apache.bookkeeper > > Cheers > Enrico > On Thu, Oct 4, 2018 at 09:55 Enrico Olivelli > wrote: >> >> Hi, >> During

Re: [VOTE] Release 4.8.0, release candidate #1

2018-09-24 Thread Ivan Kelly
Thanks for putting this together Enrico. I had left the test running on friday and forgot to get back to it. +1 from me (binding). - Licenses good - SHA512 & GPG signatures good - Rat and spotbugs good - mvn test runs cleanly without -Dstream & -DstreamTests - integration tests run cleanly with -

Re: [VOTE] Release 4.8.0, release candidate #0

2018-09-18 Thread Ivan Kelly
It's as easy to do the fix as file an issue. Incoming. -Ivan On Tue, Sep 18, 2018 at 12:54 PM, Enrico Olivelli wrote: > On Tue, Sep 18, 2018 at 11:41 Ivan Kelly > wrote: > >> Hey Enrico, >> >> Thanks for putting this together. Afraid it's

Re: [VOTE] Release 4.8.0, release candidate #0

2018-09-18 Thread Ivan Kelly
Hey Enrico, Thanks for putting this together. Afraid it's -1 from me though. The new binaries pull in grpc, which has a notice file which we are not bubbling up to our notice file. https://github.com/grpc/grpc-java/blob/v1.12.0/NOTICE.txt There are also some minor issues with the links in the LICE

Re: [VOTE] Apache BookKeeper Release 4.7.2, release candidate #0

2018-08-29 Thread Ivan Kelly
+1 (binding) * Sigs good, checksums good * Rat is good * Licenses pass the automated check * Tests pass * Integration smoke test passes -Ivan On Tue, Aug 28, 2018 at 8:57 PM, Matteo Merli wrote: > +1 (binding) > > * Checked signatures, LICENSES > * Checked Maven repository > * Started localb

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

2018-08-23 Thread Ivan Kelly
> In the case of OrderedExecutor, it needs a BlockingQueue and the current > default is to use JDK LinkedBlockingQueue which relies on CAS for > enqueue/dequeue. Additional room for improvement here is to use a more > specialized MP-SC queue with different wait strategies. +1 to this, though somet

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

2018-08-23 Thread Ivan Kelly
> I don't think it is accidently. OrderedExecutor has performance advantages > than OrderedScheduler. > > A bit background on this: > > - OrderedScheduler was introduced by me. And I changed existing > OrderedSafeExecutor to be extending from OrderedScheduler. > Trying to standarize to one `Order

Re: OrderedScheduler & OrderedExecutor in bookkeeper client

2018-08-23 Thread Ivan Kelly
>> We currently create an OrderedExecutor and an OrderedScheduler in the >> client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very >> seldom used (basically for polling LAC, speculative reads and explicit >> flush). > > Why do they exist? Are they only legacy from past or is there an

OrderedScheduler & OrderedExecutor in bookkeeper client

2018-08-23 Thread Ivan Kelly
Hi folks, We currently create an OrderedExecutor and an OrderedScheduler in the client. An OrderedScheduler is an OrderedExecutor. Moreover, it's very seldom used (basically for polling LAC, speculative reads and explicit flush). I propose that we fold these into one, i.e. construct an OrderedSche
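Folding the two into one works because an ordered scheduler can play both roles: tasks submitted with the same key always land on the same single thread, so they run in submission order, and that same thread can also run delayed tasks. A minimal JDK-only sketch; `SimpleOrderedScheduler` and its methods are hypothetical and are not the BookKeeper `OrderedScheduler` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: one pool that provides both ordered execution and scheduling. */
public final class SimpleOrderedScheduler {
    private final ScheduledExecutorService[] threads;

    SimpleOrderedScheduler(int numThreads) {
        threads = new ScheduledExecutorService[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = Executors.newSingleThreadScheduledExecutor();
        }
    }

    /** Same key -> same single-threaded executor -> tasks run in submission order. */
    private ScheduledExecutorService chooseThread(Object orderingKey) {
        return threads[Math.floorMod(orderingKey.hashCode(), threads.length)];
    }

    void executeOrdered(Object orderingKey, Runnable task) {
        chooseThread(orderingKey).execute(task);
    }

    /** Delayed work (e.g. LAC polling, speculative reads) shares the same threads. */
    ScheduledFuture<?> scheduleOrdered(Object orderingKey, Runnable task, long delay, TimeUnit unit) {
        return chooseThread(orderingKey).schedule(task, delay, unit);
    }

    void shutdown() {
        for (ScheduledExecutorService t : threads) {
            t.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleOrderedScheduler scheduler = new SimpleOrderedScheduler(4);
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(100);
        long ledgerId = 42L; // the ordering key, e.g. a ledger id
        for (int i = 0; i < 100; i++) {
            final int n = i;
            scheduler.executeOrdered(ledgerId, () -> { seen.add(n); done.countDown(); });
        }
        done.await();
        scheduler.shutdown();
        boolean inOrder = true;
        for (int i = 0; i < seen.size(); i++) {
            inOrder &= seen.get(i) == i;
        }
        System.out.println("ran " + seen.size() + " tasks, in order: " + inOrder);
    }
}
```

Since every executor in the pool is already a scheduled executor, the client needs only one set of threads instead of two.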

Re: [DISCUSS] BookKeeper 4.7.2 release

2018-08-16 Thread Ivan Kelly
> In a more concrete example, currently Pulsar 2.1.0 is using BK 4.7.1. There > have been multiple issues reported from Pulsar users using 4.7.1. > In order to address the problems facing by the users using BK 4.7.1, we > have to create this 4.7.2 release for fixing the bugs in 4.7.1. If it's need

Re: [DISCUSS] BookKeeper 4.7.2 release

2018-08-16 Thread Ivan Kelly
What changes are in 4.8.0 that would prevent users from moving directly to that? Is anyone requesting 4.7.2? -Ivan On Thu, Aug 16, 2018 at 8:59 AM, Sijie Guo wrote: > Hi all, > > There are bunch of fixes cherry-picked into branch-4.7 and some are marked > for cherry-picking to branch-4.7. I think

Re: Usefulness of ensemble change during recovery

2018-08-13 Thread Ivan Kelly
nce, it gives the illusion of data >> loss. Moreover, we have no way to determine the real data loss vs >> this scenario where we have never acknowledged the client. >> >> >> On Mon, Aug 6, 2018 at 12:32 AM, Sijie Guo wrote: >> >> > On Mon, Aug 6, 2018 at 1

Re: Dropping 'stream' profile

2018-08-13 Thread Ivan Kelly
+1 for dropping the profiles. On Mon, Aug 13, 2018 at 12:24 AM, Sijie Guo wrote: > I have no problem with this proposal. I am fine with dropping the profiles. > > Sijie > > On Sun, Aug 12, 2018 at 2:53 AM Enrico Olivelli wrote: > >> Hi, >> Currently in order to build the full code you have to ad

Status of immutable metadata

2018-08-09 Thread Ivan Kelly
Hi folks, As some of you are aware I've been working on making the client metadata immutable. What this means is that the client will only act on metadata that reflects what is in zookeeper. I've pretty much got all the code done for it. However, the last 2 days I've been sidetracked by a bug I s

Re: [Draft] ASF Board Report for BookKeeper (August)

2018-08-08 Thread Ivan Kelly
lgtm +1 On Wed, Aug 8, 2018 at 10:56 AM, Enrico Olivelli wrote: > Looks good, thank you Sijie for taking care of this. > > Enrico > > On Wed, Aug 8, 2018 at 10:48 Sijie Guo > wrote: > >> Hi all, >> >> We have a board report due today. Here is a draft. Please take a look and >> le

Re: Usefulness of ensemble change during recovery

2018-08-06 Thread Ivan Kelly
>> Recovery operates on a few seconds of data (from the last LAC written >> to the end of the ledger, call this LLAC). > > the data during this duration can be very large if the traffic of the > ledger is large. That has > been observed at Twitter's production. so when we are talking about "a few >

Usefulness of ensemble change during recovery

2018-08-04 Thread Ivan Kelly
Hi folks, Recently I've been working to make the ledger metadata on the client immutable, with the goal of making client metadata management more understandable. The basic idea is that the metadata the client uses should reflect what is in zookeeper. So if a client wants to modify the metadata, if

Re: Changing ledger metadata to binary format

2018-07-30 Thread Ivan Kelly
>> Thank you for putting this together. It is also good to put this as a BP, >> since it is about the metadata layout. I'll put a BP up this week once I have initial feedback. >> > - When writing a metadata, check what is in /ledgers/LAYOUT. If it is >> > as above, write using the current text pr

Changing ledger metadata to binary format

2018-07-27 Thread Ivan Kelly
Hi folks, I think this was discussed yesterday in the meeting, and a bit on slack, but I haven't seen anything much written down, so I'm starting a thread here. The crux of the problem is that the protobuf text format currently used for metadata cannot have new fields added without breaking clien

Re: Official Docker Images

2018-05-29 Thread Ivan Kelly
+1 from me. Changing the tag was really bad practice. Good to see this changing. On Tue, May 29, 2018 at 6:59 AM, Enrico Olivelli wrote: > On Tue, May 29, 2018, 02:50 Sijie Guo wrote: > >> Since I am going to cut a new release 4.7.1, I would like to change the >> release procedure for docker

Re: [ANNOUNCE] Apache BookKeeper 4.6.2 released

2018-04-10 Thread Ivan Kelly
Great work getting this out Enrico! Cheers, Ivan On Tue, Apr 10, 2018 at 9:28 AM, Enrico Olivelli wrote: > The Apache BookKeeper team is proud to announce Apache BookKeeper version > 4.6.2. > > Apache BookKeeper is a scalable, fault-tolerant, and low-latency storage > service optimized for > rea

Re: [VOTE] Apache BookKeeper Release 4.6.2, release candidate #2

2018-04-03 Thread Ivan Kelly
+1 lgtm RAT, FINDBUGS & TESTS: Runs cleanly. I had -Dsurefire.rerunFailingTestsCount=2 set, but I don't think anything even flaked. Minor issue: Vertx http tests expect 8080 to be free SHA1 & SIGs: Good. LICENSE & NOTICE: Nothing changed since 4.6.1, so looks good. Minor: Copyright in notic

Re: Old DistributedLog 0.5.0 RC1 on dist.apache.org

2018-04-03 Thread Ivan Kelly
Kill it. If 0.5.0 is out, then it serves no purpose. -Ivan On Tue, Apr 3, 2018 at 1:36 PM, Enrico Olivelli wrote: > Hi, > there is this old directory > https://dist.apache.org/repos/dist/dev/bookkeeper/distributedlog/0.5.0-rc1/ > > can I drop it ? > I think Jia left it during the release proces

Re: [VOTE] Apache BookKeeper Release 4.6.2, release candidate #1

2018-04-03 Thread Ivan Kelly
On Tue, Apr 3, 2018 at 12:48 PM, Enrico Olivelli wrote: > We found an issue and this is the fix > https://github.com/apache/bookkeeper/pull/1312 > > Will send a new RC as soon as the patch is merged to branch-4.6 Taking a look. -Ivan

Re: [VOTE] Apache BookKeeper Release 4.6.2, release candidate #1

2018-04-03 Thread Ivan Kelly
> I cannot check on the 4.6.0 one because we are only keeping the latest > version on dist.apache.org archive.apache.org has older releases. -Ivan

Re: [VOTE] Apache BookKeeper Release 4.6.2, release candidate #1

2018-04-03 Thread Ivan Kelly
Hey Enrico, Thanks for putting the release together. I'm afraid there's an issue with the source package though. ~/blah/4.6.2-rc1 $ ls -l bookkeeper-4.6.2/bookkeeper-server/bin total 20 -rw-r--r-- 1 ivan ivan 7364 Mar 28 09:24 bookkeeper -rw-r--r-- 1 ivan ivan 2869 Mar 28 09:24 bookkeeper-cluster

Re: BK metrics

2018-03-20 Thread Ivan Kelly
> @Ivan, for some reasons I did not receive your reply but found it in the > email archives. Are you subscribed to the list? I did see one mail from you show up in moderation. > At 80K request/sec throttling for record size of 1K, I am getting below > throughput. The 99th percentile of `bookkee

Re: BK metrics

2018-03-20 Thread Ivan Kelly
> 2) If it's in milliseconds, are these numbers in expected range (see > attached image). To me 2.5 seconds (2.5K ms) latency for add entry request > is very high. 2.5 seconds is very high, but your write rate is also high. 100,000 * 1KB is 100MB/s. SSD should be able to take it from the journal s

Re: [DISCUSS] Inconsistency in Handle based APIs - Specifically "close"

2018-03-20 Thread Ivan Kelly
On Mon, Mar 19, 2018 at 6:37 PM, Sijie Guo wrote: > It is not a blocker for me. > > But if we want consistency, either applying pattern "asyncXYZ()" or > "xyzAsync()" for async operations works for me. xyzAsync is better than asyncXyz, as it will put the async and sync versions together in the jav
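With the `xyzAsync` pattern, the asynchronous and synchronous variants sort next to each other in the javadoc. A minimal sketch of a handle that pairs `closeAsync()` with a blocking `close()`, which still satisfies `AutoCloseable` so try-with-resources works in tests. The `Handle` interface shown here and the `InMemoryHandle` stub are illustrative assumptions, not the shipped BookKeeper client API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical handle following the xyzAsync() naming pattern. */
interface Handle extends AutoCloseable {
    long getId();

    /** Asynchronous variant: what production code would normally call. */
    CompletableFuture<Void> closeAsync();

    /** Blocking variant, listed next to closeAsync(); mostly useful in tests. */
    @Override
    default void close() throws InterruptedException, ExecutionException {
        closeAsync().get();
    }
}

/** Trivial stub so the sketch runs without a cluster. */
final class InMemoryHandle implements Handle {
    private final long id;
    volatile boolean closed;

    InMemoryHandle(long id) {
        this.id = id;
    }

    @Override
    public long getId() {
        return id;
    }

    @Override
    public CompletableFuture<Void> closeAsync() {
        closed = true;
        return CompletableFuture.completedFuture(null);
    }
}

public class HandleNamingDemo {
    public static void main(String[] args) throws Exception {
        // Test-style code: try-with-resources calls the blocking close().
        InMemoryHandle h1 = new InMemoryHandle(1);
        try (Handle h = h1) {
            System.out.println("using ledger " + h.getId());
        }
        // Production-style code: chain on the future instead of blocking.
        InMemoryHandle h2 = new InMemoryHandle(2);
        h2.closeAsync().thenRun(() -> System.out.println("closed ledger " + h2.getId()));
    }
}
```

Implementing only `closeAsync()` and deriving `close()` from it keeps the two variants from drifting apart.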

Re: [DISCUSS] Inconsistency in Handle based APIs - Specifically "close"

2018-03-19 Thread Ivan Kelly
> Is implementing Closable a "valueable" feature for us in the new API ? (I > think the answer is 'yes') I'm not so sure how useful Closeable is here. It is handy in tests, but in production code you are never going to use the try-with-resources pattern, as you'll be using async calls for everyth

[DISCUSS] Inconsistency in Handle based APIs - Specifically "close"

2018-03-19 Thread Ivan Kelly
Hi folks, I'm currently changing some parts of pulsar to use the new APIs and the inconsistency in the close api has raised its head again, so I'm restarting this discussion. Handle has the following methods: async: asyncClose sync: close, getId, getLedgerMetadata ReadHandle has the following me

Reporting CI failures to github issues

2018-03-19 Thread Ivan Kelly
Hi folks, When you report a CI failure to github, can you click on "Keep this build forever" in jenkins itself, so that the results will still be around when we eventually look at it. Unfortunately, you need to log in for this. Cheers, Ivan

Re: Drop md5 from release procedure

2018-03-19 Thread Ivan Kelly
> Can you forward the discussion thread if there is one? It was a mail to the private list, subject "checksum file Release Distribution Policy" -Ivan

Re: Help with bad errors on 4.6.1

2018-03-16 Thread Ivan Kelly
> With "paranoid" log in Netty I found this that is very interesting, but it > happens even on Java 8. I don't think leaks are the problem here though. This seems to be more like a double-free issue. -Ivan

Re: Help with bad errors on 4.6.1

2018-03-15 Thread Ivan Kelly
> What is the difference in Channel#write/ByteBuf pooling.in Java 9 ? Sounds like it could be an issue in netty itself. Java 9 removed a bunch of stuff around Unsafe, which I'm pretty sure netty was using for ByteBuf. Have you tried setting the pool debugging to paranoid? -Dio.netty.leakDetect

Re: Help with bad errors on 4.6.1

2018-03-14 Thread Ivan Kelly
>> > @Ivan >> > I wonder if some tests on Jepsen with bookie restarts may find this kind >> of >> > issues, given that it is not a network/SO problem >> If Jepsen can catch it, then a normal integration test can. I attempted a repro for this using the integration test stuff. Running for 2-3 hours in a l

Re: Jenkins broken this morning

2018-03-14 Thread Ivan Kelly
Broken again, infra are working on it. On Wed, Mar 14, 2018 at 1:10 PM, Ivan Kelly wrote: > Hi folks, > > Jenkins went nuts this morning, so infra rebooted it. It means that > some jobs that may have been pending never happened, so rekick the > testing on your patches if need

Jenkins broken this morning

2018-03-14 Thread Ivan Kelly
Hi folks, Jenkins went nuts this morning, so infra rebooted it. It means that some jobs that may have been pending never happened, so rekick the testing on your patches if needed (retest this please). Cheers Ivan

Re: Help with bad errors on 4.6.1

2018-03-13 Thread Ivan Kelly
> @Ivan > I wonder if some tests on Jepsen with bookie restarts may find this kind of > issues, given that it is not a network/SO problem If Jepsen can catch it, then a normal integration test can. The readers in question, are they tailing with long poll, or just calling readLastAddConfirmed in a loop? W

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
> It is interesting that the problems is on 'readers' and it seems that the > PCBC seems corrupted and even writes (if the broker is promoted to > 'leader') are able to go on after the reads broke the client. Are writes coming from the same clients? Or clients in the same process? -Ivan

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
> - when I "restart" bookies I issue a kill -9 (I think this could be the > reason why I can't reproduce the issue on testcases) With a clean shutdown of bookies we close the channels, and it should do the tcp shutdown handshake. -9 will kill the process before it gets to do any of that, but the ke

Re: Help with bad errors on 4.6.1

2018-03-12 Thread Ivan Kelly
> > > this makes the bookie switch in a corrupted state by double releasing a >> > > bytebuf? >> > > >> > >> > I did some experiments and it is easy to reproduce the bookie side error >> > and the double release with a forged sequence of byt

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
On Fri, Mar 9, 2018 at 3:20 PM, Enrico Olivelli wrote: > Bookies > 10.168.10.117:1822 -> bad bookie with 4.1.21 > 10.168.10.116:1822 -> bookie with 4.1.12 > 10.168.10.118:1281 -> bookie with 4.1.12 > > 10.168.10.117 client machine on which I have 4.1.21 client (different > process than the bookie

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
Also, do you have the logs of the error occurring on the server side? -Ivan On Fri, Mar 9, 2018 at 3:16 PM, Ivan Kelly wrote: > On Fri, Mar 9, 2018 at 3:13 PM, Enrico Olivelli wrote: >> New dump, >> sequence (simpler) >> >> 1) system is running, reader is readin

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
On Fri, Mar 9, 2018 at 3:13 PM, Enrico Olivelli wrote: > New dump, > sequence (simpler) > > 1) system is running, reader is reading without errors with netty 4.1.21 > 2) 3 bookies, one is with 4.1.21 and the other ones with 4.1.12 > 3) kill one bookie with 4.1.12, the reader starts reading from th

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
I've asked Enrico to run again, as this dump doesn't span the time when the issue started occurring. What I'm looking for is to be able to inspect the first packet which triggers the version downgrade of the decoders. On Fri, Mar 9, 2018 at 3:04 PM, Enrico Olivelli wrote: > This is the dump > >

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
> Any suggestion on the tcpdump config ? (command line example) sudo tcpdump -s 200 -w blah.pcap 'tcp port 3181' Where are you going to change the netty? client or server or both? -Ivan

Re: Help with bad errors on 4.6.1

2018-03-09 Thread Ivan Kelly
Great analysis Sijie. Enrico, are these high traffic machines? Would it be feasible to put tcpdump running? You could even truncate each message to 100 bytes or so, to avoid storing payloads. It'd be very useful to see what the corrupt traffic actually looks like. -Ivan On Fri, Mar 9, 2018 at 10
