@Shapira You are correct from my perspective. We are using Kafka for a system where panels can send multiple events in a single message. The current contract is that all events in a message fail or succeed as a whole; if there is a failure, the panel resends all of the events. The existing producer API supports this fine. Am I getting left behind here for the sake of brevity?
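
To make that contract concrete, here is a rough sketch of the all-or-nothing behavior, following the steps in Gwen's proposal quoted below (send one by one, flush, get each future, log each error, throw if anything failed). Everything named here is hypothetical: `BatchSendSketch`, `BatchSendException`, and the `send` and `flush` parameters are stand-ins I made up so the snippet compiles without the Kafka client on the classpath. It is not the proposed `batchSend` API itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper; not part of any Kafka API. */
final class BatchSendSketch {

    /** Thrown when at least one record in the batch failed. */
    static final class BatchSendException extends RuntimeException {
        final List<Integer> failedIndexes;

        BatchSendException(List<Integer> failedIndexes, Throwable firstCause) {
            super(failedIndexes.size() + " record(s) failed at indexes " + failedIndexes, firstCause);
            this.failedIndexes = failedIndexes;
        }
    }

    /**
     * Sends every record, flushes, then waits on every future.
     * `send` and `flush` are assumed stand-ins for the producer's send and flush calls.
     */
    static <R> void batchSend(List<R> records, Function<R, Future<?>> send, Runnable flush) {
        // 1. Send records one by one, keeping the future of each send.
        List<Future<?>> futures = new ArrayList<>();
        for (R record : records) {
            futures.add(send.apply(record));
        }

        // 2. Flush: block until all outstanding requests have completed.
        flush.run();

        // 3. Iterate on the futures and "get" them, logging each error.
        List<Integer> failed = new ArrayList<>();
        Throwable firstCause = null;
        for (int i = 0; i < futures.size(); i++) {
            try {
                futures.get(i).get();
            } catch (ExecutionException e) {
                System.err.println("record " + i + " failed: " + e.getCause());
                failed.add(i);
                if (firstCause == null) {
                    firstCause = e.getCause();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for record " + i, e);
            }
        }

        // 4. Throw if any send failed, so the caller can resend the whole batch.
        if (!failed.isEmpty()) {
            throw new BatchSendException(failed, firstCause);
        }
    }
}
```

With the new producer I would expect to pass something like `producer::send` and `producer::flush` as the two callbacks, but I have not tested that. Note that, as Jay points out below, this only reports failure as a whole. It does not make the batch atomic, since some records may already have been written when the exception is thrown.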
I can get behind not adding every feature people ask for, but taking something away is a different story altogether.

On Wed, Apr 29, 2015 at 9:08 PM, Gwen Shapira <gshap...@cloudera.com> wrote:

> I'm starting to think that the old adage "If two people say you are
> drunk, lie down" applies here :)
>
> Current API seems perfectly clear, useful and logical to everyone who
> wrote it... but we are getting multiple users asking for the old batch
> behavior back.
> One reason to get it back is to make upgrades easier - people won't
> need to rethink their existing logic if they get an API with the same
> behavior in the new producer. The other reason is what Ewen mentioned
> earlier - if everyone re-implements Joel's logic, we can provide
> something for that.
>
> How about getting the old batch send behavior back by adding a new API
> with:
> public void batchSend(List<ProducerRecord<K,V>>)
>
> With this implementation (mixes the old behavior with Joel's snippet):
> * send records one by one
> * flush
> * iterate on futures and "get" them
> * log a detailed message on each error
> * throw an exception if any send failed.
>
> It reproduces the old behavior - which apparently everyone really
> liked, and I don't think it is overly weird. It is very limited, but
> anyone who needs more control over his sends already has plenty of
> options.
>
> Thoughts?
>
> Gwen
>
> On Tue, Apr 28, 2015 at 5:29 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey guys,
> >
> > The locking argument is correct for very small records (< 50 bytes),
> > batching will help here because for small records locking becomes
> > the big bottleneck. I think these use cases are rare but not
> > unreasonable.
> >
> > Overall I'd emphasize that the new producer is way faster at
> > virtually all use cases. If there is a use case where that isn't
> > true, let's look at it in a data driven way by comparing the old
> > producer to the new producer and looking for any areas where things
> > got worse.
> >
> > I suspect the "reducing allocations" argument to be not a big thing.
> > We do a number of small per-message allocations and it didn't seem
> > to have much impact. I do think there are a couple of big producer
> > memory optimizations we could do by reusing the arrays in the
> > accumulator in the serialization of the request but I don't think
> > this is one of them.
> >
> > I'd be skeptical of any api that was too weird--i.e. introduces a
> > new way of partitioning, gives back errors on a per-partition rather
> > than per message basis (given that partitioning is transparent this
> > is really hard to think about), etc. Bad apis end up causing a ton
> > of churn and just don't end up being a good long term commitment as
> > we change how the underlying code works over time (i.e. we hyper
> > optimize for something then have to maintain some super weird api as
> > it becomes hyper unoptimized for the client over time).
> >
> > Roshan--Flush works as you would hope, it blocks on the completion
> > of all outstanding requests. Calling get on the future for the
> > request gives you the associated error code back. Flush doesn't
> > throw any exceptions because waiting for requests to complete
> > doesn't error, the individual requests fail or succeed which is
> > always reported with each request.
> >
> > Ivan--The batches you send in the scala producer today actually
> > aren't truly atomic, they just get sent in a single request.
> >
> > One tricky problem to solve when users do batching is size limits on
> > requests. This can be very hard to manage since predicting the
> > serialized size of a bunch of java objects is not always obvious.
> > This was repeatedly a problem before.
> >
> > -Jay
> >
> > On Tue, Apr 28, 2015 at 4:51 PM, Ivan Balashov <ibalas...@gmail.com>
> > wrote:
> >
> > > I must agree with @Roshan – it's hard to imagine anything more
> > > intuitive and easy to use for atomic batching as old sync batch
> > > api. Also, it's fast.
> > > Coupled with a separate instance of producer per
> > > broker:port:topic:partition it works very well. I would be glad if
> > > it finds its way into new producer api.
> > >
> > > On a side-side-side note, could anyone confirm/deny if
> > > SimpleConsumer's fetchSize must be set at least as batch bytes
> > > (before or after compression), otherwise client risks not getting
> > > any messages?