I guess it would make the API less clean, but I can imagine a sendBatch method that returns a single Future, completed only when all messages in the batch have finished. The callback could then carry the success or exceptions encountered by each sub-group of messages. The callback could even be called multiple times, once for each sub-batch sent. It gets complicated to think about, but it would mean fewer Future objects created and less async contention and waiting.
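To make the sendBatch idea concrete, here is a minimal sketch using only java.util.concurrent. Everything in it is hypothetical: `BatchTracker` and its methods are illustrative names, not part of the Kafka producer API. It assumes each message's completion callback reports into one shared tracker, which completes a single future once the whole batch is accounted for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a Kafka class: one future for a whole batch.
final class BatchTracker {
    private final AtomicInteger remaining;
    private final Queue<Throwable> failures = new ConcurrentLinkedQueue<>();
    private final CompletableFuture<List<Throwable>> done = new CompletableFuture<>();

    BatchTracker(int batchSize) {
        this.remaining = new AtomicInteger(batchSize);
        if (batchSize == 0) {
            // An empty batch has nothing to wait for.
            done.complete(Collections.<Throwable>emptyList());
        }
    }

    // Called once per message; error is null when that message succeeded.
    void onCompletion(Throwable error) {
        if (error != null) {
            failures.add(error);
        }
        // The last message to report completes the shared future.
        if (remaining.decrementAndGet() == 0) {
            done.complete(new ArrayList<>(failures));
        }
    }

    // Completes with the list of failures (empty if every message succeeded).
    CompletableFuture<List<Throwable>> future() {
        return done;
    }
}
```

The caller would wait on `future()` once per batch instead of holding one Future per message. Reporting per sub-batch, as suggested above, would need a richer result type than a flat list of failures.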
I'll try it out and see....

Jason

On Thu, Nov 20, 2014 at 7:56 PM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Internally it works as you describe, there is only one CountDownLatch per
> batch sent, each of the futures is just a wrapper around that.
>
> It is true that if you accumulate thousands of futures in a list that may
> be a fair number of objects you are retaining, and there will be some work
> involved in checking them all. If you are sure they are all going to the
> same partition you can actually wait on the last future since sends are
> ordered within a partition. So when the final send completes the prior
> sends should also have completed.
>
> Either way if you see a case where the new producer isn't as fast as the
> old producer let us know.
>
> -Jay
>
> On Thu, Nov 20, 2014 at 4:24 PM, Jason Rosenberg <j...@squareup.com> wrote:
>
> > I've been looking at the new producer api with anticipation, but have not
> > fired it up yet.
> >
> > One question I have, is it looks like there's no longer a 'batch' send mode
> > (and I get that this is all now handled internally, e.g. you send
> > individual messages, that then get collated and batched up and sent out).
> >
> > What I'm wondering, is whether there's added overhead in the producer (and
> > the client code) having to manage all the Future return Objects from all
> > the individual messages sent? If I'm sending 100K messages/second, etc.,
> > that seems like a lot of async Future Objects that have to be tickled, and
> > waited for, etc. Does not this cause some overhead?
> >
> > If I send a bunch of messages and then store all the Future's in a list,
> > and then wait for all of them, it seems like a lot of thread contention.
> >
> > On the other hand, if I send a batch of messages, that are likely all to
> > get sent as a single batch over the wire (cuz they are all going to the
> > same partition), wouldn't there be some benefit in only having to wait for
> > a single Future Object for the batch?
> >
> > Jason
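
Jay's suggestion of waiting only on the last future can be illustrated with a small sketch. It does not use the Kafka client: a single-threaded executor stands in for one partition, on the assumption that it completes tasks in submission order, much as sends are described as ordered within a partition. `LastFutureSketch` and `sendAllAndWaitOnLast` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in, not Kafka code: ordered "sends" to one partition.
final class LastFutureSketch {
    // Returns how many sends had completed once the last future completed.
    static int sendAllAndWaitOnLast(List<String> messages) {
        // A single worker thread runs tasks strictly in submission order.
        ExecutorService partition = Executors.newSingleThreadExecutor();
        AtomicInteger completed = new AtomicInteger();
        try {
            Future<?> last = null;
            for (String message : messages) {
                // The message body is ignored; only completion is simulated.
                last = partition.submit(() -> { completed.incrementAndGet(); });
            }
            if (last != null) {
                last.get(); // wait on the final send only
            }
            return completed.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            partition.shutdown();
        }
    }
}
```

Only one Future reference is retained here, rather than a list of thousands. Whether an earlier send failed is a separate question that this sketch does not answer.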