Re: New Producer Public API

2014-02-06 Thread S Ahmed
How about the following use case: Just before the producer actually sends the payload to kakfa, could an event be exposed that would allow one to loop through the messages and potentially delete some of them? Example: Say you have 100 messages, but before you send these messages to kakfa, you

Re: New Producer Public API

2014-02-05 Thread Neha Narkhede
Currently, the user will send ProducerRecords using the new producer. The expectation will be that you get the same thing as output from the consumer. Since ProduceRecord is a holder for topic, partition, key and value, does it make sense to rename it to just Record? So, the send/receive APIs

Re: New Producer Public API

2014-02-05 Thread Jay Kreps
It might. I considered this but ended up going this way. Now that we have changed partitionKey=partition it almost works. The difference is the consumer gets an offset too which the producer doesn't have. One thing I think this points to is the value of getting the consumer java api worked out

Re: New Producer Public API

2014-01-31 Thread David Arthur
On 1/24/14 7:41 PM, Jay Kreps wrote: Yeah I'll fix that name. Hmm, yeah, I agree that often you want to be able delay network connectivity until you have started everything up. But at the same time I kind of loath special init() methods because you always forget to call them and get one round

Re: New Producer Public API

2014-01-31 Thread Oliver Dain
Hey all, I¹m excited about having a new Producer API, and I really like the idea of removing the distinction between a synchronous and asynchronous producer. The one comment I have about the current API is that it¹s hard to write truly asynchronous code with the type of future returned by the

Re: New Producer Public API

2014-01-31 Thread Oliver Dain
I wanted to suggest an alternative to the serialization issue. As I understand it, the concern is that if the user is responsible for serialization it becomes difficult for them to compute the partition as the plugin that computes the partition would be called with byte[] forcing the user to

Re: New Producer Public API

2014-01-31 Thread Tom Brown
The trouble with callbacks, IMHO, is determining the thread in which they will be executed. Since the IO thread is usually the thread that knows when the operation is complete, it's easiest to execute that callback within the IO thread. This can lead the IO thread to spend all its time on

Re: New Producer Public API

2014-01-31 Thread Jay Kreps
Oliver, Yeah that was my original plan--allow the registration of multiple callbacks on the future. But there is some additional implementation complexity because then you need more synchronization variables to ensure the callback gets executed even if the request has completed at the time the

Re: New Producer Public API

2014-01-31 Thread Tom Brown
Regarding partitioning APIs, I don't think there is not a common subset of information that is required for all strategies. Instead of modifying the core API to easily support all of the various partitioning strategies, offer the most common ones as libraries they can build into their own data

Re: New Producer Public API

2014-01-31 Thread Jay Kreps
Hey Tom, Agreed, there is definitely nothing that prevents our including partitioner implementations, but it does get a little less seamless. -Jay On Fri, Jan 31, 2014 at 2:35 PM, Tom Brown tombrow...@gmail.com wrote: Regarding partitioning APIs, I don't think there is not a common subset of

Re: New Producer Public API

2014-01-30 Thread Joel Koshy
Does it preclude those various implementations? i.e., it could become a producer config: default.partitioner.strategy=minimize-connections/roundrobin - and so on; and implement those partitioners internally in the producer. Not as clear as a .class config, but it accomplishes the same effect no?

Re: New Producer Public API

2014-01-30 Thread Joel Koshy
+ dev (this thread has become a bit unwieldy) On Thu, Jan 30, 2014 at 5:15 PM, Joel Koshy jjkosh...@gmail.com wrote: Does it preclude those various implementations? i.e., it could become a producer config: default.partitioner.strategy=minimize-connections/roundrobin - and so on; and

Re: New Producer Public API

2014-01-30 Thread Jay Kreps
I thought a bit about it and I think the getCluster() thing was overly simplistic because we try to only maintain metadata about the current set of topics the producer cares about so the cluster might not have the partitions for the topic the user cares about. I think actually what we need is a

Re: New Producer Public API

2014-01-29 Thread Tom Brown
Jay, There is both a basic client object, and a number of IO processing threads. The client object manages connections, creating new ones when new machines are connected, or when existing connections die. It also manages a queue of requests for each server. The IO processing thread has a

Re: New Producer Public API

2014-01-29 Thread Jay Kreps
Hey Tom, So is there one connection and I/O thread per broker and a low-level client for each of those, and then you hash into that to partition? Is it possible to batch across partitions or only within a partition? -Jay On Wed, Jan 29, 2014 at 8:41 AM, Tom Brown tombrow...@gmail.com wrote:

Re: New Producer Public API

2014-01-29 Thread Jay Kreps
Hey Neha, Error handling in RecordSend works as in Future you will get the exception if there is one from any of the accessor methods or await(). The purpose of hasError was that you can write things slightly more simply (which some people expressed preference for): if(send.hasError()) //

Re: New Producer Public API

2014-01-29 Thread Steve Morin
Is the new producer API going to maintain protocol compatibility with old version if the API under the hood? On Jan 29, 2014, at 10:15, Jay Kreps jay.kr...@gmail.com wrote: The challenge of directly exposing ProduceRequestResult is that the offset provided is just the base offset and there

Re: New Producer Public API

2014-01-29 Thread Neha Narkhede
The challenge of directly exposing ProduceRequestResult is that the offset provided is just the base offset and there is no way to know for a particular message where it was in relation to that base offset because the batching is transparent and non-deterministic. That's a good point. I need to

Re: New Producer Public API

2014-01-29 Thread Chris Riccomini
Hey Guys, My 2c. 1. RecordSend is a confusing name to me. Shouldn't it be RecordSendResponse? 2. Random nit: it's annoying to have the Javadoc info for the contstants on http://empathybox.com/kafka-javadoc/kafka/clients/producer/ProducerConfig.h tml, but the string constant values on

Re: New Producer Public API

2014-01-28 Thread Ross Black
Hi Jay, - Just to add some more info/confusion about possibly using Future ... If Kafka uses a JDK future, it plays nicely with other frameworks as well. Google Guava has a ListenableFuture that allows callback handling to be added via the returned future, and allows the callbacks to be

Re: New Producer Public API

2014-01-28 Thread Mattijs Ugen (DT)
Sorry to tune in a bit late, but here goes. 1. The producer since 0.8 is actually zookeeper free, so this is not new to this client it is true for the current client as well. Our experience was that direct zookeeper connections from zillions of producers wasn't a good idea for a number of

Re: New Producer Public API

2014-01-28 Thread Scott Clasen
+1 to zk bootstrap + close as an option at least On Tue, Jan 28, 2014 at 10:09 AM, Neha Narkhede neha.narkh...@gmail.comwrote: The producer since 0.8 is actually zookeeper free, so this is not new to this client it is true for the current client as well. Our experience was that direct

Re: New Producer Public API

2014-01-28 Thread Roger Hoover
+1 ListenableFuture: If this works similar to Deferreds in Twisted Python or Promised IO in Javascript, I think this is a great pattern for decoupling your callback logic from the place where the Future is generated. You can register as many callbacks as you like, each in the appropriate layer of

Re: New Producer Public API

2014-01-28 Thread Jay Kreps
Hmmm, I would really strongly urge us to not introduce a zk dependency just for discovery. People who want to implement this can certainly do so by simply looking up urls and setting them in the consumer config, but our experience with doing this at large scale was pretty bad. Hardcoding the

Re: New Producer Public API

2014-01-28 Thread Tom Brown
I implemented a 0.7 client in pure java, and its API very closely resembled this. (When multiple people independently engineer the same solution, it's probably good... right?). However, there were a few architectural differences with my client: 1. The basic client itself was just an asynchronous

Re: New Producer Public API

2014-01-28 Thread Jay Kreps
Hey Roger, We really can't use ListenableFuture directly though I agree it is nice. We have had some previous experience with embedding google collection classes in public apis, and it was quite the disaster. The problem has been that the google guys regularly go on a refactoring binge for no

Re: New Producer Public API

2014-01-28 Thread Jay Kreps
Hey Tom, That sounds cool. How did you end up handling parallel I/O if you wrap the individual connections? Don't you need some selector that selects over all the connections? -Jay On Tue, Jan 28, 2014 at 2:31 PM, Tom Brown tombrow...@gmail.com wrote: I implemented a 0.7 client in pure java,

Re: New Producer Public API

2014-01-28 Thread Neha Narkhede
Here are more thoughts on the public APIs - - I suggest we use java's Future instead of custom Future especially since it is part of the public API - Serialization: I like the simplicity of the producer APIs with the absence of serialization where we just deal with byte arrays for keys and

Re: New Producer Public API

2014-01-28 Thread Jay Kreps
Hey Neha, Can you elaborate on why you prefer using Java's Future? The downside in my mind is the use of the checked InterruptedException and ExecutionException. ExecutionException is arguable, but forcing you to catch InterruptedException, often in code that can't be interrupted, seems perverse.

Re: New Producer Public API

2014-01-27 Thread Clark Breyman
Jay - Config - your explanation makes sense. I'm just so accustomed to having Jackson automatically map my configuration objects to POJOs that I've stopped using property files. They are lingua franca. The only thought might be to separate the config interface from the implementation to allow for

Re: New Producer Public API

2014-01-27 Thread Jay Kreps
Clark, Yeah good point. Okay I'm sold on Closable. Autoclosable would be much better, but for now we are retaining 1.6 compatibility and I suspect the use case of temporarily creating a producer would actually be a more rare case. -Jay On Mon, Jan 27, 2014 at 9:29 AM, Clark Breyman

Re: New Producer Public API

2014-01-27 Thread Clark Breyman
re: Using package to avoid ambiguity - Unlike Scala, this is really cumbersome in Java as it doesn't support package imports or import aliases, so the only way to distinguish is to use the fully qualified path. re: Closable - it can throw IOException but is not required to. Same with

Re: New Producer Public API

2014-01-27 Thread Xavier Stevens
AutoCloseable would be nice for us as most of our code is using Java 7 at this point. I like Dropwizard's configuration mapping to POJOs via Jackson, but if you wanted to stick with property maps I don't care enough to object. If the producer only dealt with bytes, is there a way we could still

Re: New Producer Public API

2014-01-26 Thread Clark Breyman
Thanks Jay. I'll see if I can put together a more complete response, perhaps as separate threads so that topics don't get entangled. In the mean time, here's a couple responses: Serialization: you've broken out a sub-thread so i'll reply there. My bias is that I like generics (except for

Re: New Producer Public API

2014-01-26 Thread Jay Kreps
Thanks for the detailed thoughts. Let me elaborate on the config thing. I agree that at first glance key-value strings don't seem like a very good configuration api for a client. Surely a well-typed config class would be better! I actually disagree and let me see if I can convince you. My

Re: New Producer Public API

2014-01-26 Thread Jay Kreps
Clark, With respect to maven it would be great to know if you see any issues with the gradle stuff. For serialization I would love to hear if any of the options I outlined seemed good to you or if you have another idea. For futures, that would be awesome to see how it would help. I agree that

Re: New Producer Public API

2014-01-26 Thread Jay Kreps
Just to keep all the points in a single thread, here are a few other points brought up by others: 1. Sriram/Guozhang: Should RecordSend.await() throw an exception if one occurred during the call. The argument for this is that it is similar to Future and an exception is the least surprising way to

New Producer Public API

2014-01-24 Thread Jay Kreps
As mentioned in a previous email we are working on a re-implementation of the producer. I would like to use this email thread to discuss the details of the public API and the configuration. I would love for us to be incredibly picky about this public api now so it is as good as possible and we

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
If I understand your use case I think usage would be something like producer.send(message, new Callback() { public void onCompletion(RecordSend send) { if(send.hasError()) log.write(message); } }); Reasonable? In other words you can include references to any

Re: New Producer Public API

2014-01-24 Thread Andrey Yegorov
So for each message that I need to send asynchronously I have to create a new instance of callback and hold on to the message? This looks nice in theory but in case of few thousands of request/sec this could use up too much extra memory and push too much to garbage collector, especially in case

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Andrey, I think this should perform okay. We already create a number of objects per message sent, one more shouldn't have too much performance impact if it is just thousands per second. -Jay On Fri, Jan 24, 2014 at 2:28 PM, Andrey Yegorov andrey.yego...@gmail.comwrote: So for each message

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Roger, These are good questions. 1. The producer since 0.8 is actually zookeeper free, so this is not new to this client it is true for the current client as well. Our experience was that direct zookeeper connections from zillions of producers wasn't a good idea for a number of reasons. Our

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Hey Clark, - Serialization: Yes I agree with these though I don't consider the loss of generics a big issue. I'll try to summarize what I would consider the best alternative api with raw byte[]. - Maven: We had this debate a few months back and the consensus was gradle. Is there a specific issue

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Yeah I'll fix that name. Hmm, yeah, I agree that often you want to be able delay network connectivity until you have started everything up. But at the same time I kind of loath special init() methods because you always forget to call them and get one round of error every time. I wonder if in

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Hey Joe, Metadata: Yes, this is how it works. You give a URL or a few URLs to bootstrap from. From then on any metadata change will percolate up to all producers so you should be able to dynamically change the cluster in any way without needing to restart or reconfigure the producers. So I think

Re: New Producer Public API

2014-01-24 Thread Jay Kreps
Clark and all, I thought a little bit about the serialization question. Here are the options I see and the pros and cons I can think of. I'd love to hear people's preferences if you have a strong one. One important consideration is that however the producer works will also need to be how the new