Re: Random Partitioning Issue

2013-10-01 Thread Jun Rao
Yes, let's describe that behavior in the FAQ.
Thanks,
Jun
On Tue, Oct 1, 2013 at 8:35 AM, Joe Stein wrote:
> agreed, let's hold off until after 0.8
>
> I will update the JIRA ticket I created with your feedback and options we
> can discuss it there and then deal with changes in 0.8.1 or 0.9 or suc

Re: Random Partitioning Issue

2013-10-01 Thread Joe Stein
Agreed, let's hold off until after 0.8. I will update the JIRA ticket I created with your feedback and options; we can discuss it there and then deal with changes in 0.8.1 or 0.9 or such. I will update the FAQ (should have time tomorrow unless someone else gets to it first). I think we should have it

Re: Random Partitioning Issue

2013-10-01 Thread Jun Rao
This proposal still doesn't address the following fundamental issue: The random partitioner cannot select a random and AVAILABLE partition. So, we have the following two choices.
1. Stick with the current partitioner api. Then, we have to pick one way to do random partitioning (when key is null).

Re: Random Partitioning Issue

2013-09-30 Thread Joe Stein
How about making UUID.randomUUID.toString() the default in KeyedMessage, instead of null, if not supplied
    def this(topic: String, message: V) = this(topic, UUID.randomUUID.toString(), message)
and if you want the random refresh behavior then pass in "*" on the KeyedMessage construction which we can

Re: Random Partitioning Issue

2013-09-29 Thread Jun Rao
The main issue is that if we do that, when key is null, we can only select a random partition, but not a random and available partition, without changing the partitioner api. Being able to do the latter is important in my opinion. For example, a user may choose the replication factor of a topic to

Re: Random Partitioning Issue

2013-09-28 Thread Guozhang Wang
I think Joe's suggesting that we can remove the checking logic for key == null in DefaultEventHandler, and do that in the partitioner. One consequence of this idea is that any customized partitioner then also has to handle the key == null case.
Guozhang
On Fri, Sep 27, 2013 at 9:12 PM, Jun Rao wrote:
> We ha

Re: Random Partitioning Issue

2013-09-27 Thread Jun Rao
We have the following code in DefaultEventHandler:
    val partition =
      if(key == null) {
        // If the key is null, we don't really need a partitioner
        // So we look up in the send partition cache for the topic to decide the target partition
        val id = sendPartitionPerTopicC

Re: Random Partitioning Issue

2013-09-27 Thread Joe Stein
hmmm, yeah, no I don't want to do that ... if we don't have to. What if the DefaultPartitioner code looked like this instead =8^)
    private class DefaultPartitioner[T](props: VerifiableProperties = null) extends Partitioner[T] {
      def partition(key: T, numPartitions: Int): Int = {
        if (key == nu

Re: Random Partitioning Issue

2013-09-27 Thread Jun Rao
However, currently, if key is null, the partitioner is not even called. Do you want to change DefaultEventHandler too? This also doesn't allow the partitioner to select a random and available partition, which in my opinion is more important than making partitions perfectly evenly balanced. Thanks

Re: Random Partitioning Issue

2013-09-27 Thread Joe Stein
What I was proposing was twofold:
1) revert the DefaultPartitioner class, then
2) create a new partitioner that folks could use (like at LinkedIn you would use this partitioner instead) in ProducerConfig
    private class RandomRefreshTimPartitioner[T](props: VerifiableProperties = null) extends Par

Re: Random Partitioning Issue

2013-09-27 Thread Jun Rao
Joe,
Not sure I fully understand your proposal. Do you want to put the random partitioning selection logic (for messages without a key) in the partitioner without changing the partitioner api? That's difficult. The issue is that in the current partitioner api, we don't know which partitions are ava

Re: Random Partitioning Issue

2013-09-27 Thread Joe Stein
Jun, can we hold this extra change over for 0.8.1 and just go with reverting where we were before for the default with a new partition for meta refresh and support both? I am not sure I entirely understand why someone would need the extra functionality you are talking about which sounds cool thoug

Re: Random Partitioning Issue

2013-09-22 Thread Jun Rao
It's reasonable to make the behavior of random producers customizable through a pluggable partitioner. So, if one doesn't care about # of socket connections, one can choose to select a random partition on every send. If one does have many producers, one can choose to periodically select a random pa

Re: Random Partitioning Issue

2013-09-18 Thread Joe Stein
Sounds good, I will create a JIRA and upload a patch.
/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop
/
On Sep 17, 2013, at 1:

Re: Random Partitioning Issue

2013-09-17 Thread Jay Kreps
I would be in favor of that. I agree this is better than 0.7.
-Jay
On Tue, Sep 17, 2013 at 10:19 AM, Joel Koshy wrote:
> I agree that minimizing the number of producer connections (while
> being a good thing) is really required in very large production
> deployments, and the net-effect of the

Re: Random Partitioning Issue

2013-09-17 Thread Joel Koshy
I agree that minimizing the number of producer connections (while being a good thing) is really required in very large production deployments, and the net-effect of the existing change is counter-intuitive to users who expect an immediate even distribution across _all_ partitions of the topic. How

Re: Random Partitioning Issue

2013-09-15 Thread Jay Kreps
Let me ask another question which I think is more objective. Let's say 100 random, smart infrastructure specialists try Kafka. Of these 100, how many do you believe will
1. Say that this behavior is what they expected to happen?
2. Be happy with this behavior?
I am not being facetious, I am genuinely

Re: Random Partitioning Issue

2013-09-15 Thread Jay Kreps
I just took a look at this change. I agree with Joe, not to put too fine a point on it, but this is a confusing hack. Jun, I don't think wanting to minimize the number of TCP connections is going to be a very common need for people with less than 10k producers. I also don't think people are going

Re: Random Partitioning Issue

2013-09-14 Thread Jun Rao
Joe,
Thanks for bringing this up. I want to clarify this a bit.
1. Currently, the producer side logic is that if the partitioning key is not provided (i.e., it is null), the partitioner won't be called. We did that because we want to select a random and "available" partition to send messages so t

Re: Random Partitioning Issue

2013-09-14 Thread Joe Stein
How about creating a new class called RandomRefreshPartioner, copying the DefaultPartitioner code to it, and then reverting the DefaultPartitioner code? I appreciate this is a one-time burden for folks using the existing 0.8-beta1 bumping into KAFKA-1017 in production having to switch to the Rando

Re: Random Partitioning Issue

2013-09-14 Thread Joel Koshy
>
>
> Thanks for bringing this up - it is definitely an important point to
> discuss. The underlying issue of KAFKA-1017 was uncovered to some degree by
> the fact that in our deployment we did not significantly increase the total
> number of partitions over 0.7 - i.e., in 0.7 we had say four parti

Random Partitioning Issue

2013-09-13 Thread Joe Stein
First, let me apologize for not realizing/noticing this until today. One reason I left my last company was not being paid to work on Kafka nor being able to afford any time for a while to work on it. Now in my new gig (just wrapped up my first week, woo hoo) while I am still not "paid to work on K