Re: Hash partition of key with skew

2016-05-05 Thread Srikanth
Understood. Thanks.

On Wed, May 4, 2016 at 3:47 PM, Wesley Chow wrote:

> We don’t do this on the Kafka side, but for a different system that has
> similar distribution problems we manually maintain a map of “hot” keys. On
> the Kafka side, we distribute keys with an even distribution in our la

Re: Hash partition of key with skew

2016-05-04 Thread Wesley Chow
We don’t do this on the Kafka side, but for a different system that has similar distribution problems we manually maintain a map of “hot” keys. On the Kafka side, we distribute keys with an even distribution in our largest volume topic, and then squash the data and repartition based on a skewed
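
A rough sketch of what a hand-maintained map of "hot" keys could look like if it were applied to partition selection. This is only an illustration: the message says the map is used in a different system, not in Kafka, and does not describe how. `HOT_KEY_SLICES`, `partition_for`, the key names, and the use of a record sequence number are all assumptions made for the example.

```python
import zlib

# Hypothetical, manually maintained map: hot key -> number of partitions
# that key is allowed to spread over. Keys not listed stay on one partition.
HOT_KEY_SLICES = {b"client-big": 8}


def partition_for(key: bytes, record_seq: int, num_partitions: int,
                  hot_keys: dict = HOT_KEY_SLICES) -> int:
    """Choose a partition; only keys listed in hot_keys are spread out."""
    # CRC32 is used because it is deterministic across processes.
    base = zlib.crc32(key) % num_partitions
    # Never spread a key over more partitions than exist.
    slices = min(hot_keys.get(key, 1), num_partitions)
    # Rotate through the key's slices using the record sequence number.
    return (base + record_seq % slices) % num_partitions
```

With this shape, a cold key keeps strict per-key ordering on a single partition, while a consumer of a hot key has to read all of that key's partitions.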

Re: Hash partition of key with skew

2016-05-04 Thread Srikanth
Yeah, fixed slicing may help. I'll put more thought into this. You had mentioned that you didn't put the custom partitioner into production. Would you mind sharing how you work around this currently?

Srikanth

On Tue, May 3, 2016 at 5:43 PM, Wesley Chow wrote:

> > Upload to S3 is partitioned b

Re: Hash partition of key with skew

2016-05-04 Thread Srikanth
> -----Original Message-----
> From: Srikanth [mailto:srikanth...@gmail.com]
> Sent: Tuesday, May 03, 2016 1:57 PM
> To: users@kafka.apache.org
> Subject: Re: Hash partition of key with skew
>
> So, there are a few consumers. One is

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
> Upload to S3 is partitioned by the "key" field. I.e., one folder per key. It
> does offset management to make sure offset commit is in sync with S3 upload.

We do this in several spots and I wish we had built our system in such a way that we could just open source it. I’m sure many people have

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
Subject: Re: Hash partition of key with skew

So, there are a few consumers. One is a Spark Streaming job where we can do a partitionBy(key) and take a slight hit. There are two consumers which are just Java apps, with multiple instances running in Marathon. One consumer reads records, does basic checks

Re: Hash partition of key with skew

2016-05-03 Thread Srikanth
> messages as you pull them off of
> Kafka?
>
> -Dave
>
> -----Original Message-----
> From: Srikanth [mailto:srikanth...@gmail.com]
> Sent: Tuesday, May 03, 2016 12:12 PM
> To: users@kafka.apache.org
> Subject: Re: Hash partition of key with skew
>
> Jens,
> T

Re: Hash partition of key with skew

2016-05-03 Thread Stephen Powis
> of messages. What will you do with the messages as you pull them off of
> Kafka?
>
> -Dave
>
> -----Original Message-----
> From: Srikanth [mailto:srikanth...@gmail.com]
> Sent: Tuesday, May 03, 2016 12:12 PM
> To: users@kafka.apache.org
> Subject: Re: Hash parti

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
them off of Kafka?

-Dave

-----Original Message-----
From: Srikanth [mailto:srikanth...@gmail.com]
Sent: Tuesday, May 03, 2016 12:12 PM
To: users@kafka.apache.org
Subject: Re: Hash partition of key with skew

Jens,

Thanks for the link. That is something to consider. Of course it has downsides too

Re: Hash partition of key with skew

2016-05-03 Thread Srikanth
> > -----Original Message-----
> > From: Wesley Chow [mailto:w...@

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
-----Original Message-----
From: Wesley Chow [mailto:w...@chartbeat.com]
Sent: Tuesday, May 03, 2016 10:51 AM
To: users@kafka.apache.org
Subject: Re: Hash partition of key with skew

I’m not t

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
> -----Original Message-----
> From: Wesley Chow [mailto:w...@chartbeat.com]
> Sent: Tuesday, May 03, 2016 9:51 AM
> To: users@kafka.apache.org
> Subject: Re: Ha

RE: Hash partition of key with skew

2016-05-03 Thread Tauzell, Dave
-----Original Message-----
From: Wesley Chow [mailto:w...@chartbeat.com]
Sent: Tuesday, May 03, 2016 9:51 AM
To: users@kafka.apache.org
Subject: Re: Hash partition of key with skew

I’ve come up with a couple solutions since we too have a power law distribution. Howeve

Re: Hash partition of key with skew

2016-05-03 Thread Wesley Chow
I’ve come up with a couple solutions since we too have a power law distribution. However, we have not put anything into practice.

Fixed Slicing

One simple thing to do is to take each key and slice it into some fixed number of partitions. So your function might be:

(hash(key) % num) + (hash(key
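
The formula above is cut off in the archive, so the following is not a reconstruction of it. It is only a minimal sketch of the general idea described (each key is spread over a fixed number of partitions), under stated assumptions: `stable_hash`, `slice_candidates`, `sliced_partition`, and the per-record `salt` are hypothetical names, the slices are taken as consecutive partitions, and CRC32 stands in for whatever hash a real partitioner would use.

```python
import zlib
from typing import List


def stable_hash(data: bytes) -> int:
    """Deterministic 32-bit hash (CRC32), used here only for illustration."""
    return zlib.crc32(data)


def slice_candidates(key: bytes, num_partitions: int, slices: int) -> List[int]:
    """Every partition a key may land on: `slices` consecutive partitions."""
    base = stable_hash(key) % num_partitions
    return [(base + i) % num_partitions for i in range(slices)]


def sliced_partition(key: bytes, salt: bytes,
                     num_partitions: int, slices: int) -> int:
    """Pick one of the key's slices using a per-record salt (an assumption)."""
    base = stable_hash(key) % num_partitions
    # The salt could be any per-record value, e.g. a counter or record id.
    offset = stable_hash(salt) % slices
    return (base + offset) % num_partitions
```

The trade-off in a scheme like this is that a consumer interested in one key must read all partitions returned by `slice_candidates`, and ordering for that key is only preserved within each slice.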

Re: Hash partition of key with skew

2016-05-03 Thread Jens Rantil
Hi,

Not sure if this helps, but the way Loggly seems to do it is to have a separate topic for "noisy neighbors". See [1].

[1] https://www.loggly.com/blog/loggly-loves-apache-kafka-use-unbreakable-messaging-better-log-management/

Cheers,
Jens

On Wed, Apr 27, 2016 at 9:11 PM Srikanth wrote:

> He

Hash partition of key with skew

2016-04-27 Thread Srikanth
Hello,

Is there a recommendation for handling producer-side partitioning based on a key with skew? We want to partition on something like clientId. Problem is, this key has a uniform distribution. It's equally likely to see a key with 3k occurrences/day vs 100k/day vs 65 million/day. Cardinality of