OK, the general consensus seems to be that more elaborate partitioning
functions belong to the scope of Kafka.
Could somebody have a look at KAFKA-2092
https://issues.apache.org/jira/browse/KAFKA-2092 then?
--
Gianmarco
On 30 July 2015 at 05:57, Jiangjie Qin j...@linkedin.com.invalid wrote:
Just my two cents. I think it might be OK to put this into Kafka if we
agree that this might be a good use case for people who want to use Kafka
as a temporary store for stream processing. At the very least, I don't see
a downside to this.
Thanks,
Jiangjie (Becket) Qin
On Tue, Jul 28, 2015 at 3:41 AM,
Jason,
Thanks for starting the discussion and for your very concise (and correct)
summary.
Ewen, while what you say is true, those kinds of datasets (a large number
of keys with skew) are very typical on the Web (think Twitter users, or Web
pages, or even just plain text).
If you want to compute an
If you are used to map-reduce patterns, this sounds like a perfectly
natural way to process streams of data.
Call the first consumer map-combine-log, the topic shuffle-log and
the second consumer reduce-log :)
I like that a lot. It works well for either embarrassingly parallel
cases, or so much
For a little background, the difference between this partitioner and the
default one is that it breaks the deterministic mapping from key to
partition. Instead, messages for a given key can end up in either of two
partitions. This means that the consumer generally won't see all messages
for a
Gwen - this is really like two steps of map reduce though, right? The first
step does the partial shuffle to two partitions per key, second step does
partial reduce + final full shuffle, final step does the final reduce.
This strikes me as similar to partition assignment strategies in the
I guess it depends on whether the original producer did any map
tasks or simply wrote raw data. We usually advocate writing raw data,
and since we need to write it anyway, the partitioner doesn't
introduce any extra hops.
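
For readers following the thread, the two-partitions-per-key idea and the
partial-then-final reduce discussed above can be sketched roughly as below.
This is only an illustration, not the KAFKA-2092 patch and not Kafka's
Partitioner interface: the names TwoChoicePartitioner and candidates, the
salted CRC32 hashes, the partition count, and the rule of picking the
candidate with the lower locally tracked load are all assumptions made for
the example.

```python
import zlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 8  # assumed partition count for this sketch


def candidates(key, num_partitions):
    """Return the two candidate partitions for a key (hypothetical helper).

    Two differently salted CRC32 hashes stand in for two independent
    hash functions; the two candidates may coincide for some keys.
    """
    data = key.encode("utf-8")
    first = zlib.crc32(b"a:" + data) % num_partitions
    second = zlib.crc32(b"b:" + data) % num_partitions
    return first, second


class TwoChoicePartitioner:
    """Sends each message to the less loaded of its key's two candidates."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self.load = [0] * num_partitions  # messages sent per partition so far

    def partition(self, key):
        first, second = candidates(key, self.num_partitions)
        chosen = first if self.load[first] <= self.load[second] else second
        self.load[chosen] += 1
        return chosen


# Step 1: the producer spreads a skewed stream; each partition's consumer
# keeps partial counts only for the messages it actually sees.
partitioner = TwoChoicePartitioner(NUM_PARTITIONS)
stream = ["hot"] * 1000 + [f"user{i}" for i in range(200)]
partial_counts = defaultdict(Counter)  # partition -> per-key partial counts
for key in stream:
    partial_counts[partitioner.partition(key)][key] += 1

# Step 2: a downstream consumer merges the partial counts per key
# (the "final reduce").
final_counts = Counter()
for counts in partial_counts.values():
    final_counts.update(counts)
```

The point of the sketch is that the hot key's messages are split over at
most two partitions, so no single consumer sees all of them, and a second
aggregation step is needed to get the complete count per key.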
It's definitely useful to look at use-cases and I need to think a bit
more
Hello folks,
I'd like to ask the community about its opinion on the partitioning
functions in Kafka.
With KAFKA-2091 https://issues.apache.org/jira/browse/KAFKA-2091
integrated, we are now able to have custom partitioners in the producer.
The question now becomes *which* partitioners should ship