[jira] [Updated] (KAFKA-2092) New partitioning for better load balancing

2015-08-17 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated KAFKA-2092: -- Attachment: KAFKA-2092-v3.patch Updated formatting to pass

Re: [DISCUSS] Partitioning in Kafka

2015-08-06 Gianmarco De Francisci Morales
AM, Gianmarco De Francisci Morales g...@apache.org wrote: Jason, Thanks for starting the discussion and for your very concise (and correct) summary. Ewen, while what you say is true, those kinds of datasets (large number of keys with skew) are very typical in the Web (think Twitter

Re: [DISCUSS] Partitioning in Kafka

2015-07-28 Gianmarco De Francisci Morales
in the context of KIP-28 which would provide some higher-level processing capabilities (though it doesn't seem like the KStream abstraction would provide a direct way to leverage this partitioner without custom logic). Thanks, Jason On Wed, Jul 22, 2015 at 12:14 AM, Gianmarco De

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-07-27 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642427#comment-14642427 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- [~hachikuji

[DISCUSS] Partitioning in Kafka

2015-07-22 Gianmarco De Francisci Morales
Hello folks, I'd like to ask the community about its opinion on the partitioning functions in Kafka. With KAFKA-2091 https://issues.apache.org/jira/browse/KAFKA-2091 integrated we are now able to have custom partitioners in the producer. The question now becomes *which* partitioners should ship

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-07-21 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634927#comment-14634927 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- [hachikuji

[jira] [Updated] (KAFKA-2092) New partitioning for better load balancing

2015-07-07 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated KAFKA-2092: -- Attachment: KAFKA-2092-v2.patch Added explanation and example

Re: Review Request 35524: KAFKA-2092: New partitioning for better load balancing

2015-07-07 Gianmarco De Francisci Morales
--- Thanks, Gianmarco De Francisci Morales

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-07-03 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613028#comment-14613028 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- [~hachikuji

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-07-01 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609701#comment-14609701 ] Gianmarco De Francisci Morales commented on KAFKA-2092

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-06-22 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595510#comment-14595510 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- Any more

[jira] [Comment Edited] (KAFKA-2092) New partitioning for better load balancing

2015-06-22 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589634#comment-14589634 ] Gianmarco De Francisci Morales edited comment on KAFKA-2092 at 6/22/15 8:42 AM

[jira] [Commented] (KAFKA-2092) New partitioning for better load balancing

2015-06-17 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589634#comment-14589634 ] Gianmarco De Francisci Morales commented on KAFKA-2092: --- Thanks

Review Request 35524: KAFKA-2092: New partitioning for better load balancing

2015-06-16 Gianmarco De Francisci Morales
/internals/PKGPartitioner.java PRE-CREATION clients/src/main/java/org/apache/kafka/common/utils/Utils.java f73eedb Diff: https://reviews.apache.org/r/35524/diff/ Testing --- Thanks, Gianmarco De Francisci Morales

[jira] [Updated] (KAFKA-2092) New partitioning for better load balancing

2015-06-13 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated KAFKA-2092: -- Attachment: KAFKA-2092-v1.patch New partitioning for better

[jira] [Updated] (KAFKA-2092) New partitioning for better load balancing

2015-06-13 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated KAFKA-2092: -- Status: Patch Available (was: Open) First attempt at a patch

Re: [KIP-DISCUSSION] KIP-22 Expose a Partitioner interface in the new producer

2015-05-18 Gianmarco De Francisci Morales
, at 02:15 AM, Gianmarco De Francisci Morales wrote: Hi, Here are the questions I think we should consider: 1. Do we need this at all given that we have the partition argument in ProducerRecord which gives full control? I think we do need it because

Re: [KIP-DISCUSSION] KIP-22 Expose a Partitioner interface in the new producer

2015-05-04 Gianmarco De Francisci Morales
AM, Gianmarco De Francisci Morales wrote: Hi, Here are the questions I think we should consider: 1. Do we need this at all given that we have the partition argument in ProducerRecord which gives full control? I think we do need it because this is a way to plug in a different

Re: [KIP-DISCUSSION] KIP-22 Expose a Partitioner interface in the new producer

2015-04-24 Gianmarco De Francisci Morales
Hi, Here are the questions I think we should consider: 1. Do we need this at all given that we have the partition argument in ProducerRecord which gives full control? I think we do need it because this is a way to plug in a different partitioning strategy at run time and do it in a fairly
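The question above (whether a pluggable partitioner is needed when `ProducerRecord` already accepts an explicit partition) can be illustrated with a minimal, dependency-free Java sketch. All names here (`PartitionStrategy`, `HashStrategy`, `Record`, `choosePartition`) are hypothetical and are not Kafka's actual `Partitioner` API; `Arrays.hashCode` stands in for the murmur2 hash that Kafka's default partitioner applies to key bytes. The sketch only shows the precedence: an explicit partition on the record wins, otherwise the plugged-in strategy decides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical names throughout; this is not Kafka's Partitioner API.
public class PluggablePartitioning {

    // A pluggable strategy mapping serialized key bytes to a partition index.
    interface PartitionStrategy {
        int partition(byte[] keyBytes, int numPartitions);
    }

    // Hash-mod placement; Arrays.hashCode stands in for a real key hash such as murmur2.
    static final class HashStrategy implements PartitionStrategy {
        @Override
        public int partition(byte[] keyBytes, int numPartitions) {
            // Clear the sign bit so the modulo result is never negative.
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
    }

    // A record that may pin an explicit partition; null means "let the strategy decide".
    static final class Record {
        final Integer partition;
        final String key;

        Record(Integer partition, String key) {
            this.partition = partition;
            this.key = key;
        }
    }

    // An explicit partition always wins; otherwise delegate to the plugged-in strategy.
    static int choosePartition(Record record, PartitionStrategy strategy, int numPartitions) {
        if (record.partition != null) {
            return record.partition;
        }
        byte[] keyBytes = record.key.getBytes(StandardCharsets.UTF_8);
        return strategy.partition(keyBytes, numPartitions);
    }

    public static void main(String[] args) {
        PartitionStrategy strategy = new HashStrategy();
        // Pinned partition: the strategy is never consulted.
        System.out.println(choosePartition(new Record(3, "user-42"), strategy, 8)); // prints 3
        // No pinned partition: placement comes from the strategy.
        System.out.println(choosePartition(new Record(null, "user-42"), strategy, 8));
    }
}
```

Swapping `HashStrategy` for another implementation changes placement for every record that does not pin a partition, without touching the code that builds records; that is the run-time pluggability being argued for.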

[jira] [Commented] (KAFKA-2091) Expose a Partitioner interface in the new producer

2015-04-23 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508649#comment-14508649 ] Gianmarco De Francisci Morales commented on KAFKA-2091: --- Hi, I think

[jira] [Commented] (KAFKA-2091) Expose a Partitioner interface in the new producer

2015-04-08 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484926#comment-14484926 ] Gianmarco De Francisci Morales commented on KAFKA-2091: --- Looks good

Re: [DISCUSS] New partitioning for better load balancing

2015-04-07 Gianmarco De Francisci Morales
framework? Guozhang On Sun, Apr 5, 2015 at 12:19 AM, Gianmarco De Francisci Morales g...@apache.org wrote: Hi Jay, Thanks, that sounds like a necessary step. I guess I expected something like that to be already there, at least internally. I created KAFKA-2092 to track the PKG integration

[jira] [Created] (KAFKA-2092) New partitioning for better load balancing

2015-04-05 Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created KAFKA-2092: - Summary: New partitioning for better load balancing Key: KAFKA-2092 URL: https://issues.apache.org/jira/browse/KAFKA-2092 Project

Re: [DISCUSS] New partitioning for better load balancing

2015-04-05 Gianmarco De Francisci Morales
, I am coming from storm community. I think PKG is very interesting and we can provide an implementation of Partitioner for PKG. Can you open a JIRA for this? -- Harsha On April 3, 2015 at 4:49:15 AM, Gianmarco De Francisci Morales (g...@apache.org) wrote

[jira] [Updated] (KAFKA-2092) New partitioning for better load balancing

2015-04-05 Gianmarco De Francisci Morales (JIRA)
[ https://issues.apache.org/jira/browse/KAFKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated KAFKA-2092: -- Description: We have recently studied the problem of load

[DISCUSS] New partitioning for better load balancing

2015-04-03 Gianmarco De Francisci Morales
Hi, We have recently studied the problem of load balancing in distributed stream processing systems such as Samza [1]. In particular, we focused on what happens when the key distribution of the stream is skewed when using key grouping. We developed a new stream partitioning scheme (which we call
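The partitioning scheme introduced above (PKG, Partial Key Grouping, which KAFKA-2092 proposes to ship as a producer partitioner) can be sketched in a few lines, assuming its two core ideas: each key maps to two candidate partitions through two seeded hash functions (key splitting), and the producer sends each message to whichever candidate it has itself sent fewer messages to so far (local load estimation, with no coordination between producers). This is a hypothetical, dependency-free illustration, not the KAFKA-2092 patch or its `PKGPartitioner` class: the class name `PkgSketch`, the FNV-1a hash with a MurmurHash3 finalizer, and the seed values are all arbitrary choices.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the PKG idea; not the KAFKA-2092 patch.
// Class name, hash family and seeds are arbitrary illustrative choices.
public class PkgSketch {

    // Messages this producer has sent to each partition (local view only, no coordination).
    private final long[] localLoad;

    PkgSketch(int numPartitions) {
        this.localLoad = new long[numPartitions];
    }

    // Seeded FNV-1a over the key bytes, then the MurmurHash3 32-bit finalizer to mix all bits.
    static int seededHash(byte[] data, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & 0x7fffffff; // non-negative
    }

    // First and second candidate partitions for a key (they may coincide).
    int candidate1(byte[] key) {
        return seededHash(key, 0x9747b28c) % localLoad.length;
    }

    int candidate2(byte[] key) {
        return seededHash(key, 0x5bd1e995) % localLoad.length;
    }

    // Send to whichever of the key's two candidates is less loaded so far, then count it.
    int partition(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int p1 = candidate1(bytes);
        int p2 = candidate2(bytes);
        int chosen = localLoad[p1] <= localLoad[p2] ? p1 : p2;
        localLoad[chosen]++;
        return chosen;
    }

    long load(int partition) {
        return localLoad[partition];
    }

    public static void main(String[] args) {
        PkgSketch pkg = new PkgSketch(4);
        // A skewed stream: one hot key dominates, plus a few cold keys.
        for (int i = 0; i < 1000; i++) {
            pkg.partition("hot");
        }
        for (int i = 0; i < 20; i++) {
            pkg.partition("cold-" + i);
        }
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + ": " + pkg.load(p));
        }
    }
}
```

The trade-off is that one key's messages can land on two partitions (or on one, when the two candidates coincide), so any per-key state downstream must be merged from both. In exchange, a single hot key no longer pins all of its load on one partition.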