Please share the rest of your topology code (without any UDFs / business logic). Otherwise, I cannot give further advice.
-Matthias On 3/25/17 6:08 PM, Ara Ebrahimi wrote: > Via: > > builder.stream("topic1"); > builder.stream("topic2"); > builder.stream("topic3”); > > These are different kinds of topics consuming different avro objects. > > Ara. > > On Mar 25, 2017, at 6:04 PM, Matthias J. Sax > <matth...@confluent.io<mailto:matth...@confluent.io>> wrote: > > > > > ________________________________ > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise confidential information. If you have received it > in error, please notify the sender immediately and delete the original. Any > other use of the e-mail by you is prohibited. Thank you in advance for your > cooperation. > > ________________________________ > > From: "Matthias J. Sax" <matth...@confluent.io<mailto:matth...@confluent.io>> > Subject: Re: more uniform task assignment across kafka stream nodes > Date: March 25, 2017 at 6:04:30 PM PDT > To: users@kafka.apache.org<mailto:users@kafka.apache.org> > Reply-To: <users@kafka.apache.org<mailto:users@kafka.apache.org>> > > > Ara, > > How do you consume your topics? Via > > builder.stream("topic1", "topic2", "topic3); > > or via > > builder.stream("topic1"); > builder.stream("topic2"); > builder.stream("topic3"); > > Both and handled differently with regard to creating tasks (partition to > task assignment also depends on you downstream code though). > > If this does not help, can you maybe share the structure of processing? > To dig deeper, we would need to know the topology DAG. > > > -Matthias > > > On 3/25/17 5:56 PM, Ara Ebrahimi wrote: > Mathias, > > This apparently happens because we have more than 1 source topic. We have 3 > source topics in the same application. So it seems like the task assignment > algorithm creates topologies not for one specific topic at a time but the > total partitions across all source topics consumed in an application > instance. Because we have some code dependencies between these 3 source > topics we can’t separate them into 3 applications at this time. Hence the > reason I want to get the task assignment algorithm basically do a uniform and > simple task assignment PER source topic. > > Ara. > > On Mar 25, 2017, at 5:21 PM, Matthias J. Sax > <matth...@confluent.io<mailto:matth...@confluent.io><mailto:matth...@confluent.io>> > wrote: > > > > > ________________________________ > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise confidential information. If you have received it > in error, please notify the sender immediately and delete the original. Any > other use of the e-mail by you is prohibited. Thank you in advance for your > cooperation. > > ________________________________ > > From: "Matthias J. Sax" > <matth...@confluent.io<mailto:matth...@confluent.io><mailto:matth...@confluent.io>> > Subject: Re: more uniform task assignment across kafka stream nodes > Date: March 25, 2017 at 5:21:47 PM PDT > To: > users@kafka.apache.org<mailto:users@kafka.apache.org><mailto:users@kafka.apache.org> > Reply-To: > <users@kafka.apache.org<mailto:users@kafka.apache.org><mailto:users@kafka.apache.org>> > > > Hi, > > I am wondering why this happens in the first place. Streams, > load-balanced over all running instances, and each instance should be > the same number of tasks (and thus partitions) assigned. > > What is the overall assignment? Do you have StandyBy tasks configured? > What version do you use? > > > -Matthias > > > On 3/24/17 8:09 PM, Ara Ebrahimi wrote: > Hi, > > Is there a way to tell kafka streams to uniformly assign partitions across > instances? If I have n kafka streams instances running, I want each to handle > EXACTLY 1/nth number of partitions. No dynamic task assignment logic. Just > dumb 1/n assignment. > > Here’s our scenario. Lets say we have an “source" topic with 8 partitions. We > also have 2 kafka streams instances. Each instances get assigned to handle 4 > “source" topic partitions. BUT then we do a few maps and an aggregate. So > data gets shuffled around. The map function uniformly distributes these > across all partitions (I can verify that by looking at the partition > offsets). After the map what I notice by looking at the topology is that one > kafka streams instance get assigned to handle say 2 aggregate repartition > topics and the other one gets assigned 6. Even worse, on bigger clusters (say > 4 instances) we see say 2 nodes gets assigned downstream aggregate > repartition topics and 2 other nodes assigned NOTHING to handle. > > Ara. > > > > ________________________________ > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise confidential information. If you have received it > in error, please notify the sender immediately and delete the original. Any > other use of the e-mail by you is prohibited. Thank you in advance for your > cooperation. > > ________________________________ > > > > > > > > > ________________________________ > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise confidential information. If you have received it > in error, please notify the sender immediately and delete the original. Any > other use of the e-mail by you is prohibited. Thank you in advance for your > cooperation. > > ________________________________ > > > > > ________________________________ > > This message is for the designated recipient only and may contain privileged, > proprietary, or otherwise confidential information. If you have received it > in error, please notify the sender immediately and delete the original. Any > other use of the e-mail by you is prohibited. Thank you in advance for your > cooperation. > > ________________________________ >
signature.asc
Description: OpenPGP digital signature