Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread varun sharma
> …your number of partitions. We're doing this to scale up from 36 partitions / topic to 140 partitions (20 cores * 7 nodes) and it works great. > -adrian > From: varun sharma > Date: Thursday, October 29, 2015 at 8:27 AM > To: …
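Adrian's sizing rule above (one Kafka partition per executor core across the cluster) can be sketched as a small helper. This is a toy illustration of the arithmetic in the thread, not code from it; the function name `target_partitions` is invented for the example:

```python
def target_partitions(cores_per_node: int, nodes: int) -> int:
    """Suggest a Kafka partition count equal to total executor cores,
    so each core gets one partition under the direct stream's
    1:1 Kafka-partition-to-Spark-partition mapping."""
    return cores_per_node * nodes

# The thread's example: 20 cores per node across 7 nodes
print(target_partitions(20, 7))  # 140
```

Matching the partition count to the core count keeps every core busy without introducing a shuffle, which is why the thread recommends repartitioning the topic itself rather than repartitioning in Spark when possible.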

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Cody Koeninger
> …works great. > -adrian > From: varun sharma > Date: Thursday, October 29, 2015 at 8:27 AM > To: user > Subject: Need more tasks in KafkaDirectStream > Right now, there is one to one correspondence between kafka partitions and spark partitions. > I don't have a…

Re: Need more tasks in KafkaDirectStream

2015-10-29 Thread Adrian Tanase
…to 140 partitions (20 cores * 7 nodes) and it works great. -adrian From: varun sharma Date: Thursday, October 29, 2015 at 8:27 AM To: user Subject: Need more tasks in KafkaDirectStream Right now, there is one to one correspondence between kafka partitions and spark partitions. I don't have a requirement of one to one semantics. I need…

Re: Need more tasks in KafkaDirectStream

2015-10-28 Thread Dibyendu Bhattacharya
If you do not need one-to-one semantics and do not want a strict ordering guarantee, you can very well use the Receiver-based approach, and this consumer from Spark Packages (https://github.com/dibbhatt/kafka-spark-consumer) can be a much better alternative in terms of performance and reliability…
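For context on why the receiver-based approach decouples task count from Kafka partitions: with Spark's stock receiver, incoming data is cut into blocks every `spark.streaming.blockInterval`, and each block becomes one partition of the batch RDD, so partitions per batch is roughly the batch interval divided by the block interval, times the number of receivers. A quick arithmetic sketch (the function name is illustrative, not from the thread):

```python
def receiver_partitions(batch_interval_ms: int, block_interval_ms: int,
                        receivers: int = 1) -> int:
    """Approximate partitions per batch in receiver-based streaming:
    one block is written per block interval per receiver, and each
    block backs one partition of the batch RDD."""
    return (batch_interval_ms // block_interval_ms) * receivers

# e.g. a 2 s batch with the 200 ms default block interval and 3 receivers
print(receiver_partitions(2000, 200, 3))  # 30
```

Lowering the block interval (or adding receivers) raises parallelism independently of the topic's partition count, which is the tunability the original question misses in the direct stream.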

Need more tasks in KafkaDirectStream

2015-10-28 Thread varun sharma
Right now, there is a one-to-one correspondence between Kafka partitions and Spark partitions. I don't have a requirement of one-to-one semantics. I need more tasks to be generated in the job so that it can be parallelised and the batch can be completed fast. In the previous Receiver-based approach the number…
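With the direct stream, each Kafka partition maps to exactly one Spark partition, so the only ways to get more tasks are to add Kafka partitions (Adrian's suggestion) or to shuffle, e.g. by calling `repartition(n)` on the DStream. The following is a Spark-free toy model of what that shuffle does (the helper name `round_robin_repartition` is invented for illustration): it fans records from 36 source partitions out to 140 target partitions:

```python
from collections import defaultdict

def round_robin_repartition(partitions, n_target):
    """Toy model of DStream.repartition(n): spread records from the
    source partitions across n_target partitions in round-robin order."""
    target = defaultdict(list)
    i = 0
    for part in partitions:
        for record in part:
            target[i % n_target].append(record)
            i += 1
    return [target[k] for k in range(n_target)]

# 36 source partitions with 100 records each, fanned out to 140 tasks
source = [[(p, r) for r in range(100)] for p in range(36)]
out = round_robin_repartition(source, 140)
print(len(out))                   # 140
print(sum(len(p) for p in out))   # 3600, no records lost
```

The trade-off the thread circles around: repartitioning in Spark buys parallelism at the cost of a network shuffle per batch, whereas adding Kafka partitions keeps the shuffle-free 1:1 mapping.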