Re: spark streaming 1.3 issues

2015-07-22 Thread Shushant Arora
In Spark Streaming 1.3: say I have 10 executors, each with 4 cores, so 40 tasks in parallel at once in total. If I repartition the Kafka direct stream to 40 partitions, versus keeping the 300 partitions I have in the Kafka topic, which one will be more efficient? Should I repartition the kafka stream equal to num of co
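To make the arithmetic in the question concrete, here is a rough sketch. It assumes the direct stream yields one Spark partition (and so one task) per Kafka partition, and that each core runs one task at a time. The class and method names are hypothetical and for illustration only; this is not part of any Spark API.

```java
class TaskWaves {
    // Tasks that can run at once: executors * cores per executor
    // (assumes one task per core).
    public static int parallelSlots(int executors, int coresPerExecutor) {
        return executors * coresPerExecutor;
    }

    // Sequential "waves" needed to run all tasks of a batch, rounded up.
    public static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int slots = parallelSlots(10, 4);                 // 10 executors x 4 cores
        System.out.println("slots = " + slots);
        System.out.println("waves for 300 partitions = " + waves(300, slots));
        System.out.println("waves for 40 partitions = " + waves(40, slots));
    }
}
```

This only counts scheduling waves per batch. It does not say which option is faster, since repartitioning to 40 partitions adds a shuffle of the data that reading the 300 partitions directly avoids.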

Re: spark streaming 1.3 issues

2015-07-21 Thread Tathagata Das
For Java, do OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges(); If you fix that error, you should be seeing data. You can call arbitrary RDD operations on a DStream, using DStream.transform. Take a look at the docs. For the direct kafka approach you are doing, - tasks d

Re: spark streaming 1.3 issues

2015-07-21 Thread Akhil Das
I'd suggest upgrading to 1.4, as it has better metrics and UI. Thanks Best Regards On Mon, Jul 20, 2015 at 7:01 PM, Shushant Arora wrote: > Is coalesce not applicable to kafkaStream ? How to do coalesce on > kafkadirectstream its not there in api ? > Shall calling repartition on directstrea

Re: spark streaming 1.3 issues

2015-07-20 Thread Shushant Arora
Is coalesce not applicable to the Kafka stream? How do I coalesce a Kafka direct stream? It's not there in the API. Will calling repartition on the direct stream, with the number of executors as numPartitions, improve performance? In 1.3, do tasks get launched for partitions which are empty? Does driver makes