Re: Do I need to do .collect inside forEachRDD

2017-12-07 Thread Qiao, Richard
: "Qiao, Richard" Cc: Gerard Maas, "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD Hi Richard, I have tried your sample code now and several times in the past as well. The problem seems to be that kafkaProducer is not serializable, so I get "Task not seria
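The serialization problem described in this message can be reproduced without Spark. The sketch below is plain Java, not Spark or Kafka code: FakeProducer, SerializableConsumer and canSerialize are hypothetical names made up for illustration. It assumes only that the function handed to the cluster has to pass through Java serialization, and it shows that a closure capturing a non-serializable object is rejected, while a closure that creates the object inside its own body is accepted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Consumer;

class ClosureSerializationSketch {

    /** A function type that must be serializable, like a closure shipped to executors. */
    interface SerializableConsumer<T> extends Consumer<T>, Serializable {}

    /** Hypothetical stand-in for a client such as a Kafka producer; it is NOT Serializable. */
    static class FakeProducer {
        void send(String record) { /* a real client would write to the network here */ }
    }

    /** Returns true if Java serialization accepts the object, false if it rejects it. */
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The closure captures a producer created outside it, so the producer must be serialized too. */
    static SerializableConsumer<String> capturingProducer() {
        FakeProducer producer = new FakeProducer();
        return record -> producer.send(record);
    }

    /** The closure captures nothing; the producer is created where the closure runs. */
    static SerializableConsumer<String> creatingProducerInside() {
        return record -> new FakeProducer().send(record);
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(capturingProducer()));      // prints false
        System.out.println(canSerialize(creatingProducerInside())); // prints true
    }
}
```

Creating one client per record, as the second closure does, is only for brevity; in practice the client would usually be created once per partition or reused per executor.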

Re: Do I need to do .collect inside forEachRDD

2017-12-07 Thread kant kodali
r. > Best Regards > Richard > > From: kant kodali > Date: Thursday, December 7, 2017 at 2:30 AM > To: Gerard Maas > Cc: "Qiao, Richard", "user @spark" <user@spark.apache.org> > Subject: Re: Do I need

Re: Do I need to do .collect inside forEachRDD

2017-12-07 Thread Qiao, Richard
ali Date: Thursday, December 7, 2017 at 2:30 AM To: Gerard Maas Cc: "Qiao, Richard" , "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD @Richard I had pasted the two versions of the code below and I still couldn't figure out why it wouldn'

Re: Do I need to do .collect inside forEachRDD

2017-12-06 Thread kant kodali
@Richard I pasted the two versions of the code below, and I still can't figure out why it wouldn't work without .collect. Any help would be great. *The code below doesn't work, and sometimes I also run into an OutOfMemory error.* jsonMessagesDStream .window(new Duration(6), new Duratio

Re: Do I need to do .collect inside forEachRDD

2017-12-06 Thread Gerard Maas
Hi Kant, > but would your answer on .collect() change depending on running the spark app in client vs cluster mode? No, it should make no difference. -kr, Gerard. On Tue, Dec 5, 2017 at 11:34 PM, kant kodali wrote: > @Richard I don't see any error in the executor log but let me run again to

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread kant kodali
@Richard I don't see any error in the executor log, but let me run it again to make sure. @Gerard Thanks much! But would your answer on .collect() change depending on running the Spark app in client vs cluster mode? Thanks! On Tue, Dec 5, 2017 at 1:54 PM, Gerard Maas wrote: > The general answer t

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread Gerard Maas
The general answer to your initial question is that "it depends". If the operation in the rdd.foreach() closure can be parallelized, then you don't need to collect first. If it needs some local context (e.g. a socket connection), then you need to do rdd.collect first to bring the data locally, whic
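The distinction drawn in this message can be modelled in plain Java. This is a sketch under stated assumptions, not Spark code: partitions are represented as nested lists, and FakeConnection, sendPerPartition and collectThenSend are hypothetical names. The first method handles each partition independently with its own connection, which is the parallelizable case. The second gathers every record into one list before using a single local connection, which is what collecting first amounts to, and it puts all the data in one process's memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollectVersusPerPartitionSketch {

    /** Hypothetical stand-in for a local resource such as a socket connection. */
    static class FakeConnection {
        private int sent;
        void send(String record) { sent++; }
        int sentCount() { return sent; }
    }

    /** Parallelizable case: every partition opens its own connection and sends its own records. */
    static int sendPerPartition(List<List<String>> partitions) {
        return partitions.parallelStream().mapToInt(partition -> {
            FakeConnection connection = new FakeConnection();
            partition.forEach(connection::send);
            return connection.sentCount();
        }).sum();
    }

    /** Local-context case: bring all records into one list, then use the single local connection. */
    static int collectThenSend(List<List<String>> partitions, FakeConnection local) {
        List<String> collected = new ArrayList<>();
        partitions.forEach(collected::addAll);
        collected.forEach(local::send);
        return local.sentCount();
    }

    public static void main(String[] args) {
        List<List<String>> partitions =
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(sendPerPartition(partitions));                      // prints 3
        System.out.println(collectThenSend(partitions, new FakeConnection())); // prints 3
    }
}
```

Both paths deliver the same records; they differ in where the work happens and how much data has to fit in one place at once.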

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread Qiao, Richard
In the second case, is any producer error thrown in the executor's log? Best Regards Richard From: kant kodali Date: Tuesday, December 5, 2017 at 4:38 PM To: "Qiao, Richard" Cc: "user @spark" Subject: Re: Do I need to do .collect inside forEachRDD Reads from Kafka and

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread kant kodali
It reads from Kafka and outputs to Kafka, so I check the output from Kafka. On Tue, Dec 5, 2017 at 1:26 PM, Qiao, Richard wrote: > Where do you check the output for both cases? > > > On Dec 5, 2017, at 15:36, kant kodali wrote: > > > > Hi All, > > > > I have a simple

Re: Do I need to do .collect inside forEachRDD

2017-12-05 Thread Qiao, Richard
Where do you check the output for both cases? > On Dec 5, 2017, at 15:36, kant kodali wrote: > > Hi All, > > I have a simple stateless transformation using DStreams (stuck with the old > API for one of the applications). The pseudocode is roughly like this: > > dstream