Best practices on testing Spark jobs

2015-04-28 Thread Michal Michalski
Hi, I have two questions about testing Spark jobs:
1. Is it possible to use Mockito for that purpose? I tried, but it looks like no interactions with the mocks are recorded. I didn't dive into the details of how Mockito works, but I guess it might be because of serialization and how
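The serialization guess can be illustrated without Spark. The sketch below assumes that an object captured by a task closure is serialized on the driver and deserialized on the executor, which, to our understanding, Spark does even in local mode. A JDK `AtomicInteger` stands in for a mock, and `ClosureCopyDemo` and `roundTrip` are hypothetical names. Calls made on the executor side land on the deserialized copy, so the driver's original records nothing, which would match the "no interactions" symptom.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.atomic.AtomicInteger

object ClosureCopyDemo {
  // Hypothetical helper: write an object to bytes and read it back,
  // roughly what happens to state captured by a closure shipped to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    // Stands in for a mock held by the test on the driver side.
    val onDriver = new AtomicInteger(0)
    // The "executor" works on a deserialized copy, not the driver's instance.
    val onExecutor = roundTrip(onDriver)
    onExecutor.incrementAndGet()
    // prints "driver sees 0, executor copy sees 1"
    println(s"driver sees ${onDriver.get}, executor copy sees ${onExecutor.get}")
  }
}
```

One common workaround is to assert on results collected back to the driver (for example via `collect()`) rather than on interactions with objects captured inside closures.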

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
I did a quick test, as I was curious about it too. I created a file with the numbers 0 to 999, in order, one per line. Then I did:
scala> val numbers = sc.textFile("./numbers.txt")
scala> val zipped = numbers.zipWithUniqueId
scala> zipped.foreach(i => println(i))
Expected result if the order was
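For reference, Spark's documentation for `RDD.zipWithUniqueId` says items in the kth partition get ids k, n+k, 2*n+k, ..., where n is the number of partitions, so the ids are unique but are not consecutive line numbers. A plain-Scala model of that rule follows; representing the partitions as a `Seq[Seq[A]]` and the name `UniqueIdModel` are assumptions for illustration, not Spark code.

```scala
object UniqueIdModel {
  // Model of RDD.zipWithUniqueId: the i-th item (0-based) of partition k
  // gets id i * n + k, where n is the number of partitions.
  def zipWithUniqueId[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    val n = partitions.size.toLong
    partitions.zipWithIndex.map { case (part, k) =>
      part.zipWithIndex.map { case (item, i) => (item, i * n + k) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions of a small file read in order.
    val parts = Seq(Seq("0", "1", "2"), Seq("3", "4"))
    zipWithUniqueId(parts).flatten.foreach(println)
  }
}
```

Note also that `foreach(println)` runs inside tasks, so with several partitions the printed lines can interleave regardless of the ids; collecting or sorting by id first avoids that.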

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
-----Original Message-----
From: Michal Michalski [michal.michal...@boxever.com]
Sent: Friday, April 24, 2015 10:41 AM Eastern Standard Time
To: Spico Florin
Cc: user
Subject: Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?
Of course after

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
stored on HDFS.
-----Original Message-----
From: Michal Michalski [michal.michal...@boxever.com]
Sent: Friday, April 24, 2015 11:18 AM Eastern Standard Time
To: Ganelin, Ilya
Cc: Spico Florin; user
Subject: Re: Does HadoopRDD.zipWithIndex method

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
- will be equivalent?
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 16:04, Michal Michalski michal.michal...@boxever.com wrote:
The problem I'm facing is that I need to process lines from the input file in the order they're stored in the file, as they define

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
Of course, after you do it, you probably want to call repartition(somevalue) on your RDD to get your parallelism back.
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 15:28, Michal Michalski michal.michal...@boxever.com wrote:
I did a quick test as I was curious

Re: Does HadoopRDD.zipWithIndex method preserve the order of the input data from Hadoop?

2015-04-24 Thread Michal Michalski
Yes.
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 17:12, Jeetendra Gangele gangele...@gmail.com wrote:
You used zipWithUniqueId?
On 24 April 2015 at 21:28, Michal Michalski michal.michal...@boxever.com wrote:
I somehow missed zipWithIndex (and Sean's
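By contrast, Spark's documentation for `RDD.zipWithIndex` states that ordering is based first on the partition index and then on the order of items within each partition, and that the method triggers a Spark job when the RDD has more than one partition (to count each partition's items). Assuming `textFile` produces partitions that are contiguous, in-order splits of the file, this yields consecutive indices matching line order. A plain-Scala model of the rule, again with a hypothetical `Seq[Seq[A]]` standing in for the partitions and `IndexModel` as an illustrative name:

```scala
object IndexModel {
  // Model of RDD.zipWithIndex: each partition's first index is the total
  // size of all earlier partitions; items then count up within a partition.
  def zipWithIndex[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    // Start offsets: 0, |p0|, |p0| + |p1|, ... (the extra final total is dropped by zip).
    val starts = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zip(starts).map { case (part, start) =>
      part.zipWithIndex.map { case (item, i) => (item, start + i) }
    }
  }

  def main(args: Array[String]): Unit = {
    // Two hypothetical partitions of a small file read in order.
    val parts = Seq(Seq("0", "1", "2"), Seq("3", "4"))
    zipWithIndex(parts).flatten.foreach(println)
  }
}
```

If the indexed RDD is later repartitioned to restore parallelism, each (line, index) pair keeps its index, so sorting by it (for example with `sortBy(_._2)`) can recover the file order wherever it matters.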