Hi,
I have two questions regarding testing Spark jobs:
1. Is it possible to use Mockito for that purpose? I tried to use it, but
it looks like there are no interactions with mocks. I didn't dive into the
details of how Mockito works, but I guess it might be because of the
serialization and how
I did a quick test as I was curious about it too. I created a file with
numbers from 0 to 999, in order, line by line. Then I did:
scala> val numbers = sc.textFile("./numbers.txt")
scala> val zipped = numbers.zipWithUniqueId
scala> zipped.foreach(i => println(i))
Expected result if the order was
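To make the difference between the two methods concrete, here is a minimal sketch in plain Scala (no Spark needed) that mimics how ids are assigned according to the RDD API docs: zipWithIndex numbers elements by partition order and then by position within each partition, while zipWithUniqueId gives items in the k-th of n partitions the ids k, n+k, 2*n+k, and so on. The names ZipIdSketch, zipWithIndexSim and zipWithUniqueIdSim are made up for illustration, and the nested Vector is only a stand-in for an RDD's partitions, not how Spark stores data.

```scala
object ZipIdSketch {
  // Hypothetical stand-in for an RDD: each inner Vector is one partition,
  // holding its lines in file order.
  type Partitions[T] = Vector[Vector[T]]

  // Mimics zipWithIndex: a global position, counting whole partitions in order.
  def zipWithIndexSim[T](parts: Partitions[T]): Partitions[(T, Long)] = {
    // offsets(k) = number of elements in all partitions before partition k
    val offsets = parts.scanLeft(0L)(_ + _.size)
    parts.zip(offsets).map { case (p, off) =>
      p.zipWithIndex.map { case (x, i) => (x, off + i) }
    }
  }

  // Mimics zipWithUniqueId: the i-th item of partition k (of n) gets id i * n + k.
  def zipWithUniqueIdSim[T](parts: Partitions[T]): Partitions[(T, Long)] = {
    val n = parts.size.toLong
    parts.zipWithIndex.map { case (p, k) =>
      p.zipWithIndex.map { case (x, i) => (x, i * n + k) }
    }
  }

  def main(args: Array[String]): Unit = {
    val parts: Partitions[String] = Vector(Vector("a", "b", "c"), Vector("d", "e"))
    println(zipWithIndexSim(parts).flatten)    // Vector((a,0), (b,1), (c,2), (d,3), (e,4))
    println(zipWithUniqueIdSim(parts).flatten) // Vector((a,0), (b,2), (c,4), (d,1), (e,3))
  }
}
```

With more than one partition the unique ids are interleaved across partitions, so they do not follow line order, whereas the index does. Note also that foreach runs on each partition separately, so the order in which the pairs get printed need not match the order of the ids either.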
-----Original Message-----
*From: *Michal Michalski [michal.michal...@boxever.com]
*Sent: *Friday, April 24, 2015 10:41 AM Eastern Standard Time
*To: *Spico Florin
*Cc: *user
*Subject: *Re: Does HadoopRDD.zipWithIndex method preserve the order of
the input data from Hadoop?
Of course after
stored on HDFS.
-----Original Message-----
*From: *Michal Michalski [michal.michal...@boxever.com]
*Sent: *Friday, April 24, 2015 11:18 AM Eastern Standard Time
*To: *Ganelin, Ilya
*Cc: *Spico Florin; user
*Subject: *Re: Does HadoopRDD.zipWithIndex method
- will be
equivalent?
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 16:04, Michal Michalski michal.michal...@boxever.com
wrote:
The problem I'm facing is that I need to process lines from the input file in
the order they're stored in the file, as they define
Of course after you do it, you probably want to call repartition(someValue)
on your RDD to get your parallelism back.
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 15:28, Michal Michalski michal.michal...@boxever.com
wrote:
I did a quick test as I was curious
Yes.
Kind regards,
Michał Michalski,
michal.michal...@boxever.com
On 24 April 2015 at 17:12, Jeetendra Gangele gangele...@gmail.com wrote:
Did you use zipWithUniqueId?
On 24 April 2015 at 21:28, Michal Michalski michal.michal...@boxever.com
wrote:
I somehow missed zipWithIndex (and Sean's