In several situations I would like to zip RDDs, knowing that their order 
matches. In particular, I’m using an MLlib KMeansModel on an RDD of Vectors, so I 
would like to do:

Also, the first column in my RDD is a timestamp, which I don’t want to be part 
of the model, so in fact I would like to split the first column out of my RDD, 
then do:

Moreover I’d like my data to be scaled and go through a principal component 
analysis first, so the main steps would be like:

val noTs =
val scaled = scaler.transform(noTs)
val projected = (new RowMatrix(scaled)).multiply(principalComponents).rows
val clusters = myModel.predict(projected)
val result =

Do you think there’s a chance that the 4 transformations above would preserve 
order so the zip at the end would be correct?
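
One way to sidestep the ordering question is to never separate the timestamp from its row. The following is only a sketch under assumptions: the helper name clusterWithTimestamps and the (Long, Vector) row layout are hypothetical, the scaler is taken to be a StandardScalerModel, and the projection multiplies each vector by the transposed principal components instead of going through RowMatrix.

```scala
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.feature.StandardScalerModel
import org.apache.spark.mllib.linalg.{Matrix, Vector}
import org.apache.spark.rdd.RDD

// Hypothetical helper: keeps each timestamp paired with its own row,
// so no zip (and no assumption about RDD ordering) is needed.
def clusterWithTimestamps(
    rows: RDD[(Long, Vector)],      // assumed layout: (timestamp, features)
    scaler: StandardScalerModel,
    principalComponents: Matrix,    // n x k, as returned by computePrincipalComponents
    model: KMeansModel): RDD[(Long, Int)] = {
  // A row vector times an n x k matrix equals the transposed (k x n) matrix times the vector.
  val pcT = principalComponents.transpose
  rows.mapValues { features =>
    val scaled = scaler.transform(features)   // per-vector scaling
    val projected = pcT.multiply(scaled)      // per-vector PCA projection
    model.predict(projected)                  // cluster index for this row
  }
}
```

Because everything happens inside a single mapValues, each cluster id comes from the same record as its timestamp, whatever the partitioning is.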

On 2017-09-13 19:51 CEST, wrote :

I'm wondering why you need the order preserved. We've had situations where 
keeping the source as an artificial field in the dataset was important, and I 
had to go through contortions to inject it (in this case the data source had no 
unique key).

Is this similar?

On 13 September 2017 at 10:46, Suzen, Mehmet 

But what happens if one of the partitions fails? How does fault tolerance 
recover the elements in the other partitions?

On 13 Sep 2017 18:39, "Ankit Maloo" 

AFAIK, the order of an RDD is maintained within a partition for map operations. 
A map operation cannot change the sequence within a partition, as the 
partition is local and computation happens one record at a time.
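
To illustrate the point about map, here is a minimal sketch (the helper name zipWithMapped is hypothetical). RDD.zip is documented to assume that both RDDs have the same number of partitions and the same number of elements in each partition, for example when one was made through a map on the other, which is the case below.

```scala
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Hypothetical helper: pairs every element with the result of applying f to it.
// map is a narrow, record-at-a-time transformation, so `mapped` has the same
// partitions as `source`, with the same number of elements in each.
def zipWithMapped[A, B: ClassTag](source: RDD[A])(f: A => B): RDD[(A, B)] = {
  val mapped = source.map(f)
  // zip fails at runtime if partition counts or per-partition sizes differ.
  source.zip(mapped)
}
```

This only covers map-like steps. If the source RDD is not deterministic when recomputed (an assumption worth checking for your input), persisting it before deriving the second RDD may be the safer choice.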

On 13-Sep-2017 9:54 PM, "Suzen, Mehmet" 
I think the order has no meaning in RDDs; see this post, especially the zip methods:

