From the API docs: "Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the *same number of partitions* and the *same number of elements in each partition* (e.g. one was made through a map on the other)."
Basically, one RDD should be a mapped RDD of the other, or both RDDs should be mapped RDDs of the same RDD. Btw, your message says "Dell - Internal Use - Confidential"...

Best,
Xiangrui

On Tue, Apr 1, 2014 at 7:27 PM, <patrick_nico...@dell.com> wrote:
> Dell - Internal Use - Confidential
>
> I got the exception "Can't zip RDDs with unequal numbers of partitions" when
> I apply any action (reduce, collect) to a dataset created by zipping two
> datasets of 10 million entries each. The problem occurs regardless of the
> number of partitions, even when I let Spark create those partitions.
>
> Interestingly enough, I have no problem zipping datasets of 1 and 2.5
> million entries.
>
> A similar problem was reported on this board with 0.8, but I don't remember
> whether it was fixed.
>
> Any ideas? Any workaround?
>
> I'd appreciate it.
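To make the precondition concrete, here is a minimal sketch in plain Python (not Spark itself; `zip_rdds` is a hypothetical model, with partitions represented as lists of lists) of the invariant `RDD.zip` relies on: same number of partitions, and same number of elements in each corresponding partition. Deriving one dataset from the other with a map satisfies this by construction.

```python
# Toy model of RDD.zip's alignment rule -- NOT Spark code.
# An "RDD" here is just a list of partitions, each partition a list of elements.

def zip_rdds(a, b):
    if len(a) != len(b):
        # Mirrors the Spark error seen in the original report.
        raise ValueError("Can't zip RDDs with unequal numbers of partitions")
    out = []
    for pa, pb in zip(a, b):
        if len(pa) != len(pb):
            raise ValueError("partitions have unequal numbers of elements")
        out.append(list(zip(pa, pb)))  # pair up elements position-by-position
    return out

# Safe pattern: 'mapped' is produced by mapping over 'base', so the
# partitioning is identical by construction and zip succeeds.
base = [[1, 2], [3, 4, 5]]                    # 2 partitions
mapped = [[x * x for x in p] for p in base]   # same shape, squared values
print(zip_rdds(base, mapped))
# -> [[(1, 1), (2, 4)], [(3, 9), (4, 16), (5, 25)]]
```

Two independently created datasets of the same total size can still fail this check, because Spark may split them into partitions of different sizes; that is consistent with large datasets failing while smaller ones happen to line up.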