From the API docs: "Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the *same number of partitions* and the *same number of elements in each partition* (e.g. one was made through a map on the other)."
Basically, one RDD should be a mapped RDD of the other, or both RDDs should be mapped RDDs of the same RDD. Btw, your message says "Dell - Internal Use - Confidential"...

Best,
Xiangrui

On Tue, Apr 1, 2014 at 7:27 PM, <patrick_nico...@dell.com> wrote:
> Dell - Internal Use - Confidential
>
> I got the exception "Can't zip RDDs with unequal numbers of partitions" when
> I apply any action (reduce, collect) to a dataset created by zipping two
> datasets of 10 million entries each. The problem occurs regardless of the
> number of partitions, even when I let Spark create those partitions.
>
> Interestingly enough, I have no problem zipping datasets of 1 and 2.5
> million entries.
>
> A similar problem was reported on this board with 0.8, but I don't remember
> whether it was fixed.
>
> Any ideas? Any workaround?
>
> I'd appreciate it.
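To make the precondition concrete, here is a minimal sketch in plain Python (not Spark itself; `zip_rdds` is a hypothetical model, with partitions represented as lists of lists) of the invariant `RDD.zip` relies on: same number of partitions, and same number of elements in each corresponding partition. Deriving one dataset from the other with a map satisfies this by construction.

```python
# Toy model of RDD.zip's alignment rule -- NOT Spark code.
# An "RDD" here is just a list of partitions, each partition a list of elements.

def zip_rdds(a, b):
    if len(a) != len(b):
        # Mirrors the Spark error seen in the original report.
        raise ValueError("Can't zip RDDs with unequal numbers of partitions")
    out = []
    for pa, pb in zip(a, b):
        if len(pa) != len(pb):
            raise ValueError("partitions have unequal numbers of elements")
        out.append(list(zip(pa, pb)))  # pair up elements position-by-position
    return out

# Safe pattern: 'mapped' is produced by mapping over 'base', so the
# partitioning is identical by construction and zip succeeds.
base = [[1, 2], [3, 4, 5]]                    # 2 partitions
mapped = [[x * x for x in p] for p in base]   # same shape, squared values
print(zip_rdds(base, mapped))
# -> [[(1, 1), (2, 4)], [(3, 9), (4, 16), (5, 25)]]
```

Two independently created datasets of the same total size can still fail this check, because Spark may split them into partitions of different sizes; that is consistent with large datasets failing while smaller ones happen to line up.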