Issue with zip and partitions

2014-04-01 Thread Patrick_Nicolas
I got an exception "Can't zip RDDs with unequal numbers of partitions" when I apply any action (reduce, collect) to a dataset created by zipping two datasets of 10 million entries each. The problem occurs independently of the number of partitions or when I let Sp

RE: Issue with zip and partitions

2014-04-03 Thread Patrick_Nicolas
Hi Xiangrui,

Thanks for your reply. This makes sense, and I should have looked at the doc. Indeed, zipping before saveAsFile did the trick.

-Original Message-
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Tuesday, April 01, 2014 11:43 PM
To: user@spark.apache.org
Cc: u...@spark.i
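
For readers landing on this thread: the zip method on Spark RDDs is documented as assuming that both RDDs have the same number of partitions and the same number of elements in each partition. The sketch below is a plain-Python toy model of that contract, not Spark code and not how Spark implements it. Each dataset is modelled as a list of partitions, and the names zip_partitioned and split are hypothetical helpers introduced only for illustration.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def split(data: Sequence[A], num_partitions: int) -> List[List[A]]:
    """Hypothetical helper: cut data into contiguous, near-equal partitions."""
    size, extra = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions take one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(list(data[start:end]))
        start = end
    return parts


def zip_partitioned(
    left: List[List[A]], right: List[List[B]]
) -> List[List[Tuple[A, B]]]:
    """Toy model of a partition-wise zip (an assumption, not Spark's code).

    Pairs elements partition by partition and fails if the two sides
    differ in partition count or in the size of any partition.
    """
    if len(left) != len(right):
        raise ValueError("can't zip datasets with unequal numbers of partitions")
    zipped = []
    for index, (lpart, rpart) in enumerate(zip(left, right)):
        if len(lpart) != len(rpart):
            raise ValueError(
                f"partition {index}: {len(lpart)} vs {len(rpart)} elements"
            )
        zipped.append(list(zip(lpart, rpart)))
    return zipped
```

In this model, a dataset derived from another by an element-wise map keeps the same partition layout, so the two can be zipped safely. Two datasets that were partitioned independently may not line up even when they hold the same total number of entries.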