I think I understand where the bug is now. I created a JIRA (https://issues.apache.org/jira/browse/SPARK-4433) and will make a PR soon. -Xiangrui
On Sat, Nov 15, 2014 at 7:39 PM, Xiangrui Meng <men...@gmail.com> wrote:
> This is a bug. Could you make a JIRA? -Xiangrui
>
> On Sat, Nov 15, 2014 at 3:27 AM, lev <kat...@gmail.com> wrote:
>> Hi,
>>
>> I'm having trouble using both zipWithIndex and repartition. When I use them
>> both, the following action gets stuck and won't return.
>> I'm using Spark 1.1.0.
>>
>> These two lines work as expected:
>>
>> scala> sc.parallelize(1 to 10).repartition(10).count()
>> res0: Long = 10
>>
>> scala> sc.parallelize(1 to 10).zipWithIndex.count()
>> res1: Long = 10
>>
>> But this statement gets stuck and doesn't return:
>>
>> scala> sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
>> 14/11/15 03:18:55 INFO spark.SparkContext: Starting job: apply at
>> Option.scala:120
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Got job 3 (apply at
>> Option.scala:120) with 3 output partitions (allowLocal=false)
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Final stage: Stage 4(apply at
>> Option.scala:120)
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Parents of final stage:
>> List()
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Missing parents: List()
>> 14/11/15 03:18:55 INFO scheduler.DAGScheduler: Submitting Stage 4
>> (ParallelCollectionRDD[7] at parallelize at <console>:13), which has no
>> missing parents
>> 14/11/15 03:18:55 INFO storage.MemoryStore: ensureFreeSpace(1096) called
>> with curMem=7616, maxMem=138938941
>> 14/11/15 03:18:55 INFO storage.MemoryStore: Block broadcast_4 stored as
>> values in memory (estimated size 1096.0 B, free 132.5 MB)
>>
>> Am I doing something wrong here, or is it a bug?
>> Is there a workaround?
>>
>> Thanks,
>> Lev.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/repartition-combined-with-zipWithIndex-get-stuck-tp18999.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
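A possible workaround, not proposed in the thread above and untested against Spark 1.1.0, is to avoid the zipWithIndex-then-repartition ordering that triggers the hang: either run repartition first, or use zipWithUniqueId, which assigns unique (though non-consecutive) ids without launching the extra counting job that zipWithIndex needs to compute per-partition index offsets. A sketch in spark-shell:

```scala
// Untested sketch; assumes that indices assigned after repartitioning,
// or non-consecutive unique ids, are acceptable for your use case.

// Option 1: repartition before zipping, so the index job runs on a
// plain repartitioned RDD.
scala> sc.parallelize(1 to 10).repartition(10).zipWithIndex.count()

// Option 2: zipWithUniqueId assigns ids from partition number alone,
// with no preliminary job, so it can safely precede repartition.
scala> sc.parallelize(1 to 10).zipWithUniqueId.repartition(10).count()
```

If they complete, both counts are 10, since neither path exercises the reported zipWithIndex-followed-by-repartition combination. Note that option 1 yields different element-to-index assignments than indexing before the shuffle, and option 2's ids are unique but not consecutive.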