bq. val dist = sc.parallelize(l)

Following the above, can you call, e.g., count() on dist before saving?
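Something like the following is what I have in mind; it is only a sketch against the same spark-shell session from your example (so sc is already defined), reusing your HDFS path purely for illustration:

scala> import scala.util.Random.nextInt
scala> val l = Seq.fill(10000)(nextInt)
scala> val dist = sc.parallelize(l)
scala> dist.count()   // should return 10000, confirming the RDD has data and that actions run on the cluster
scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/")

If count() comes back as 10000 but the save still leaves only an empty folder, the RDD side looks fine and I would look at the HDFS write side (permissions, committer output) rather than at the job itself.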
Cheers

On Fri, Oct 2, 2015 at 1:21 AM, jarias <ja...@elrocin.es> wrote:

> Dear list,
>
> I'm experiencing a problem when trying to write any RDD to HDFS. I've tried
> with minimal examples, Scala programs and PySpark programs, both in local and
> cluster modes and as standalone applications or shells.
>
> My problem is that when invoking the write command, a task is executed but
> it just creates an empty folder in the given HDFS path. I'm lost at this
> point because there is no sign of error or warning in the Spark logs.
>
> I'm running a seven-node cluster managed by cdh5.7, Spark 1.3. HDFS is
> working properly when using the command-line tools or running MapReduce jobs.
>
> Thank you for your time; I'm not sure if this is just a rookie mistake or an
> overall config problem.
>
> Just a working example: this sequence produces the following log and creates
> the empty folder "test":
>
> scala> val l = Seq.fill(10000)(nextInt)
> scala> val dist = sc.parallelize(l)
> scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/")
>
> 15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
> 15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at <console>:27
> 15/10/02 10:19:22 INFO DAGScheduler: Got job 3 (saveAsTextFile at <console>:27) with 2 output partitions (allowLocal=false)
> 15/10/02 10:19:22 INFO DAGScheduler: Final stage: Stage 3 (saveAsTextFile at <console>:27)
> 15/10/02 10:19:22 INFO DAGScheduler: Parents of final stage: List()
> 15/10/02 10:19:22 INFO DAGScheduler: Missing parents: List()
> 15/10/02 10:19:22 INFO DAGScheduler: Submitting Stage 3 (MapPartitionsRDD[7] at saveAsTextFile at <console>:27), which has no missing parents
> 15/10/02 10:19:22 INFO MemoryStore: ensureFreeSpace(137336) called with curMem=184615, maxMem=278302556
> 15/10/02 10:19:22 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 134.1 KB, free 265.1 MB)
> 15/10/02 10:19:22 INFO MemoryStore: ensureFreeSpace(47711) called with curMem=321951, maxMem=278302556
> 15/10/02 10:19:22 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 46.6 KB, free 265.1 MB)
> 15/10/02 10:19:22 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on nodo1.i3a.info:36330 (size: 46.6 KB, free: 265.3 MB)
> 15/10/02 10:19:22 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
> 15/10/02 10:19:22 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
> 15/10/02 10:19:22 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (MapPartitionsRDD[7] at saveAsTextFile at <console>:27)
> 15/10/02 10:19:22 INFO YarnScheduler: Adding task set 3.0 with 2 tasks
> 15/10/02 10:19:22 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, nodo2.i3a.info, PROCESS_LOCAL, 25975 bytes)
> 15/10/02 10:19:22 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, nodo3.i3a.info, PROCESS_LOCAL, 25963 bytes)
> 15/10/02 10:19:22 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on nodo2.i3a.info:37759 (size: 46.6 KB, free: 530.2 MB)
> 15/10/02 10:19:22 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on nodo3.i3a.info:54798 (size: 46.6 KB, free: 530.2 MB)
> 15/10/02 10:19:22 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 312 ms on nodo2.i3a.info (1/2)
> 15/10/02 10:19:23 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 313 ms on nodo3.i3a.info (2/2)
> 15/10/02 10:19:23 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
> 15/10/02 10:19:23 INFO DAGScheduler: Stage 3 (saveAsTextFile at <console>:27) finished in 0.334 s
> 15/10/02 10:19:23 INFO DAGScheduler: Job 3 finished: saveAsTextFile at <console>:27, took 0.436388 s