Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt") supposed to work? Meaning, can I save files to HDFS this way? I tried: val r = sc.parallelize(List(1,2,3,4,5,6,7,8)) r.saveAsTextFile("hdfs://ip:port/path/file.txt") and it just hangs. At the same time, on my HDFS I …
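For reference, a minimal sketch of the kind of call being discussed, as it would look in the Spark shell (sc is the shell's SparkContext; the NameNode host, port and output path are placeholders, not the poster's actual values):

    // Build a small RDD and write it out to HDFS.
    val r = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8))

    // saveAsTextFile takes a directory-style path; each partition of the
    // RDD is written as its own part file under that directory.
    r.saveAsTextFile("hdfs://namenode:8020/user/ognen/out")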

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
Hmm. Strange. Even the below hangs. val r = sc.parallelize(List(1,2,3,4,5,6,7,8)) r.count I then looked at the web UI at port 8080 and realized that the Spark shell is in WAITING status since another job is running on the standalone cluster. This may sound like a very stupid question but my …

Re: Writing RDDs to HDFS

2014-03-24 Thread Diana Carroll
Ognen: I don't know why your process is hanging, sorry. But I do know that the way saveAsTextFile works is that you give it a path to a directory, not a file. The "file" is saved in multiple parts, corresponding to the partitions (part-00000, part-00001, etc.). (Presumably it does this because i…
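A small sketch of what Diana describes, with hypothetical paths: saveAsTextFile creates a directory of part files (one per partition), and sc.textFile can read that whole directory back in a single call:

    // Writing an RDD with 4 partitions produces a directory containing
    // part-00000 .. part-00003 plus a _SUCCESS marker.
    val data = sc.parallelize(1 to 8, 4)
    data.saveAsTextFile("hdfs://namenode:8020/user/ognen/out")

    // Reading the directory back picks up all the part files at once.
    val readBack = sc.textFile("hdfs://namenode:8020/user/ognen/out")
    println(readBack.count())  // 8

    // To end up with a single part file, reduce to one partition first
    // (fine for small data, a bottleneck for large data).
    data.coalesce(1).saveAsTextFile("hdfs://namenode:8020/user/ognen/out-single")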

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
Diana, thanks. I am not very well acquainted with HDFS. I use hdfs -put to put things as files into the filesystem (and sc.textFile to get stuff out of them in Spark), and I see that they appear to be saved as files that are replicated across 3 out of the 16 nodes in the HDFS cluster (which is m…
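A hedged sketch of the workflow Ognen describes (file names and paths are made up): a file put into HDFS from the command line is a single file whose blocks are replicated, 3 times by default, across the cluster's nodes, and sc.textFile reads either a single file or a whole directory:

    // Suppose a file was placed into HDFS from the command line, e.g.
    //   hdfs dfs -put data.txt /user/ognen/data.txt
    // With the default replication factor of 3, its blocks live on 3 nodes.
    val lines = sc.textFile("hdfs://namenode:8020/user/ognen/data.txt")
    println(lines.count())

    // sc.textFile also accepts a directory, so it reads the multi-part
    // output of saveAsTextFile just as easily as a single file.
    val saved = sc.textFile("hdfs://namenode:8020/user/ognen/out")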

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
Just so I can close this thread (in case anyone else runs into this) - I did sleep through the basics of Spark ;). The answer to why my job is in the waiting state (hanging) is here: http://spark.incubator.apache.org/docs/latest/spark-standalone.html#resource-scheduling Ognen
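The linked resource-scheduling section boils down to this: in standalone mode an application grabs all available cores by default, so a second application sits in WAITING until the first finishes. Capping spark.cores.max leaves cores free. A minimal sketch with illustrative values only:

    import org.apache.spark.{SparkConf, SparkContext}

    // Limit this application to 8 cores so other applications can be
    // scheduled on the same standalone cluster at the same time.
    val conf = new SparkConf()
      .setAppName("PoliteApp")
      .setMaster("spark://master:7077")
      .set("spark.cores.max", "8")
    val sc = new SparkContext(conf)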

Re: Writing RDDs to HDFS

2014-03-24 Thread Yana Kadiyska
Ognen, can you comment on whether you were actually able to run two jobs concurrently just by restricting spark.cores.max? I run Shark on the same cluster and was not able to see a standalone job get in (since Shark is a "long-running" job) until I restricted both spark.cores.max _and_ spark.executor.memory.

Re: Writing RDDs to HDFS

2014-03-25 Thread Ognen Duzlevski
Well, my long-running app has 512M per executor on a 16-node cluster where each machine has 16G of RAM. I could not run a second application until I restricted spark.cores.max. As soon as I restricted the cores, I was able to run a second job at the same time. Ognen
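A sketch of restricting both settings Yana mentions for a long-running application such as Shark; the numbers are illustrative, not the posters' actual configuration:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("LongRunningApp")
      .set("spark.cores.max", "8")           // leave cores for other applications
      .set("spark.executor.memory", "512m")  // leave executor memory for other applications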

Writing RDDs to HDFS is empty

2019-01-07 Thread Jian Lee
Hi all, in my experiment program I used Spark GraphX. When running in IDEA on Windows the result is correct, but when running on the Linux distributed cluster the result in HDFS is empty. Why, and how can I solve this?
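Without seeing the program it is impossible to say for certain, but a frequent cause of "correct in IDEA, empty on the cluster" is an output path that resolves to a local filesystem on the cluster instead of HDFS, or an RDD that is empty by the time it is saved. A hedged sketch, with hypothetical paths and a standard GraphX call, of the two things worth checking:

    import org.apache.spark.graphx.GraphLoader

    // Load a graph and compute something with GraphX (paths are made up).
    val graph = GraphLoader.edgeListFile(sc, "hdfs://namenode:8020/user/jian/edges.txt")
    val ranks = graph.pageRank(0.0001).vertices

    // 1) Sanity-check that there is something to write before saving.
    println(s"vertices to save: ${ranks.count()}")

    // 2) Use the fully qualified HDFS URI for the output directory, so the
    //    write cannot silently land on a node's local filesystem.
    ranks.saveAsTextFile("hdfs://namenode:8020/user/jian/ranks-out")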