Write to HDFS and then get one file locally by using "hdfs dfs -getmerge ...".
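A minimal sketch of that getmerge route, assuming a configured HDFS client and hypothetical paths; locally, -getmerge behaves like concatenating the part files in name order, which the cat lines below demonstrate:

```shell
# On a real cluster (requires an HDFS client; paths are hypothetical):
#   hdfs dfs -getmerge /user/me/spark-output /tmp/merged.txt
# -getmerge concatenates every part-* file under the HDFS directory into
# one local file. The local equivalent of that concatenation is plain cat:
rm -rf /tmp/spark-output-demo
mkdir -p /tmp/spark-output-demo
printf 'line1\n' > /tmp/spark-output-demo/part-00000
printf 'line2\n' > /tmp/spark-output-demo/part-00001
cat /tmp/spark-output-demo/part-* > /tmp/merged.txt
```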
On Friday, October 17, 2014, Sean Owen <so...@cloudera.com> wrote:
> You can save to a local file. What are you trying, and what doesn't work?
>
> You can output one file by repartitioning to 1 partition, but this is
> probably not a good idea, as you are bottlenecking the output and some
> upstream computation by disabling parallelism.
>
> How about just combining the files on HDFS afterwards? Or just reading
> all the files instead of one? You can hdfs dfs -cat a bunch of files at
> once.
>
> On Fri, Oct 17, 2014 at 6:46 PM, Parthus <peng.wei....@gmail.com> wrote:
> > Hi,
> >
> > I have a Spark MapReduce task which requires me to write the final RDD
> > to an existing local file (appending to that file). I tried two ways,
> > but neither works well:
> >
> > 1. Use the saveAsTextFile() API. Spark 1.1.0 claims that this API can
> > write to local files, but I never got it to work. Moreover, the result
> > is not one file but a series of part-xxxxx files, which is not what I
> > hoped to get.
> >
> > 2. Collect the RDD to an array and write it out on the driver node
> > using Java file IO. There are two problems here as well: 1) my RDD is
> > huge (1 TB), which cannot fit into the memory of one driver node, so I
> > have to split the task into small pieces and collect and write them
> > part by part; 2) while the Java IO write runs, the Spark MapReduce
> > task has to wait, which is not efficient.
> >
> > Could anybody suggest an efficient way to solve this problem? Ideally,
> > the solution would append a huge RDD to a local file without pausing
> > the MapReduce job during the write.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-write-a-RDD-into-One-Local-Existing-File-tp16720.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
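The hdfs dfs -cat suggestion above can also cover the append-to-an-existing-local-file requirement, because shell redirection with >> appends rather than overwrites. A sketch with hypothetical paths (the hdfs command itself needs a running cluster; the append semantics are shown with local files):

```shell
# On a cluster (hypothetical paths):
#   hdfs dfs -cat /user/me/spark-output/part-* >> /data/existing-local-file
# In Spark itself, rdd.repartition(1).saveAsTextFile(...) would force a
# single part file, but as noted above it serializes the output stage.
# The >> redirection appends, so the existing file's contents are kept:
rm -rf /tmp/append-demo
mkdir -p /tmp/append-demo
printf 'old\n' > /tmp/append-demo/existing
printf 'new\n' > /tmp/append-demo/part-00000
cat /tmp/append-demo/part-* >> /tmp/append-demo/existing
```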
--
- Rishi