Re: save spark streaming output to single file on hdfs

2015-01-15 Thread Prannoy
Hi, you can use the FileUtil.copyMerge API and point it at the folder where saveAsTextFile writes the part files. Suppose your directory is /a/b/c/; use FileUtil.copyMerge(FileSystem of source, a/b/c, FileSystem of destination, path to the merged file, say a/b/c.txt, true (to delete the
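The actual FileUtil.copyMerge call needs the Hadoop client libraries on the classpath. As a minimal local sketch of what copyMerge does (concatenate the sorted part-* files of a saveAsTextFile output directory into one file, optionally deleting the source), here is a plain-Python stand-in; all names and paths below are hypothetical:

```python
import glob
import os
import shutil
import tempfile

def copy_merge(src_dir, dst_file, delete_source=False):
    """Concatenate the part-* files in src_dir into dst_file, in sorted
    order -- a local analogue of Hadoop's FileUtil.copyMerge."""
    parts = sorted(glob.glob(os.path.join(src_dir, "part-*")))
    with open(dst_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
    if delete_source:
        shutil.rmtree(src_dir)

# Demo: build a fake saveAsTextFile output directory and merge it.
work = tempfile.mkdtemp()
src = os.path.join(work, "c")          # stands in for /a/b/c on HDFS
os.makedirs(src)
for i, text in enumerate(["line1\n", "line2\n"]):
    with open(os.path.join(src, "part-%05d" % i), "w") as f:
        f.write(text)

merged = os.path.join(work, "c.txt")   # stands in for /a/b/c.txt
copy_merge(src, merged, delete_source=True)
```

The sorted() call matters: Spark numbers the part files, and sorting preserves partition order in the merged output.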

Re: save spark streaming output to single file on hdfs

2015-01-15 Thread jamborta
Thanks for the replies, very useful. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/save-spark-streaming-output-to-single-file-on-hdfs-tp21124p21176.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Tamas Jambor
Thanks. The problem is that we'd like it to be picked up by Hive. On Tue, Jan 13, 2015 at 18:15:15, Davies Liu dav...@databricks.com wrote: On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote: Hi all, is there a way to save dstream RDDs to a single file so that another

Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Davies Liu
On Tue, Jan 13, 2015 at 10:04 AM, jamborta jambo...@gmail.com wrote: Hi all, is there a way to save dstream RDDs to a single file so that another process can pick it up as a single RDD? It does not need to be a single file; Spark can pick up any directory as a single RDD. Also, it's easy to union
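The point about directories is that sc.textFile("/a/b/c") already reads every part file under the directory as one logical RDD, so no merge step is needed for a downstream Spark job. A minimal local sketch of that behavior (hypothetical paths, plain Python in place of Spark):

```python
import os
import tempfile

def read_directory_as_one_dataset(path):
    """Read every part-* file under `path` as one logical list of lines,
    the way sc.textFile treats a saveAsTextFile output directory."""
    lines = []
    for name in sorted(os.listdir(path)):
        if name.startswith("part-"):
            with open(os.path.join(path, name)) as f:
                lines.extend(line.rstrip("\n") for line in f)
    return lines

# Demo: two part files are seen as one dataset.
out_dir = tempfile.mkdtemp()
with open(os.path.join(out_dir, "part-00000"), "w") as f:
    f.write("a\nb\n")
with open(os.path.join(out_dir, "part-00001"), "w") as f:
    f.write("c\n")

lines = read_directory_as_one_dataset(out_dir)
```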

Re: save spark streaming output to single file on hdfs

2015-01-13 Thread Davies Liu
Right now, you can't. You could load each file as a partition into Hive, or you need to pack the files together with other tools or a Spark job. On Tue, Jan 13, 2015 at 10:35 AM, Tamas Jambor jambo...@gmail.com wrote: Thanks. The problem is that we'd like it to be picked up by Hive. On Tue, Jan
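The "each file as a partition" route works with an external Hive table: register each micro-batch's output directory as its own partition, and Hive queries see all of them together. A hedged DDL sketch; the table name, column, and locations below are all hypothetical:

```sql
-- Hypothetical external table over the streaming output.
CREATE EXTERNAL TABLE stream_output (line STRING)
PARTITIONED BY (batch STRING)
STORED AS TEXTFILE;

-- After each batch, register its output directory as a partition.
ALTER TABLE stream_output ADD PARTITION (batch = '2015-01-13-1800')
LOCATION '/a/b/c';
```

Because the table is EXTERNAL, Hive only tracks the locations; the part files stay where Spark Streaming wrote them.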