Re: Appending to an hdfs file

2015-01-29 Thread Matan Safriel
Thanks. I actually looked up foreachPartition() in this context yesterday, and couldn't find where it's documented in the Javadocs or elsewhere, probably for some silly reason. Can you please point me in the right direction? Many thanks! By the way, I realize the solution should rather be to

Appending to an hdfs file

2015-01-28 Thread Matan Safriel
Hi, Is it possible to append to an existing (HDFS) file through some Spark action? Is there any reason not to use a Hadoop append API within a Spark job? Thanks, Matan

Re: Appending to an hdfs file

2015-01-28 Thread Sean Owen
You can call any API you like in a Spark job, as long as the libraries are available, and Hadoop HDFS APIs will be available from the cluster. You could write a foreachPartition() that appends partitions of data to files, yes. Spark itself does not use appending. I think the biggest reason is
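
For illustration only, here is a rough sketch of the per-partition append pattern mentioned above. It is not taken from the thread: it uses plain local files as a stand-in for HDFS so that it runs without a cluster, and the names append_partition, run_batch and the part-NNNNN file naming are hypothetical. In a real job, a function like append_partition would be the body passed to foreachPartition(), and the open() call would be replaced by an HDFS client's append call. Giving each partition its own file is an assumption made here so that no two tasks write to the same file at once.

```python
import os
import tempfile


def append_partition(partition_index, records, out_dir):
    """Append every record of one partition to that partition's own file.

    Hypothetical helper: with Spark, this would be the body passed to
    foreachPartition(), with open() swapped for an HDFS append call.
    """
    path = os.path.join(out_dir, "part-%05d" % partition_index)
    # Mode "a" creates the file if it is missing and otherwise appends to it.
    with open(path, "a", encoding="utf-8") as out:
        for record in records:
            out.write(str(record) + "\n")
    return path


def run_batch(partitions, out_dir):
    """Local stand-in for foreachPartition: call the helper once per partition."""
    return [append_partition(i, iter(p), out_dir) for i, p in enumerate(partitions)]


out_dir = tempfile.mkdtemp()
run_batch([[1, 2], [3]], out_dir)   # first run creates one file per partition
run_batch([[4], [5, 6]], out_dir)   # second run appends to the same files
```

The file is opened once per partition rather than once per record, which is the main reason to reach for foreachPartition() instead of foreach() for this kind of output.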