Okay, thanks Akhil!
Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014
On Mon, Jan 12, 2015 at 1:24 PM, Akhil Das wrote:
There is no direct way of doing that. If you need a single file for every
batch duration, then you can repartition the data to 1 partition before
saving. Another way would be to use Hadoop's copyMerge command/API
(available from the 2.0 versions).
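For illustration, a rough sketch of the second option in Java (the paths here
are made up, and copyMerge lives in the Hadoop 2.x FileUtil API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeParts {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Concatenate all the part-* files under the batch output directory
    // into a single HDFS file; false = keep the source files.
    FileUtil.copyMerge(fs, new Path("/wordcount/batches"),
        fs, new Path("/wordcount/merged.txt"), false, conf, null);
  }
}

For the first option, rdd.repartition(1).saveAsTextFile(...) on each batch's
RDD gives you one part file per batch directory.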
On 13 Jan 2015 01:08, "Su She" wrote:
Hello Everyone,
Quick follow-up: is there any way I can append output to one file rather
than create a new directory/file every X milliseconds?
Thanks!
Suhas Shekar
University of California, Los Angeles
B.A. Economics, Specialization in Computing 2014
On Thu, Jan 8, 2015 at 11:41 PM, Su She wrote:
1) Thank you everyone for the help once again... the support here is really
amazing and I hope to contribute soon!
2) The solution I actually ended up using was from this thread:
http://mail-archives.apache.org/mod_mbox/spark-user/201310.mbox/%3ccafnzj5ejxdgqju7nbdqy6xureq3d1pcxr+i2s99g5brcj5e...@m
saveAsHadoopFiles requires you to specify the output format, which I believe
you are not specifying anywhere; hence the program crashes.
You could try something like this:
Class<? extends OutputFormat<?, ?>> outputFormatClass =
    (Class<? extends OutputFormat<?, ?>>) (Class<?>) SequenceFileOutputFormat.class;

// The suffix and the Text key/value classes below are placeholders for your own types.
yourStream.saveAsNewAPIHadoopFiles(hdfsUrl, "output", Text.class, Text.class,
    outputFormatClass);
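(For reference, the imports that snippet would need, assuming the new-API
output format classes; Text is just a placeholder key/value type:)

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;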
Yes, I am calling saveAsHadoopFiles on the DStream. However, when I call
print on the DStream it works? If I have to use foreachRDD to call
saveAsHadoopFile, then why does print work without it?
Also, if I am doing foreachRDD, do I need connections, or can I simply put
the saveAsHadoopFiles inside the foreachRDD?
Are you calling saveAsTextFiles on the DStream -- it looks like it? Look
at the section called "Design Patterns for using foreachRDD" in the link
you sent -- you want to do dstream.foreachRDD(rdd => rdd.saveAs)
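In Java (the Spark 1.x API) that pattern would look roughly like the sketch
below -- the stream name and HDFS path are placeholders, and no external
connection object is needed just to save files:

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Time;

// Assumes a JavaPairDStream<String, Integer> named wordCounts.
wordCounts.foreachRDD(new Function2<JavaPairRDD<String, Integer>, Time, Void>() {
  @Override
  public Void call(JavaPairRDD<String, Integer> rdd, Time time) {
    // Each batch's RDD is written to its own timestamped directory.
    rdd.saveAsTextFile("hdfs://namenode:8020/wordcount/batch-" + time.milliseconds());
    return null;
  }
});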
On Thu, Jan 8, 2015 at 5:20 PM, Su She wrote:
Hello Everyone,
Thanks in advance for the help!
I successfully got my Kafka/Spark WordCount app to print locally. However,
I want to run it on a cluster, which means that I will have to save the
output to HDFS if I want to be able to read it.
I am running Spark 1.1.0, which means according to th