Just copy the files? It shouldn't matter much where they are, since you can find them easily. Or consider somehow sending the batches of data straight into Redshift? No idea how that is done, but I imagine it's doable.
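(For the Redshift route: Redshift's bulk-load mechanism is the COPY command, which loads every object under an S3 prefix, so each batch directory can be loaded with one statement. A minimal sketch in Python that just builds such a statement; the table name, bucket, batch directory, and credentials below are hypothetical placeholders, and in practice you would execute the SQL through a Postgres driver connected to your cluster.)

```python
def build_copy_statement(table, s3_prefix, access_key, secret_key):
    """Build a Redshift COPY statement that bulk-loads every object
    under the given S3 prefix, e.g. one micro-batch's output directory.
    Table/bucket/credentials here are placeholders, not real values."""
    return (
        "COPY {table} "
        "FROM '{prefix}' "
        "CREDENTIALS 'aws_access_key_id={ak};aws_secret_access_key={sk}' "
        "DELIMITER '\\t'"
    ).format(table=table, prefix=s3_prefix, ak=access_key, sk=secret_key)

# Hypothetical example values -- substitute your own.
sql = build_copy_statement(
    "events",                                # target Redshift table
    "s3://my-bucket/stream-1429217880000/",  # one batch's output directory
    "AKIA...", "SECRET...")
print(sql)
```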
On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy
<vadim.bichuts...@gmail.com> wrote:
> Thanks Sean. I want to load each batch into Redshift. What's the best/most
> efficient way to do that?
>
> Vadim
>
>> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> You can't, since that's how it's designed to work. Batches are saved
>> in different "files", which are really directories containing
>> partitions, as is common in Hadoop. You can move them later, or just
>> read them where they are.
>>
>> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
>> <vadim.bichuts...@gmail.com> wrote:
>>> I am using Spark Streaming where during each micro-batch I output data
>>> to S3 using saveAsTextFile. Right now each batch of data is put into its
>>> own directory containing 2 objects, "_SUCCESS" and "part-00000".
>>>
>>> How do I output each batch into a common directory?
>>>
>>> Thanks,
>>> Vadim
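(On moving the files later: the per-batch directory names are unique, so collecting everything under one common prefix only requires folding that uniqueness into the file names, otherwise each batch's part-00000 would overwrite the previous one. A small sketch of the key-renaming logic; the bucket and prefix names are hypothetical, and the actual copy/move would be done with an S3 client such as boto.)

```python
import posixpath

def consolidated_key(batch_dir, part_file, common_prefix):
    """Map a part file from its per-batch directory to a unique key under
    one common directory. The batch directory's basename (which embeds the
    batch timestamp) is folded into the file name so that part files from
    different batches cannot collide."""
    batch_id = posixpath.basename(batch_dir.rstrip("/"))
    return posixpath.join(common_prefix, "%s-%s" % (batch_id, part_file))

# Two micro-batches, each with its own output directory:
k1 = consolidated_key("s3://bucket/out-1429217880000/", "part-00000",
                      "s3://bucket/all-batches")
k2 = consolidated_key("s3://bucket/out-1429217890000/", "part-00000",
                      "s3://bucket/all-batches")
print(k1)  # s3://bucket/all-batches/out-1429217880000-part-00000
print(k2)  # s3://bucket/all-batches/out-1429217890000-part-00000
```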