Basically you need to unbundle the elements of the RDD and then store them wherever you want: use foreachPartition, and then foreach over each partition's iterator.
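A minimal sketch of that pattern, assuming PySpark. The sink here (`ListSink`) is an in-memory stand-in for whatever real connection you would open (a Redshift/JDBC client, an S3 writer, etc.); the names are placeholders, not part of the original advice.

```python
class ListSink:
    """Stand-in for a real connection (e.g. a JDBC or S3 client);
    just collects records in memory so the sketch is self-contained."""
    def __init__(self, store):
        self.store = store

    def write(self, record):
        self.store.append(record)

    def close(self):
        pass


written = []

def save_partition(records):
    # One connection per partition; real code would open the JDBC/S3
    # client here, inside the function, so it is created on the executor.
    conn = ListSink(written)
    try:
        for record in records:  # the inner "foreach" over the partition's iterator
            conn.write(record)
    finally:
        conn.close()

# In the streaming job this would be applied to every micro-batch RDD:
# dstream.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

# Standalone check, with a plain iterator standing in for a partition:
save_partition(iter(["a", "b", "c"]))
```

Opening the connection inside `save_partition` matters: the function is serialized and run on the executors, so any connection created outside it on the driver would not be usable there.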
-----Original Message-----
From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com]
Sent: Thursday, April 16, 2015 6:39 PM
To: Sean Owen
Cc: user@spark.apache.org
Subject: Re: saveAsTextFile

Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that?

Vadim

> On Apr 16, 2015, at 1:35 PM, Sean Owen <so...@cloudera.com> wrote:
>
> You can't, since that's how it's designed to work. Batches are saved
> in different "files", which are really directories containing
> partitions, as is common in Hadoop. You can move them later, or just
> read them where they are.
>
> On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
> <vadim.bichuts...@gmail.com> wrote:
>> I am using Spark Streaming where during each micro-batch I output
>> data to S3 using saveAsTextFile. Right now each batch of data is put
>> into its own directory containing 2 objects, "_SUCCESS" and "part-00000."
>>
>> How do I output each batch into a common directory?
>>
>> Thanks,
>> Vadim

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
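On the Redshift question in the quoted thread: since saveAsTextFile leaves each micro-batch in its own S3 directory, one common approach is to leave the directories where they are and issue a Redshift COPY per batch, pointing COPY at the batch directory as a prefix. A minimal sketch of building that statement; the table, bucket, and credential values are all placeholders, and this is not the only way to do the load.

```python
def redshift_copy_sql(table, s3_prefix, credentials):
    """Build a COPY statement loading one batch directory into Redshift.

    COPY treats the S3 path as a prefix, so pointing it at a batch's
    output directory loads every part-NNNNN file in it. The empty
    _SUCCESS marker can be skipped with a manifest file if it causes
    trouble. Assumes tab-delimited text output.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"CREDENTIALS '{credentials}' "
        "DELIMITER '\\t';"
    )


# Hypothetical batch directory produced by saveAsTextFile:
sql = redshift_copy_sql(
    "events",
    "s3://my-bucket/stream/batch-1429218000000/",
    "aws_access_key_id=XXX;aws_secret_access_key=YYY",
)
```

The generated SQL would then be executed over a normal JDBC/psycopg2 connection to the Redshift cluster, once per completed batch directory.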