You can create a DStream directly from an S3 location using fileStream. If you need any custom logic, you can rely on queueStream instead: read the data from S3 yourself, process it, and push the resulting RDDs into the RDD queue.
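A minimal sketch of both options, assuming the Spark Streaming Scala API; the bucket path and batch interval are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object S3StreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-stream").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Option 1: textFileStream monitors a directory for newly created files.
    // The s3n:// path is a placeholder for your bucket.
    val fromS3 = ssc.textFileStream("s3n://my-bucket/incoming/")
    fromS3.print()

    // Option 2: queueStream reads from a queue of RDDs that you fill yourself,
    // e.g. after downloading and pre-processing files from S3.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val custom   = ssc.queueStream(rddQueue)
    custom.print()

    // Elsewhere (e.g. a background thread) you would push processed data:
    // rddQueue.synchronized { rddQueue += ssc.sparkContext.parallelize(lines) }

    ssc.start()
    ssc.awaitTermination()
  }
}
```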
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera <gianluca.privite...@studio.unibo.it> wrote:

> Hi,
> I've got a weird question, but maybe someone has already dealt with it.
> My Spark Streaming application needs to:
> - download a file from an S3 bucket,
> - run a script with the file as input,
> - create a DStream from this script's output.
> I've already got the second part done with the rdd.pipe() API, which really fits my needs, but I have no idea how to manage the first part.
> How can I download a file and run a script on it inside a Spark Streaming application?
> Should I use process() from Scala, or won't that work?
>
> Thanks,
> Gianluca
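For the first part, one hedged approach is to do the download on the driver (for example by shelling out through scala.sys.process, which also answers the process() question) and then feed the result through the pipe step you already have. The script path, CLI command, and helper name below are all illustrative assumptions:

```scala
import scala.sys.process._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: fetch one S3 object to local disk, then run it
// through an external script with rdd.pipe() and return the output RDD.
object DownloadAndPipe {
  def scriptOutput(sc: SparkContext, localPath: String): RDD[String] = {
    // Step 1 (assumption): fetch the file with the AWS CLI via scala.sys.process.
    // "!" runs the command and returns its exit code.
    val exit = Seq("aws", "s3", "cp", "s3://my-bucket/input.txt", localPath).!
    require(exit == 0, s"download failed with exit code $exit")

    // Step 2: pipe each partition's records through the external script,
    // exactly as with the rdd.pipe() step already in place.
    sc.textFile(localPath).pipe("/path/to/script.sh")
  }
}
```

The RDD this returns can then be pushed into the queueStream's queue on each batch, which stitches the download, the script, and the DStream together.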