You can create a DStream directly from an S3 location using a file
stream (StreamingContext.fileStream / textFileStream). If you need any
specific logic, you can rely on queueStream instead: read the data
yourself from S3, process it, and push the resulting RDDs into the queue.
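
The queueStream approach could be sketched roughly as below. This is a
minimal sketch, not a complete solution: `downloadLinesFromS3` and
`my_script.sh` are placeholders for your own S3 client code and external
script, and the driver-side fetch loop is shown commented out.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-queue-stream").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Queue of RDDs that queueStream polls, one RDD per batch interval.
    val rddQueue = new mutable.Queue[RDD[String]]()
    val stream   = ssc.queueStream(rddQueue)

    // Pipe each batch through an external script, as with rdd.pipe().
    // my_script.sh is a placeholder for your script.
    stream.foreachRDD(rdd => rdd.pipe("./my_script.sh").foreach(println))

    ssc.start()

    // Driver-side loop (sketch): fetch from S3 with your own client,
    // then enqueue the result as an RDD. downloadLinesFromS3 is a
    // hypothetical helper standing in for your S3 download code.
    // while (true) {
    //   val lines: Seq[String] = downloadLinesFromS3("my-bucket", "my-key")
    //   rddQueue.synchronized { rddQueue += ssc.sparkContext.parallelize(lines) }
    // }

    ssc.awaitTermination()
  }
}
```

With this structure the download-and-script step stays on the driver (or in
the script invoked by pipe), so you are not limited to sources Spark
Streaming supports natively.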

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera <
gianluca.privite...@studio.unibo.it> wrote:

> Hi,
> I've got a weird question but maybe someone has already dealt with it.
> My Spark Streaming application needs to
> - download a file from a S3 bucket,
> - run a script with the file as input,
> - create a DStream from this script output.
> I've already got the second part done with the rdd.pipe() API that really
> fits my request, but I have no idea how to manage the first part.
> How can I download a file and run a script on it inside a Spark
> Streaming application?
> Should I use process() from Scala, or won't that work?
>
> Thanks
> Gianluca
>
>
