Where are the APIs for queueStream and the RDD queue?
In my solution I cannot open a DStream directly on the S3 location, because I need to run a script on each file first (a script that unfortunately doesn't accept stdin as input), so I have to download the file to local disk somehow, then handle it from there before creating the stream.
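
A minimal sketch of that download-then-script step, assuming S3 credentials are already set in the Hadoop configuration (so the s3n:// scheme resolves) and using hypothetical bucket, key and script paths; the object is copied to local disk through the Hadoop FileSystem API and the script is invoked on it with scala.sys.process:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import scala.sys.process._

    // Copy the S3 object to local disk (bucket, key and local path are hypothetical).
    val src = new Path("s3n://my-bucket/incoming/data.csv")
    val dst = new Path("/tmp/data.csv")
    src.getFileSystem(new Configuration()).copyToLocalFile(src, dst)

    // Run the external script on the downloaded file; .!! returns its stdout as a String.
    val scriptOutput: Array[String] =
      Seq("/opt/scripts/transform.sh", "/tmp/data.csv").!!.split("\n")

The output lines can then be parallelized into an RDD and pushed into a queue-backed stream, as in the sketch further down the thread.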

Thanks
Gianluca

On 06/06/2014 02:19, Mayur Rustagi wrote:
You can look at creating a DStream directly from the S3 location using a file stream. If you want to apply any specific logic, you can rely on queueStream instead: read the data yourself from S3, process it, and push it into the RDD queue.
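
A minimal sketch of this queue-based approach, assuming an existing SparkContext sc, a 10-second batch interval, and a hypothetical fetchAndProcess() helper standing in for "download from S3, run the script, collect its output lines":

    import scala.collection.mutable
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(10))
    val rddQueue = new mutable.Queue[RDD[String]]()

    // queueStream takes one RDD from the queue per batch interval
    // (oneAtATime defaults to true) and emits it as that batch.
    val stream = ssc.queueStream(rddQueue)
    stream.print()
    ssc.start()

    // A background thread keeps filling the queue with freshly processed data.
    new Thread(new Runnable {
      def run(): Unit = {
        while (true) {
          val lines: Seq[String] = fetchAndProcess()   // hypothetical helper
          rddQueue.synchronized { rddQueue += ssc.sparkContext.parallelize(lines) }
          Thread.sleep(10000)
        }
      }
    }).start()

    ssc.awaitTermination()

With oneAtATime left at its default, each enqueued RDD becomes exactly one batch of the resulting DStream.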

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera <gianluca.privite...@studio.unibo.it> wrote:

    Hi,
    I've got a weird question but maybe someone has already dealt with it.
    My Spark Streaming application needs to:
    - download a file from an S3 bucket,
    - run a script with that file as input,
    - create a DStream from the script's output.
    I've already got the second part working with the rdd.pipe() API
    (see the sketch after this message), which really fits my needs,
    but I have no idea how to handle the first part.
    How can I download a file and run a script on it from inside a
    Spark Streaming application?
    Should I use Scala's sys.process, or won't that work?

    Thanks
    Gianluca
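
For reference, a minimal sketch of the rdd.pipe() step mentioned in the quoted message, assuming an existing SparkContext sc and a hypothetical script path; note that pipe() feeds each element to the command's stdin, which is exactly the limitation described in the follow-up above:

    import org.apache.spark.rdd.RDD

    // Each element is written to the command's stdin, one per line;
    // every line the command prints to stdout becomes an element of the result.
    val raw: RDD[String] = sc.textFile("/tmp/data.csv")
    val transformed: RDD[String] = raw.pipe("/opt/scripts/transform.sh")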


