textFileStream doesn't support that. It only supports monitoring one folder.
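A common workaround is to create one stream per directory and union them with `StreamingContext.union`, so downstream processing sees a single DStream. A minimal sketch (the helper name is mine, and the S3 paths in the usage comment are the ones from your question; I haven't tested this against S3):

```python
def make_union_stream(ssc, directories):
    """Build one textFileStream per directory and union them into a
    single DStream, since textFileStream only watches one folder."""
    streams = [ssc.textFileStream(d) for d in directories]
    # StreamingContext.union takes the streams as positional arguments
    return ssc.union(*streams)

# Usage sketch (app name and batch interval are illustrative):
#
# from pyspark import SparkContext
# from pyspark.streaming import StreamingContext
#
# sc = SparkContext(appName="multiDirWordCount")
# ssc = StreamingContext(sc, 10)  # 10-second batches
# lines = make_union_stream(ssc, [
#     "s3n://bucket/first/second/third1/",
#     "s3n://bucket/first/second/third2/",
#     "s3n://bucket/first/second/third3/",
# ])
```

Note this only covers a fixed, known list of directories; it won't pick up folders created after the streaming context starts.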

On Wed, Feb 17, 2016 at 7:20 AM, in4maniac <sa...@skimlinks.com> wrote:

> Hi all,
>
> I am new to PySpark Streaming and I was following a tutorial I saw on the
> internet
> (
> https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py
> ).
> But I replaced the data input with an s3 directory path as:
>
> lines = ssc.textFileStream("s3n://bucket/first/second/third1/")
>
> When I run the code and upload a file to s3n://bucket/first/second/third1/
> (such as s3n://bucket/first/second/third1/test1.txt), the file gets
> processed as expected.
>
> Now I want it to listen to multiple directories and process files if they
> get uploaded to any of the directories:
> for example: [s3n://bucket/first/second/third1/,
> s3n://bucket/first/second/third2/, and s3n://bucket/first/second/third3/]
>
> I tried to use a glob pattern, similar to sc.textFile, as:
>
> lines = ssc.textFileStream("s3n://bucket/first/second/*/")
>
> But this didn't work. Can someone please explain how I could achieve
> this?
>
> thanks in advance !!!
>
> in4maniac
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
