textFileStream doesn't support that. It only supports monitoring one folder.
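A common workaround (a sketch, not something I've run against your bucket) is to create one textFileStream per directory and union them with StreamingContext.union, so the rest of the job sees a single DStream. The app name, batch interval, and the word-count pipeline below are assumptions borrowed from the network_wordcount example; the three directory paths are the ones from your message.

```python
def multi_dir_word_count(dirs, batch_seconds=10):
    """Sketch: word count over files landing in several S3 directories,
    built by unioning one textFileStream per directory."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="MultiDirWordCount")  # app name is illustrative
    ssc = StreamingContext(sc, batch_seconds)

    # One input stream per directory, then union into a single DStream.
    streams = [ssc.textFileStream(d) for d in dirs]
    lines = ssc.union(*streams)

    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()  # blocks until the streaming job is stopped

dirs = [
    "s3n://bucket/first/second/third1/",
    "s3n://bucket/first/second/third2/",
    "s3n://bucket/first/second/third3/",
]
# multi_dir_word_count(dirs)  # run inside a Spark environment (spark-submit)
```

One caveat: each textFileStream costs a separate listing of its directory every batch, so with many directories on S3 this adds per-batch listing overhead.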
On Wed, Feb 17, 2016 at 7:20 AM, in4maniac <sa...@skimlinks.com> wrote:
> Hi all,
>
> I am new to pyspark streaming, and I was following a tutorial I saw on the
> internet
> (https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py).
> But I replaced the data input with an S3 directory path:
>
> lines = ssc.textFileStream("s3n://bucket/first/second/third1/")
>
> When I run the code and upload a file to s3n://bucket/first/second/third1/
> (such as s3n://bucket/first/second/third1/test1.txt), the file gets
> processed as expected.
>
> Now I want it to listen to multiple directories and process files when they
> are uploaded to any of them, for example:
> [s3n://bucket/first/second/third1/, s3n://bucket/first/second/third2/,
> s3n://bucket/first/second/third3/]
>
> I tried a wildcard pattern, similar to sc.textFile:
>
> lines = ssc.textFileStream("s3n://bucket/first/second/*/")
>
> But this didn't work. Can someone please explain how I could achieve my
> objective?
>
> Thanks in advance!
>
> in4maniac
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/listening-to-recursive-folder-structures-in-s3-using-pyspark-streaming-textFileStream-tp26247.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.