That is a good question. If I understand correctly, you need multiple RDDs from a DStream in *every batch*. Can you elaborate on why you need multiple RDDs every batch?
TD

On Wed, Mar 19, 2014 at 10:20 PM, Sanjay Awatramani <sanjay_a...@yahoo.com> wrote:

> Hi,
>
> As I understand, a DStream consists of 1 or more RDDs, and foreachRDD will
> run a given func on each and every RDD inside a DStream.
>
> I created a simple program which reads log files from a folder every hour:
>
> JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000)); // 1 hour
> JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");
>
> When the interval is reached, Spark reads all the files and creates one
> and only one RDD (as I verified from a sysout inside foreachRDD).
>
> The streaming doc in several places gives the impression that many
> operations (e.g. flatMap) on a DStream are applied individually to each RDD,
> and that the resulting DStream contains as many mapped RDDs as the input
> DStream had.
> ref: https://spark.apache.org/docs/latest/streaming-programming-guide.html#dstreams
>
> If that is the case, how can I generate a scenario in which I have
> multiple RDDs inside a DStream in my example?
>
> Regards,
> Sanjay
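
A minimal, self-contained sketch of the setup described above, for reference. The class name, app name, and master setting are illustrative placeholders I have added, and it assumes the Spark 1.x-era Java foreachRDD signature; the sysout inside foreachRDD is the kind of check Sanjay describes, firing once per RDD in each batch:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class HourlyLogReader {
  public static void main(String[] args) throws Exception {
    // Placeholder config; app name and master are illustrative.
    SparkConf confObj = new SparkConf().setAppName("HourlyLogReader").setMaster("local[2]");
    // 1-hour batch interval, as in the original snippet.
    JavaStreamingContext stcObj = new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000));
    JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

    // This function runs once per RDD in each batch; with textFileStream you
    // should see it fire exactly once per interval, covering all new files.
    obj.foreachRDD(new Function<JavaRDD<String>, Void>() {
      @Override
      public Void call(JavaRDD<String> rdd) {
        System.out.println("Got an RDD with " + rdd.count() + " lines");
        return null;
      }
    });

    stcObj.start();
    stcObj.awaitTermination();
  }
}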