Hi,

As I understand it, a DStream consists of one or more RDDs, and foreachRDD runs
a given function on each and every RDD inside a DStream.

I created a simple program which reads log files from a folder every hour:

JavaStreamingContext stcObj =
    new JavaStreamingContext(confObj, new Duration(60 * 60 * 1000)); // 1 hour
JavaDStream<String> obj = stcObj.textFileStream("/Users/path/to/Input");

When the interval is reached, Spark reads all the files and creates one and
only one RDD (as I verified with a sysout inside foreachRDD).

The streaming doc indicates in several places that many operations
(e.g. flatMap) on a DStream are applied to each RDD individually, and that the
resulting DStream contains the same number of RDDs as the input DStream.
ref: 
https://spark.apache.org/docs/latest/streaming-programming-guide.html#dstreams
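To make sure I'm reading the doc correctly, here is a minimal plain-Java sketch of that model (this is NOT the Spark API, just an illustration under my assumptions): a DStream is a sequence of RDDs, one per batch interval, and flatMap is applied to each RDD independently, producing exactly one output RDD per input RDD.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual model only (not Spark): each inner List<String> stands for
// one RDD, i.e. one batch interval's worth of lines.
public class DStreamModel {

    // flatMap on a "DStream": the function is applied per RDD, and the
    // output "DStream" has the same number of RDDs as the input.
    static List<List<String>> flatMapWords(List<List<String>> dstream) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> rdd : dstream) {        // applied to each RDD individually
            List<String> words = new ArrayList<>();
            for (String line : rdd) {
                words.addAll(Arrays.asList(line.split(" ")));
            }
            out.add(words);                       // one output RDD per input RDD
        }
        return out;
    }

    public static void main(String[] args) {
        // Two batch intervals -> a DStream of two RDDs.
        List<List<String>> dstream = Arrays.asList(
            Arrays.asList("a b", "c"),            // batch 1
            Arrays.asList("d e f"));              // batch 2

        List<List<String>> words = flatMapWords(dstream);
        System.out.println(words.size());         // prints 2 (same RDD count)
        System.out.println(words.get(0));         // prints [a, b, c]
    }
}
```

In my program only one batch's files ever arrive per interval, so under this model I would only ever see a single-RDD DStream per batch, which matches the sysout I observed.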

If that is the case, how can I create a scenario in which I have multiple
RDDs inside a DStream in my example?

Regards,
Sanjay
