Is it possible to achieve serial batching with Spark Streaming? Example:
I configure the Streaming Context to create a batch every 3 seconds. Processing of batch #2 takes longer than 3 seconds and creates a backlog of batches:

  batch #1 takes 2s
  batch #2 takes 10s
  batch #3 takes 2s
  batch #4 takes 2s

When testing locally, it seems that processing of multiple batches finishes at the same time:

  batch #1 finished at 2s
  batch #2 finished at 12s
  batch #3 finished at 12s (processed in parallel)
  batch #4 finished at 15s

How can I delay processing of the next batch, so that it is processed only after processing of the previous batch has completed?

  batch #1 finished at 2s
  batch #2 finished at 12s
  batch #3 finished at 14s (processed serially)
  batch #4 finished at 16s

I want to perform a transformation for every key only once in a given period of time (e.g. the batch duration). I find all unique keys in a batch and perform the transformation on each key. To ensure that the transformation is done for every key only once, only one batch can be processed at a time. At the same time, I want that single batch to be processed in parallel.

  context = new JavaStreamingContext(conf, Durations.seconds(10));
  stream = context.receiverStream(...);
  stream
      .reduceByKey(...)
      .transform(...)
      .foreachRDD(output);

Any ideas or pointers are very welcome. Thanks!
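For context, here is a sketch of the configuration I am experimenting with. My understanding is that the experimental spark.streaming.concurrentJobs property controls how many batch jobs the scheduler runs at once (defaulting to 1, i.e. serial), so I am setting it explicitly; please correct me if that is not the right knob:

  // Sketch only: try to pin Spark Streaming to one concurrently running
  // batch job. spark.streaming.concurrentJobs is an experimental,
  // largely undocumented property; 1 should be its default.
  SparkConf conf = new SparkConf()
      .setAppName("serial-batching")
      .set("spark.streaming.concurrentJobs", "1");

  JavaStreamingContext context =
      new JavaStreamingContext(conf, Durations.seconds(10));

With a single concurrent job, a slow batch should simply delay the queued batches behind it rather than letting them finish in parallel, while each individual batch still runs its tasks in parallel across the cluster.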