I need to route a DStream through the streaming pipeline by some key, so that data with the same key always goes to the same executor.
There doesn't seem to be a way to do manual routing with Spark Streaming. The closest I can come up with is:

    stream.foreachRDD { rdd =>
      rdd.groupBy(_.key).flatMap { line => ... }.map(...).map(...)
    }

Does this do what I expect? What about between batches? Does it guarantee the same key goes to the same executor in all batches?

Thanks,
Lin
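To make the intent concrete, here is a minimal plain-Scala sketch (no Spark) of the stable mapping I am after: hash the key modulo a fixed slot count, the way Spark's HashPartitioner assigns keys to partitions. `routeFor` and the sample keys are hypothetical names for illustration, not Spark API:

```scala
object RoutingSketch {
  // Hypothetical helper: deterministically map a key to one of
  // numSlots buckets (HashPartitioner-style). Depends only on the
  // key and the slot count, so it is stable across batches.
  def routeFor(key: String, numSlots: Int): Int = {
    val mod = key.hashCode % numSlots
    if (mod < 0) mod + numSlots else mod // hashCode can be negative
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq("user-1", "user-2", "user-1", "user-3")
    // "user-1" lands in the same bucket both times it appears.
    keys.foreach { k => println(s"$k -> ${routeFor(k, 4)}") }
  }
}
```

Whether the groupBy above gives this batch-to-batch stability, and whether partitions are pinned to the same executors across batches, is exactly what I am unsure about.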