I need to route the DStream through the streaming pipeline by some key, 
such that data with the same key always goes through the same executor.

There doesn't seem to be a way to do manual routing with Spark Streaming. The 
closest I can come up with is:

stream.foreachRDD { rdd =>
  rdd.groupBy(_.key).flatMap { case (key, lines) => ... }.map(...).map(...)
}
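
For context, here is a minimal sketch of the other approach I considered, 
assuming a DStream[(String, String)] and a hypothetical partition count. As 
far as I understand, partitionBy with a HashPartitioner makes key -> 
partition deterministic within each batch, but nothing seems to pin a 
partition index to a particular executor across batches:

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.dstream.DStream

val numPartitions = 8 // hypothetical partition count

def route(stream: DStream[(String, String)]): Unit = {
  stream.foreachRDD { rdd =>
    rdd
      .partitionBy(new HashPartitioner(numPartitions)) // same key -> same partition index
      .mapPartitions { iter =>
        // every record for a given key lands in the same partition here
        iter.map { case (key, line) => (key, line.length) } // placeholder transform
      }
      .count() // placeholder action to trigger the job
  }
}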

Does this do what I expect? What about between batches? Does it guarantee 
that the same key goes to the same executor in every batch?

Thanks,

Lin
