Hi all,

I've been digging into the MapWithState code (branch-1.6), and I came across
the compute
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L159>
implementation in *InternalMapWithStateDStream*.

Looking at the defined partitioner
<https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/dstream/MapWithStateDStream.scala#L112>
it looks like it could differ from the parent RDD's partitioner (if
defaultParallelism() changed, for instance, or the input partitioning was
smaller to begin with), which will eventually create
<https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L537>
a ShuffledRDD.
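
For context, my reading of the linked PairRDDFunctions.partitionBy line is
roughly the following (a simplified sketch, not the actual code; the helper
name partitionBySketch is mine):

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.{RDD, ShuffledRDD}

// Sketch of the partitionBy decision in PairRDDFunctions (branch-1.6):
// the shuffle is skipped only when the RDD's existing partitioner is
// exactly equal to the requested one; any mismatch (e.g. a different
// number of partitions from defaultParallelism()) yields a ShuffledRDD.
def partitionBySketch[K, V](rdd: RDD[(K, V)], p: Partitioner): RDD[(K, V)] =
  if (rdd.partitioner.contains(p)) rdd            // same partitioner: no shuffle
  else new ShuffledRDD[K, V, V](rdd, p)           // different: full shuffle
```

So if the HashPartitioner built in InternalMapWithStateDStream doesn't
compare equal to the parent's, the state RDD would be shuffled on every
batch, if I'm following correctly.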

Am I reading this right?

Thanks,
Amit
