Posting a comment from my previous mail post:

When data is received from a stream source, receiver creates blocks of
data.  A new block of data is generated every blockInterval milliseconds. N
blocks of data are created during the batchInterval where N =
batchInterval/blockInterval. A RDD is created on the driver for the blocks
created during the batchInterval. The blocks generated during the
batchInterval are partitions of the RDD.

Now if you want to repartition based on a key, a shuffle is needed.

On Wed, Aug 12, 2015 at 4:36 AM, Mohit Anchlia <mohitanch...@gmail.com>
wrote:

> How does partitioning in spark work when it comes to streaming? What's the
> best way to partition a time series data grouped by a certain tag like
> categories of product video, music etc.
>

Reply via email to