Re: Partitioning in spark streaming

Mohit Anchlia Tue, 11 Aug 2015 21:54:13 -0700

Thanks for the info. When data is written to HDFS, how does Spark keep the
filenames written by multiple executors unique?


On Tue, Aug 11, 2015 at 9:35 PM, Hemant Bhanawat <hemant9...@gmail.com>
wrote:

> Posting a comment from my previous mail post:
>
> When data is received from a stream source, the receiver creates blocks of
> data. A new block of data is generated every blockInterval milliseconds. N
> blocks of data are created during the batchInterval, where N =
> batchInterval/blockInterval. An RDD is created on the driver for the blocks
> created during the batchInterval. The blocks generated during the
> batchInterval are the partitions of that RDD.
>
> Now if you want to repartition based on a key, a shuffle is needed.
>
> On Wed, Aug 12, 2015 at 4:36 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
>> How does partitioning in Spark work when it comes to streaming? What's
>> the best way to partition time series data grouped by a certain tag, such
>> as product categories (video, music, etc.)?
>>
>
>
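
To make the arithmetic and the shuffle in the quoted explanation concrete,
here is a rough sketch in plain Python (no Spark involved). The interval
values, the function names, and the CRC32-based hash are all assumptions
made for illustration; Spark's own partitioner uses a different hash
function. It only shows that N = batchInterval/blockInterval blocks arrive
in receive order, and that grouping them by key means moving records
between partitions.

```python
import zlib


def partitions_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Blocks (and hence RDD partitions) one receiver produces per batch."""
    return batch_interval_ms // block_interval_ms


def partition_for_key(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner; NOT Spark's hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_by_key(blocks, num_partitions):
    """Regroup records from arrival-ordered blocks into key-based partitions."""
    out = [[] for _ in range(num_partitions)]
    for block in blocks:
        for key, value in block:
            out[partition_for_key(key, num_partitions)].append((key, value))
    return out


# Hypothetical settings: 2000 ms batches cut into 200 ms blocks.
n = partitions_per_batch(2000, 200)

# Two blocks as a receiver might produce them: keys are mixed in each block.
blocks = [
    [("video", 1), ("music", 2)],
    [("music", 3), ("video", 4)],
]
shuffled = shuffle_by_key(blocks, num_partitions=2)
```

After the regrouping step, every record with the same tag sits in the same
partition, which is what repartitioning by key buys you at the cost of a
shuffle.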
