After a 10-minute delay, processing one 10-minute batch will not take 10 times
longer than a 1-minute batch.

That's mainly because of the I/O write operations to HDFS, and also because
certain users are active in every 1-minute batch; processing such a customer
only once (instead of in each of 10 separate batches) saves time.
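To make the saving concrete, here is a minimal pure-Python sketch (no Spark required) of the idea above: several 1-minute micro-batches are coalesced into one larger batch keyed by user, so each active user is handled once per merged batch instead of once per micro-batch. The helper `merge_batches` and the sample data are hypothetical, for illustration only.

```python
from collections import defaultdict


def merge_batches(batches):
    """Hypothetical helper: merge several 1-minute micro-batches of
    (user_id, event) records into one batch grouped by user, so the
    per-user overhead (and the per-batch HDFS write) is paid once
    instead of once per micro-batch."""
    merged = defaultdict(list)
    for batch in batches:
        for user_id, event in batch:
            merged[user_id].append(event)
    return merged


# Illustrative sample: user "u1" appears in two of the three micro-batches.
batches = [
    [("u1", 1), ("u2", 2)],   # minute 1
    [("u1", 3)],              # minute 2
    [("u2", 4), ("u3", 5)],   # minute 3
]
merged = merge_batches(batches)
# After merging, "u1" is processed once with both of its events together.
```

The same trade-off is why a delayed application can catch up faster by taking bigger batches rather than draining the backlog one small batch at a time.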



On Thu, 1 Jul 2021 at 13:45, Sean Owen <sro...@gmail.com> wrote:

> Wouldn't this happen naturally? The larger batches would simply take longer
> to complete anyway.
>
> On Thu, Jul 1, 2021 at 6:32 AM András Kolbert <kolbertand...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a Spark Streaming application which is generally able to process
>> the data within the given time frame. However, in certain hours the
>> processing time starts increasing, which causes a delay.
>>
>> In my scenario, the processing time does not increase linearly with the
>> number of input records. Hence, once the delay reaches a certain
>> threshold, I'd ideally like to increase the number of batches/records
>> that are processed together.
>>
>> Is there a possibility/settings to do so?
>>
>> Thanks
>> Andras
>>
>>
>>
>
