Just looking at this, what is your batch interval when ingesting ~1000
records per second? As a rule of thumb, your capacity planning should
account for twice the normal ingestion rate.
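
To make the rule of thumb concrete, here is a minimal sketch of the
arithmetic in Python. The 60-second batch interval is an assumption based
on the 1-minute batches mentioned further down the thread, and the function
name is made up for illustration.

```python
def planned_records_per_batch(normal_rate_per_sec, batch_interval_sec, headroom=2.0):
    """Records one batch must absorb when sizing for `headroom` x the normal rate."""
    return int(normal_rate_per_sec * headroom * batch_interval_sec)

normal_rate = 1000    # ~1000 records/sec, the rate mentioned above
batch_interval = 60   # ASSUMPTION: 1-minute batch interval

planned = planned_records_per_batch(normal_rate, batch_interval)
print(planned)  # 120000 records per batch at twice the normal rate
```

In other words, under these assumptions the cluster should be sized to
finish roughly 120,000 records within one batch interval, not 60,000.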

Regarding your point:

"...  Hence, ideally I'd like to increase the number of batches/records
that are being processed after a delay reaches a certain time...."

The only way you can do this is by allocating more resources to your
cluster at the start so that additional capacity is made available.
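
As a rough illustration of allocating resources up front, the sketch below
builds a spark-submit command line in Python. The property names
(spark.executor.instances, spark.executor.cores, spark.executor.memory) are
standard Spark configuration keys; the application name and all the numbers
are hypothetical placeholders, not recommendations for this workload.

```python
def spark_submit_command(app, executors, cores_per_executor, executor_memory):
    """Build a spark-submit argument list with statically sized executors."""
    conf = {
        "spark.executor.instances": executors,
        "spark.executor.cores": cores_per_executor,
        "spark.executor.memory": executor_memory,
    }
    cmd = ["spark-submit"]
    for key, value in conf.items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd

# HYPOTHETICAL sizing: double the executors to cover the busy hours.
baseline = spark_submit_command("streaming_app.py", 4, 2, "4g")
peak_sized = spark_submit_command("streaming_app.py", 8, 2, "4g")
print(" ".join(peak_sized))
```

The point is only that the extra capacity is requested when the job starts,
so it is already there when the busy hours arrive.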

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>







On Thu, 1 Jul 2021 at 14:28, András Kolbert <kolbertand...@gmail.com> wrote:

> After a 10-minute delay, processing a 10-minute batch will not take 10 times
> longer than processing a 1-minute batch.
>
> That is mainly because of the I/O write operations to HDFS, and also because
> certain users will be active in more than one 1-minute batch, so processing
> such a customer only once (if we take 10 batches together) will save time.
>
>
>
> On Thu, 1 Jul 2021 at 13:45, Sean Owen <sro...@gmail.com> wrote:
>
>> Wouldn't this happen naturally? The large batches would just take a
>> longer time to complete already.
>>
>> On Thu, Jul 1, 2021 at 6:32 AM András Kolbert <kolbertand...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have a Spark Streaming application which is generally able to process
>>> the data within the given time frame. However, in certain hours the
>>> processing time starts increasing, which causes a delay.
>>>
>>> In my scenario, the processing time does not increase linearly with the
>>> number of input records. Hence, ideally I'd like to increase the number of
>>> batches/records that are being processed after a delay reaches a certain
>>> time.
>>>
>>> Is there a setting or some other way to do so?
>>>
>>> Thanks
>>> Andras
>>>
>>>
>>> [image: image.png]
>>>
>>
