After a 10-minute delay, processing a 10-minute batch will not take 10 times longer than a 1-minute batch.
This is mainly because of the I/O write operations to HDFS, and also because certain users who are active in one 1-minute batch will be active in the following ones as well; processing such a customer only once (if we take 10 batches together) saves time.

On Thu, 1 Jul 2021 at 13:45, Sean Owen <sro...@gmail.com> wrote:

> Wouldn't this happen naturally? The large batches would just take a longer
> time to complete already.
>
> On Thu, Jul 1, 2021 at 6:32 AM András Kolbert <kolbertand...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a Spark Streaming application which is generally able to process
>> the data within the given time frame. However, in certain hours the delay
>> starts increasing.
>>
>> In my scenario, the number of input records does not increase the
>> processing time linearly. Hence, ideally I'd like to increase the number
>> of batches/records that are processed once the delay reaches a certain
>> threshold.
>>
>> Is there a possibility/setting to do so?
>>
>> Thanks,
>> Andras
>>
>> [image: image.png]
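Spark's DStream batch interval is fixed once the context starts, so there is no built-in setting for this; one workaround is to track the scheduling delay yourself and coalesce several pending micro-batches into one pass when the delay crosses a threshold. A minimal sketch of that decision logic (the function name, thresholds, and intervals here are hypothetical, not a Spark API):

```python
def batches_to_coalesce(delay_seconds,
                        threshold_seconds=600,
                        batch_interval_seconds=60,
                        max_batches=10):
    """Decide how many pending micro-batches to process together.

    Below the delay threshold, process one batch at a time. Above it,
    scale the batch count with the accumulated delay, capped at
    max_batches so a single pass cannot grow without bound.
    (Hypothetical helper -- Spark does not expose this knob directly.)
    """
    if delay_seconds < threshold_seconds:
        return 1
    return min(max_batches, delay_seconds // batch_interval_seconds)
```

With the defaults above, a 5-minute delay still yields single batches, while a 12-minute delay would trigger one pass over 10 batches' worth of data, which is where the HDFS-write and duplicate-user savings described earlier come in.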