Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
Our Kafka team has resolved the issue regarding the 40-second poll delay; it was caused by some
faulty hardware.
However, these changes still make sense for getting better throughput per batch.
As you know, Kafka
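The general idea behind the change can be sketched in plain Python (hypothetical names; this is an illustration of the async-buffer technique, not the PR's actual Scala implementation): a background thread keeps polling into a bounded queue, so fetching records overlaps with batch processing instead of blocking it.

```python
import queue
import threading

class AsyncBuffer:
    """Keeps a bounded buffer of records filled by a background poll thread."""

    def __init__(self, poll_fn, max_records=1000):
        # poll_fn is a stand-in for a consumer poll that returns a list of records.
        self._queue = queue.Queue(maxsize=max_records)
        self._stop = threading.Event()
        self._poll_fn = poll_fn
        self._thread = threading.Thread(target=self._fill, daemon=True)
        self._thread.start()

    def _fill(self):
        # Background thread: keep the buffer topped up so the processing
        # thread never waits on a slow poll.
        while not self._stop.is_set():
            for record in self._poll_fn():
                self._queue.put(record)  # blocks when the buffer is full

    def take(self, n):
        # Drain up to n buffered records for the current batch
        # (blocks until the background thread has produced them).
        return [self._queue.get() for _ in range(n)]

    def stop(self):
        self._stop.set()
```

The bounded queue is the key design point: it gives the consumer a minimum ready-to-process buffer while capping memory use when polling outpaces processing.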
Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
And yes, both applications were tested on the same dataset, with only the additional
buffer logic applied and the consumer group-id changed.
In the former case the scheduling delay keeps increasing because
Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
If the batch duration is 10 seconds, a new batch will start every 10 seconds
irrespective of whether the last batch has completed.
If a particular batch (10 second duration - which is supposed
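The fixed-rate scheduling described above can be illustrated with a small simulation (plain Python, no Spark): when per-batch processing time exceeds the batch interval, batches still arrive on schedule, so the scheduling delay grows linearly.

```python
def scheduling_delays(batch_interval, processing_time, num_batches):
    """Simulate fixed-rate batch scheduling with a single processing slot.

    A new batch is *submitted* every `batch_interval` seconds regardless of
    whether the previous batch finished; batches are *processed* one at a
    time. Returns the scheduling delay (wait before processing starts) for
    each batch.
    """
    delays = []
    worker_free_at = 0.0  # time at which the processing slot becomes free
    for i in range(num_batches):
        submit_time = i * batch_interval
        start_time = max(submit_time, worker_free_at)
        delays.append(start_time - submit_time)
        worker_free_at = start_time + processing_time
    return delays

# 10 s interval but 12 s of work per batch: delay grows by 2 s every batch.
print(scheduling_delays(10, 12, 5))  # → [0.0, 2.0, 4.0, 6.0, 8.0]
```

As long as processing fits inside the interval the delay stays at zero; once it does not, the backlog (and hence the delay) grows without bound, which is what the scheduling-delay graph in the test results shows.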
Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
Thanks a lot for looking into this. Please find comments in [] below each
point.
- You're trying to commit something into 2.4, but in the test results I see
the 2.1.0 version. Have
Github user sidhavratha commented on the issue:
https://github.com/apache/spark/pull/21685
@gaborgsomogyi Can you please review this PR and approve it for testing?
GitHub user sidhavratha opened a pull request:
https://github.com/apache/spark/pull/21685
[SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to maintain min buffer using async thread to avoid blocking kafka poll
## What changes were proposed in this pull request