Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
It's been a while. What can I do to draw some attention to this pull request?
Is this issue not relevant enough? Thanks for reconsidering, @felixcheung
@brkyvz @zsxwing
---
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
@felixcheung will this be merged?
---
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
Sorry for being inactive. All good with this?
---
Github user arzt commented on a diff in the pull request:
https://github.com/apache/spark/pull/17774#discussion_r113871876
--- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala ---
@@ -617,6 +617,94 @@ class
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
I changed the max messages per partition to be at least 1. Agreed?
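[Editor's note: a minimal sketch of the floor being discussed, with hypothetical names (`ratePerSec`, `batchIntervalMs`); the real logic in `DirectKafkaInputDStream` is more involved.]

```scala
// Illustrative only, not the actual Spark code: clamp the per-partition cap
// to at least 1 message, so a small fractional rate never rounds down to 0.
def maxMessagesPerPartition(
    ratePerSec: Double,     // estimated records/sec, summed over all partitions
    batchIntervalMs: Long,
    numPartitions: Int): Long = {
  val perPartition = ratePerSec / numPartitions * (batchIntervalMs / 1000.0)
  math.max(perPartition.toLong, 1L)  // floor at 1 so no partition is starved to 0
}
```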
---
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
@koeninger I agree that assuming a long batch size is wrong; I'm not sure
whether it even matters.
But what if for one partition there is no lag in the current batch? Then
fetching 1 message for it would pull records the rate estimator never budgeted.
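[Editor's note: a hypothetical sketch of the concern raised here. If the batch budget is split proportionally to lag, a partition with zero lag gets zero records, and forcing it up to 1 fetches data outside the budget. Names are invented; this is not Spark's code.]

```scala
// Distribute a batch-wide record budget proportionally to each partition's
// lag. Partitions with no lag get nothing, which is exactly where a blanket
// "at least 1 message" floor would add unbudgeted records.
def distributeBudget(totalBudget: Long, lagByPartition: Map[Int, Long]): Map[Int, Long] = {
  val totalLag = lagByPartition.values.sum
  lagByPartition.map { case (p, lag) =>
    val share = if (totalLag == 0) 0L
                else (totalBudget * lag.toDouble / totalLag).toLong
    p -> share
  }
}

// distributeBudget(100L, Map(0 -> 50L, 1 -> 0L))  // partition 1 gets 0 records
```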
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
To run tests or debug using IntelliJ:
`mvn test -DforkMode=never -pl external/kafka-0-8 "-Dsuites=org.apache.spark.streaming.kafka.DirectKafkaStreamSuite maxMessagesPerPartition"`
---
Github user arzt commented on the issue:
https://github.com/apache/spark/pull/17774
Thanks for your valuable feedback. I added tests as suggested by
@JasonMWhite. @koeninger the estimated rate is per second, summed over all
partitions, isn't it? The batch time is usually longer. So even a fractional
per-second rate can amount to several records per partition over one batch.
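[Editor's note: a worked example with invented numbers, making the per-second vs per-batch point concrete.]

```scala
// Hypothetical numbers showing the scale mismatch: the rate is per second and
// summed over partitions, while the cap applies to a whole (longer) batch.
val ratePerSec    = 10.0  // estimated records/sec across all partitions
val numPartitions = 4
val batchSeconds  = 5L

val perPartitionPerSec = ratePerSec / numPartitions          // 2.5
val roundEarly = perPartitionPerSec.toLong * batchSeconds    // 2 * 5 = 10
val roundLate  = (perPartitionPerSec * batchSeconds).toLong  // 12.5 -> 12
// Rounding early loses 2 records per partition per batch here; a rate below
// 1.0 record/sec/partition would round to 0 and lose everything.
```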
GitHub user arzt opened a pull request:
https://github.com/apache/spark/pull/17774
[SPARK-18371][Streaming] Spark Streaming backpressure generates batch with
large number of records
## What changes were proposed in this pull request?
Omit rounding of backpressure rate
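[Editor's note: the description is truncated, but the one-line summary points at where the rounding happens. A hedged sketch of that idea, with invented helper names; this is not the PR's actual diff.]

```scala
// Keep the backpressure rate fractional until the final message count is
// computed, instead of truncating it to a whole number up front.

// Before: truncating early. A rate under 1.0 record/sec collapses to 0, which
// downstream logic can treat as "no limit", yielding a very large batch
// (the symptom described in SPARK-18371).
def capRoundedEarly(rate: Double, batchSeconds: Long): Long =
  rate.toLong * batchSeconds

// After: round once, at the end, so fractional rates survive the scaling.
def capRoundedLate(rate: Double, batchSeconds: Long): Long =
  (rate * batchSeconds).toLong
```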