Github user jeanlyn closed the pull request at:
https://github.com/apache/spark/pull/11440
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
Github user jeanlyn commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190994425
Thanks @jerryshao @srowen @zsxwing for the suggestions. I'll close this PR.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/11440#discussion_r54610922
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -221,8 +221,12 @@ class JobGenerator(jobScheduler:
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/11440#discussion_r54551767
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala ---
@@ -221,8 +221,12 @@ class JobGenerator(jobScheduler:
Github user jeanlyn commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190613342
My bad. I will try to figure out how to fix this when window operations
appear with the config set to true.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190610110
But how do you define "much longer" — based on the batch number or on time?
IMHO we cannot base a patch on assumptions. We should add some
defensive code to
Github user jeanlyn commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190608101
@jerryshao Thanks for the explanation. I see what you mean. It only
happens at the beginning, and if the stop time is much longer than the window
time, I think it's
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190570144
For example, if your sliding duration is 1, window duration is 4, and batch
duration is 1, and the down time is 3: if you skip these 3 batches, IIUC
the result
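To make the concern concrete, here is a minimal plain-Scala sketch of the numbers above — not Spark code; `WindowSkipDemo` and its batch timeline are hypothetical. It models batch duration 1, window duration 4, sliding duration 1, with batches 5–7 lost to down time, and compares the first window computed after recovery (the one ending at t = 8) with and without the skipped batches:

```scala
// Hypothetical illustration (not Spark internals): each batch at time t
// carries a single record whose value is t.
object WindowSkipDemo {
  val windowDuration = 4

  // A sliding window ending at time t covers batches (t - 3) .. t.
  def windowedSum(batches: Map[Int, Int], t: Int): Int =
    ((t - windowDuration + 1) to t).flatMap(batches.get).sum

  def main(args: Array[String]): Unit = {
    val downTime = Set(5, 6, 7)                     // batches lost if skipped
    val full     = (1 to 10).map(t => t -> t).toMap // all batches generated
    val skipped  = full -- downTime                 // down-time batches removed

    println(windowedSum(full, 8))    // 5 + 6 + 7 + 8 = 26
    println(windowedSum(skipped, 8)) // only batch 8 survives = 8
  }
}
```

The window ending at t = 8 still overlaps the down time, so dropping those batches changes its aggregate — which is the inconsistency being discussed.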
Github user jeanlyn commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190568465
Thanks @jerryshao for the suggestion!
> Jobs generated in the down time can be used for WAL replay, did you test
when these down jobs are removed, the behavior of WAL
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190531231
Also, for some windowing operations, I think this removal of down-time jobs
may lead to inconsistent results of the windowing aggregation.
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/11440#issuecomment-190530543
Jobs generated in the down time can be used for WAL replay. Did you test
that when these down jobs are removed, the WAL replay behavior is still correct?