Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-167808178
That Yahoo benchmark has a lot of issues; they've already been contacted by
me and others about some obvious errors they made in their Spark job.
On
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-167704274
I would like to close this now, but the latency is likely to be a problem in
real use cases. You can see a
Github user viirya closed the pull request at:
https://github.com/apache/spark/pull/10197
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164215971
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164215964
**[Test build #47620 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164215968
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164228480
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164228478
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164218581
**[Test build #47621 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)**
for PR 10197 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164215857
**[Test build #47620 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)**
for PR 10197 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164228455
**[Test build #47621 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)**
for PR 10197 at commit
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164172702
I refactored it to reuse most of the current code.
GitHub user viirya reopened a pull request:
https://github.com/apache/spark/pull/10197
[SPARK-12203][STREAMING] Add KafkaDirectInputDStream
JIRA: https://issues.apache.org/jira/browse/SPARK-12203
Currently, we have DirectKafkaInputDStream, which directly pulls messages
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164173864
**[Test build #47614 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164174224
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164174221
**[Test build #47614 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164174223
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164210138
**[Test build #47619 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)**
for PR 10197 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164212134
**[Test build #47619 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164212137
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164212136
Merged build finished. Test FAILed.
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164204167
Forgot to commit new file...
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164204585
**[Test build #47617 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164204636
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164204635
**[Test build #47617 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-164204637
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163302006
The direct stream has some latency because it figures out, in advance, on
the driver, which offsets are in each partition. That means that
all
Github user viirya closed the pull request at:
https://github.com/apache/spark/pull/10197
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163313509
As far as I can see from the implementation, the direct stream has some
latency because it generates the RDD only after each batch window
finishes. So it
GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/10197
[SPARK-12203][STREAMING] Add KafkaDirectInputDStream
JIRA: https://issues.apache.org/jira/browse/SPARK-12203
Currently, we have DirectKafkaInputDStream, which directly pulls messages
from
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162840627
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162840628
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162840624
**[Test build #47329 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)**
for PR 10197 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162840417
**[Test build #47329 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162904331
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162904333
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162934829
**[Test build #47340 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)**
for PR 10197 at commit
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-162931179
retest this please.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163009606
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163009506
**[Test build #47340 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)**
for PR 10197 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163009604
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163073031
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163073033
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163064821
retest this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163072940
**[Test build #47376 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)**
for PR 10197 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163067069
**[Test build #47376 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)**
for PR 10197 at commit
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163078191
Hi @viirya, I really don't see much difference from the receiver-based
Kafka stream; the only difference is that you change the high-level consumer
API to
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163092418
We need the exactly-once feature of DirectKafkaInputDStream, but we
observed that its implementation introduces latency compared with
KafkaInputDStream.
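A minimal sketch, in plain Python rather than Spark's API, of the driver-side offset planning discussed in this thread. It assumes the driver fixes a per-partition offset range for each batch before any data is read, and that workers then read exactly that slice. The names `OffsetRange`, `plan_batch`, `read_range`, and the in-memory `log` are hypothetical stand-ins for illustration, not the classes used in this pull request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical record of one partition's slice for a single batch."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def plan_batch(topic, current_offsets, latest_offsets):
    """Driver-side step: fix every partition's range before any data is read."""
    return [
        OffsetRange(topic, p, current_offsets[p], latest_offsets[p])
        for p in sorted(current_offsets)
    ]


def read_range(log, r):
    """Worker-side step: read exactly the planned slice of one partition."""
    return log[r.partition][r.from_offset:r.until_offset]


# Hypothetical in-memory "topic" with two partitions.
log = {0: ["a", "b", "c"], 1: ["x", "y"]}

# Partition 0 was already consumed up to offset 1; partition 1 is untouched.
ranges = plan_batch("events", {0: 1, 1: 0}, {0: 3, 1: 2})
batch = [read_range(log, r) for r in ranges]

# Re-reading the same planned ranges yields the same data.
replay = [read_range(log, r) for r in ranges]
```

Because the ranges are decided before reading, a failed batch can be recomputed from the same offsets, which is the property exactly-once processing relies on in this sketch. The cost is that messages arriving after planning wait for the next batch.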
Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163096778
I really doubt that changing to the low-level API could guarantee
exactly-once semantics without any other changes.
CC @koeninger
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163102160
Hmm, I think its exactly-once semantics should be the same as those of
DirectKafkaInputDStream.
Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/10197#issuecomment-163088878
@jerryshao Yes. We would like to directly pull messages from Kafka like
DirectKafkaInputDStream, but also use receivers like KafkaInputDStream.