[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-167808178 That Yahoo benchmark has a lot of issues, they've already been contacted by myself and others as to some obvious errors they made in their Spark job. On

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-28 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-167704274 I would like to close this now. But the latency would be a problem in real use cases. You can see a

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-28 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/10197 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164215971 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164215964 **[Test build #47620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164215968 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164228480 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164228478 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164218581 **[Test build #47621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164215857 **[Test build #47620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47620/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164228455 **[Test build #47621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47621/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164172702 I refactored it to reuse most of the current code.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread viirya
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/10197 [SPARK-12203][STREAMING] Add KafkaDirectInputDStream JIRA: https://issues.apache.org/jira/browse/SPARK-12203 Currently, we have DirectKafkaInputDStream, which directly pulls messages

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164173864 **[Test build #47614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164174224 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164174221 **[Test build #47614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47614/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164174223 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164210138 **[Test build #47619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164212134 **[Test build #47619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47619/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164212137 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164212136 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164204167 Forgot to commit new file...

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164204585 **[Test build #47617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164204636 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164204635 **[Test build #47617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47617/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-164204637 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163302006 The reason the direct stream has some latency is that it is figuring out, in advance, on the driver, which offsets are in each partition. That means that all
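
A rough Python model of the planning step koeninger describes, not Spark's actual Scala implementation: the driver fixes a `[from, until)` offset range for every partition before any task runs. `plan_batch` and the optional per-partition cap are hypothetical; the cap only loosely mirrors the `spark.streaming.kafka.maxRatePerPartition` setting, and this `OffsetRange` is named after, but is not, Spark's class of that name.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical alias: (topic, partition number).
TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class OffsetRange:
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    def count(self) -> int:
        return self.until_offset - self.from_offset


def plan_batch(
    current: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
    batch_seconds: float,
    max_rate_per_partition: Optional[int] = None,
) -> Dict[TopicPartition, OffsetRange]:
    """On the driver: decide exactly which offsets each partition of the batch covers."""
    ranges: Dict[TopicPartition, OffsetRange] = {}
    for tp, start in current.items():
        end = latest[tp]
        if max_rate_per_partition is not None:
            # Cap how many messages one partition may contribute to this batch.
            end = min(end, start + int(max_rate_per_partition * batch_seconds))
        ranges[tp] = OffsetRange(tp[0], tp[1], start, end)
    return ranges


current = {("events", 0): 100, ("events", 1): 250}
latest = {("events", 0): 180, ("events", 1): 900}
planned = plan_batch(current, latest, batch_seconds=2.0, max_rate_per_partition=100)
for tp, r in planned.items():
    print(tp, r.from_offset, r.until_offset)
```

Because the ranges are known before execution, a replayed batch reads exactly the same messages; the trade-off is that no data is fetched until the batch has been planned.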

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-09 Thread viirya
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/10197

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-09 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163313509 As far as I can see from the implementation, the reason the direct stream has some latency is that it generates the RDD after each batch window finishes. So it
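
A back-of-the-envelope model of the delay discussed here, under simplifying assumptions: batches are cut at multiples of the batch interval, a message is picked up by the first batch boundary at or after its arrival, and `fetch_s` and `process_s` are hypothetical inputs. In the direct stream the fetch from Kafka runs inside the batch's tasks, whereas a receiver has usually fetched the data while the window was still open, which the model represents as `fetch_s=0`.

```python
import math


def batch_boundary(arrival_s: float, batch_interval_s: float) -> float:
    """Time of the first batch boundary at or after a message's arrival."""
    return math.ceil(arrival_s / batch_interval_s) * batch_interval_s


def end_to_end_latency(arrival_s: float, batch_interval_s: float,
                       process_s: float, fetch_s: float = 0.0) -> float:
    """Hypothetical model: wait for the boundary, then fetch (if any), then process."""
    wait_s = batch_boundary(arrival_s, batch_interval_s) - arrival_s
    return wait_s + fetch_s + process_s


# Direct-stream style: the Kafka fetch sits on the critical path after the boundary.
print(end_to_end_latency(0.5, 2.0, process_s=0.25, fetch_s=0.25))  # prints 2.0
# Receiver style: the data was already fetched before the boundary.
print(end_to_end_latency(0.5, 2.0, process_s=0.25))  # prints 1.75
```

Both variants pay the wait for the batch boundary; the model only isolates the extra fetch step, so it says nothing about throughput or fault tolerance.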

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/10197 [SPARK-12203][STREAMING] Add KafkaDirectInputDStream JIRA: https://issues.apache.org/jira/browse/SPARK-12203 Currently, we have DirectKafkaInputDStream, which directly pulls messages from

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162840627 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162840628 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162840624 **[Test build #47329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162840417 **[Test build #47329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47329/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162904331 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162904333 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162934829 **[Test build #47340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-162931179 retest this please.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163009606 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163009506 **[Test build #47340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47340/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163009604 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163073031 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163073033 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163064821 retest this please.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163072940 **[Test build #47376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163067069 **[Test build #47376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47376/consoleFull)** for PR 10197 at commit

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163078191 Hi @viirya, I really don't see any difference compared to the receiver-based Kafka stream; the only difference is that you change the high-level consumer API to

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163092418 We need the exactly-once feature of DirectKafkaInputDStream. But we observed that, due to its implementation, it introduces extra latency compared with KafkaInputDStream.

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread jerryshao
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163096778 I really doubt that changing to the low-level API could guarantee exactly-once semantics without any other changes. CC @koeninger

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163102160 Hmm, I think its exactly-once semantics should be the same as what DirectKafkaInputDStream provides.
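
The exactly-once question raised here depends on the output side as well as on how offsets are read: the Spark Kafka integration guide notes that a deterministic offset range per batch only yields exactly-once results if the output is idempotent or is saved atomically together with the offsets. A minimal Python sketch of the transactional variant; `TransactionalSink`, `process` and the single-partition in-memory log are hypothetical, and a real sink would perform both updates in one database transaction.

```python
from typing import Dict, List


class TransactionalSink:
    """Hypothetical store that records a batch's results and its end offset together."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}
        self.committed_until: Dict[int, int] = {}  # partition -> next offset to read

    def commit(self, partition: int, from_offset: int, until_offset: int,
               results: Dict[str, int]) -> bool:
        done = self.committed_until.get(partition, 0)
        # A replayed batch whose range is already committed is skipped,
        # so re-running it after a failure cannot double-count.
        if done >= until_offset:
            return False
        if done != from_offset:
            raise ValueError("gap or overlap in offset ranges")
        # In a real store, these two updates would share one transaction.
        for key, n in results.items():
            self.totals[key] = self.totals.get(key, 0) + n
        self.committed_until[partition] = until_offset
        return True


def process(log: List[str], from_offset: int, until_offset: int) -> Dict[str, int]:
    """Deterministic: the same offset range always yields the same counts."""
    counts: Dict[str, int] = {}
    for msg in log[from_offset:until_offset]:
        counts[msg] = counts.get(msg, 0) + 1
    return counts


log = ["a", "b", "a", "c"]
sink = TransactionalSink()
print(sink.commit(0, 0, 2, process(log, 0, 2)))  # True
print(sink.commit(0, 0, 2, process(log, 0, 2)))  # False: the replay is skipped
print(sink.commit(0, 2, 4, process(log, 2, 4)))  # True
print(sink.totals)  # {'a': 2, 'b': 1, 'c': 1}
```

Reading with the low-level API supplies the deterministic ranges; the skip-if-committed check in the sink is what turns replays into no-ops.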

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-08 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163088878 @jerryshao Yes. We would like to directly pull messages from Kafka like DirectKafkaInputDStream, but also use receivers like KafkaInputDStream.