[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21685 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-31 Thread sidhavratha
Github user sidhavratha commented on the issue: https://github.com/apache/spark/pull/21685 Our Kafka team has resolved the issue with the 40-second poll delay, which was caused by faulty hardware. However, these changes still make sense to get better throughput per batch. As you know kafka

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21685 In the meantime, something came to mind (the most obvious question): what is the size of the Kafka events being processed? Big events could result in high polling times. Maybe some

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-02 Thread sidhavratha
Github user sidhavratha commented on the issue: https://github.com/apache/spark/pull/21685 And yes, both applications are tested on the same dataset, with only the additional buffer logic applied and the consumer group-id changed. In the before case, the scheduling delay is increasing because of

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-02 Thread sidhavratha
Github user sidhavratha commented on the issue: https://github.com/apache/spark/pull/21685 If the batch duration is 10 seconds, a new batch will start every 10 seconds, irrespective of whether the last batch has completed. If a particular batch (10-second duration, which is supposed to

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-02 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21685 What I can't really understand is why the `Scheduler Delay` is so different. ` Scheduler delay includes time to ship the task from the scheduler to the executor, and time to send

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-02 Thread sidhavratha
Github user sidhavratha commented on the issue: https://github.com/apache/spark/pull/21685 Thanks a lot for looking into this. Please find my comments in [] below each point. - You're trying to commit something into 2.4, but in the test results I see version 2.1.0. Have

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-02 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21685 In general, `KafkaConsumer.poll` should take a couple of seconds, but 10+ is extremely high. The question `why it takes so long?` has to be answered first. In the processing time chart I see a

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-01 Thread sidhavratha
Github user sidhavratha commented on the issue: https://github.com/apache/spark/pull/21685 @gaborgsomogyi Can you please review this PR and approve it for testing?

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21685 Can one of the admins verify this patch?

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21685 Can one of the admins verify this patch?

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-06-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21685 Can one of the admins verify this patch?