[2/2] spark git commit: [SPARK-12177][STREAMING][KAFKA] Update KafkaDStreams to new Kafka 0.10 Consumer API

2016-06-30 Thread tdas
[SPARK-12177][STREAMING][KAFKA] Update KafkaDStreams to new Kafka 0.10 Consumer API ## What changes were proposed in this pull request? New Kafka consumer api for the released 0.10 version of Kafka ## How was this patch tested? Unit tests, manual tests Author: cody koeninger

[1/2] spark git commit: [SPARK-12177][STREAMING][KAFKA] Update KafkaDStreams to new Kafka 0.10 Consumer API

2016-06-30 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 a54852350 -> 3134f116a http://git-wip-us.apache.org/repos/asf/spark/blob/3134f116/external/kafka-0-10/src/test/java/org/apache/spark/streaming/kafka010/JavaConsumerStrategySuite.java

spark git commit: [SPARK-17096][SQL][STREAMING] Improve exception string reported through the StreamingQueryListener

2016-08-17 Thread tdas
Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #14675 from tdas/SPARK-17096. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d60af8f6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d60af8f6 Diff: http:

spark git commit: [SPARK-17096][SQL][STREAMING] Improve exception string reported through the StreamingQueryListener

2016-08-17 Thread tdas
Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #14675 from tdas/SPARK-17096. (cherry picked from commit d60af8f6aa53373de1333cc642cf2a9d7b39d912) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger

2017-01-31 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e43f161bb -> d35a1268d [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger In StructuredStreaming, if a new trigger was skipped because no new data arrived, we suddenly

spark git commit: [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger

2017-01-31 Thread tdas
Repository: spark Updated Branches: refs/heads/master 57d70d26c -> 081b7adda [SPARK-19378][SS] Ensure continuity of stateOperator and eventTime metrics even if there is no new data in trigger ## What changes were proposed in this pull request? In StructuredStreaming, if a new trigger was

spark git commit: [SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument

2017-01-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 60bd91a34 -> ec9493b44 [SPARK-16101][HOTFIX] Fix the build with Scala 2.10 by explicit typed argument ## What changes were proposed in this pull request? I goofed in https://github.com/apache/spark/pull/16669 which introduces the break

spark git commit: [SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD

2017-01-25 Thread tdas
gic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15396 from tdas/SPARK-14804. (cherry picked from commit 47d5d0ddb06c7d2c86515d9556c41dc80081f560) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project:

spark git commit: [SPARK-14804][SPARK][GRAPHX] Fix checkpointing of VertexRDD/EdgeRDD

2017-01-25 Thread tdas
gic. ## How was this patch tested? New unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15396 from tdas/SPARK-14804. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47d5d0dd Tree: http://git-wip-us.a

spark git commit: [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan

2017-01-20 Thread tdas
ion in complete mode. Currently it is incorrectly allowed when present anywhere in the plan. It gives unpredictable potentially incorrect results. ## How was this patch tested? New test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16662 from tdas/SPARK-19314. Project: h

spark git commit: [SPARK-19314][SS][CATALYST] Do not allow sort before aggregation in Structured Streaming plan

2017-01-20 Thread tdas
after an aggregation in complete mode. Currently it is incorrectly allowed when present anywhere in the plan. It gives unpredictable potentially incorrect results. ## How was this patch tested? New test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16662 from tdas/SPARK-19314. (cherry pi

spark git commit: [SPARK-19267][SS] Fix a race condition when stopping StateStore

2017-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 4d286c903 -> 6f0ad575d [SPARK-19267][SS] Fix a race condition when stopping StateStore ## What changes were proposed in this pull request? There is a race condition when stopping StateStore which makes `StateStoreSuite.maintenance`

spark git commit: [SPARK-19267][SS] Fix a race condition when stopping StateStore

2017-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9b7a03f15 -> ea31f92bb [SPARK-19267][SS] Fix a race condition when stopping StateStore ## What changes were proposed in this pull request? There is a race condition when stopping StateStore which makes `StateStoreSuite.maintenance`

spark git commit: [SPARK-19497][SS] Implement streaming deduplication

2017-02-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7bf09433f -> 9bf4e2baa [SPARK-19497][SS] Implement streaming deduplication ## What changes were proposed in this pull request? This PR adds a special streaming deduplication operator to support `dropDuplicates` with `aggregation` and
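The entry above describes a dedicated stateful operator that remembers keys it has already emitted. As a rough, non-Spark illustration of the semantics (the `dedup` helper and its in-memory `seen` set are stand-ins for the real state store, not Spark internals):

```python
def dedup(batches):
    """Drop rows whose key already appeared in any earlier micro-batch."""
    seen = set()  # stand-in for the streaming state store
    for batch in batches:
        fresh = []
        for key, value in batch:
            if key not in seen:
                seen.add(key)
                fresh.append((key, value))
        yield fresh

batches = [[(1, "a"), (2, "b")], [(2, "b2"), (3, "c")]]
print(list(dedup(batches)))  # [[(1, 'a'), (2, 'b')], [(3, 'c')]]
```

In the real operator the seen-keys state lives in the state store, and combining `dropDuplicates` with a watermark lets old keys eventually be purged.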

spark git commit: [SPARK-19584][SS][DOCS] update structured streaming documentation around batch mode

2017-02-14 Thread tdas
batch query specification and options. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #16918 from tcondie/kafka-docs. (cherry picked from commit 447b2b5309251f3ae37857de73c157e59a0d76df) S

spark git commit: [SPARK-19584][SS][DOCS] update structured streaming documentation around batch mode

2017-02-14 Thread tdas
batch query specification and options. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #16918 from tcondie/kafka-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-17372][SQL][STREAMING] Avoid serialization issues by using Arrays to save file names in FileStreamSource

2016-09-06 Thread tdas
Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #14987 from tdas/SPARK-17372. (cherry picked from commit eb1ab88a86ce35f3d6ba03b3a798099fbcf6b3fc) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-17372][SQL][STREAMING] Avoid serialization issues by using Arrays to save file names in FileStreamSource

2016-09-06 Thread tdas
Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #14987 from tdas/SPARK-17372. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eb1ab88a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/eb1ab88a Diff: http://git-wip-us.apache.org/repos/asf/s

[1/2] spark git commit: [SPARK-17346][SQL] Add Kafka source for Structured Streaming

2016-10-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5fd54b994 -> 9293734d3 http://git-wip-us.apache.org/repos/asf/spark/blob/9293734d/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

spark git commit: [SPARK-17346][SQL][TEST-MAVEN] Generate the sql test jar to fix the maven build

2016-10-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9293734d3 -> b678e465a [SPARK-17346][SQL][TEST-MAVEN] Generate the sql test jar to fix the maven build ## What changes were proposed in this pull request? Generate the sql test jar to fix the maven build ## How was this patch tested?

[2/2] spark git commit: [SPARK-17346][SQL] Add Kafka source for Structured Streaming

2016-10-05 Thread tdas
/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit?usp=sharing tdas did most of the work and part of it was inspired by koeninger's work. ### Introduction The Kafka source is a structured streaming data source to poll data from Kafka. The schema of reading data is as follows: Column | Type | key | binary

spark git commit: Revert "[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)"

2016-09-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 bec077069 -> 0cfc0469b Revert "[SPARK-17480][SQL][FOLLOWUP] Fix more instances which calls List.length/size which is O(n)" This reverts commit a3bba372abce926351335d0a2936b70988f19b23. Project:

spark git commit: [SPARK-17643] Remove comparable requirement from Offset

2016-09-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master f62ddc598 -> 988c71457 [SPARK-17643] Remove comparable requirement from Offset For some sources, it is difficult to provide a global ordering based only on the data in the offset. Since we don't use comparison for correctness, lets
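The commit above argues that sources only need to say whether two offsets are equal, not how they order globally. A minimal sketch of that contract (this `Offset` class is illustrative, not Spark's actual trait):

```python
import json

class Offset:
    """A source position compared by content equality, with no ordering."""
    def __init__(self, payload):
        self.payload = payload

    def json(self):
        # Canonical serialization so logically equal offsets compare equal.
        return json.dumps(self.payload, sort_keys=True)

    def __eq__(self, other):
        return isinstance(other, Offset) and self.json() == other.json()

a = Offset({"topic": "t", "partition": 0, "pos": 5})
b = Offset({"pos": 5, "partition": 0, "topic": "t"})
print(a == b)  # True: same position, regardless of key order
```

Equality is all the engine needs to decide whether a batch has already been processed; no `<` or `>` between offsets is ever required for correctness.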

spark git commit: [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore

2016-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master b6de0c98c -> 6f7ecb0f2 [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore ## What changes were proposed in this pull request? If the rename operation in the state store fails (`fs.rename` returns `false`), the StateStore
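The fix described above treats a `false` return from `fs.rename` as fatal instead of silently continuing. A small sketch of that check, assuming an HDFS-style rename that reports failure via its return value (the helper names here are hypothetical):

```python
import os

def rename_hdfs_style(src, dst):
    """Model an HDFS-style rename: return False on failure instead of raising."""
    try:
        os.replace(src, dst)
        return True
    except OSError:
        return False

def commit_delta(tmp_path, final_path):
    """Move the temp delta file into place; fail loudly if rename is refused."""
    if not rename_hdfs_style(tmp_path, final_path):
        raise IOError(f"Failed to rename {tmp_path} to {final_path}")
```

Before the fix, a swallowed `false` at this point could leave the state store silently missing a delta file.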

spark git commit: [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore

2016-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 8aa419b23 -> 0cceb1bfe [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore ## What changes were proposed in this pull request? If the rename operation in the state store fails (`fs.rename` returns `false`), the

spark git commit: [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore

2016-11-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 ba80eaf72 -> 988f9080a [SPARK-18342] Make rename failures fatal in HDFSBackedStateStore ## What changes were proposed in this pull request? If the rename operation in the state store fails (`fs.rename` returns `false`), the

spark git commit: [SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite

2016-10-18 Thread tdas
tested? Ran existing unit test MANY TIMES in Jenkins Author: Tathagata Das <tathagata.das1...@gmail.com> Author: Liwei Lin <lwl...@gmail.com> Closes #15519 from tdas/metrics-flaky-test-fix. (cherry picked from commit 7d878cf2da04800bc4147b05610170865b148c64) Signed-off-by: Tathagat

spark git commit: [SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala 2.10 compatibility

2016-10-18 Thread tdas
build. ## How was this patch tested? Locally compiled and ran sql/core unit tests in 2.10 Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15531 from tdas/metrics-flaky-test-fix-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite

2016-10-18 Thread tdas
Ran existing unit test MANY TIMES in Jenkins Author: Tathagata Das <tathagata.das1...@gmail.com> Author: Liwei Lin <lwl...@gmail.com> Closes #15519 from tdas/metrics-flaky-test-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/

spark git commit: [SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala 2.10 compatibility

2016-10-18 Thread tdas
build. ## How was this patch tested? Locally compiled and ran sql/core unit tests in 2.10 Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15531 from tdas/metrics-flaky-test-fix-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

[2/2] spark git commit: [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0

2016-10-17 Thread tdas
](https://github.com/tdas/spark/blob/ee8e899e4c274c363a8b4d13e8bf57b0b467a50e/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L62)). - Different Mima exclusions -- ## What changes were proposed in this pull request? Metrics are needed for monitoring

[1/2] spark git commit: [SPARK-17731][SQL][STREAMING] Metrics for structured streaming for branch-2.0

2016-10-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 a0d9015b3 -> 881e0eb05 http://git-wip-us.apache.org/repos/asf/spark/blob/881e0eb0/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala

spark git commit: [SPARK-17711] Compress rolled executor log

2016-10-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 2aa25833c -> 26e978a93 [SPARK-17711] Compress rolled executor log ## What changes were proposed in this pull request? This PR adds support for executor log compression. ## How was this patch tested? Unit tests cc: yhuai tdas men

spark git commit: [SPARK-17711] Compress rolled executor log

2016-10-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 37686539f -> 231f39e3f [SPARK-17711] Compress rolled executor log ## What changes were proposed in this pull request? This PR adds support for executor log compression. ## How was this patch tested? Unit tests cc: yhuai tdas men

spark git commit: [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs for branch-2.0

2016-10-18 Thread tdas
`Stream*Event` - Changed the fields in `StreamingQueryListener.on***` from `query*` to `event` ## How was this patch tested? Existing unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15535 from tdas/SPARK-17731-1-branch-2.0. Project: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryListener APIs

2016-10-18 Thread tdas
ess/Terminated)` events to `Stream*Event` - Changed the fields in `StreamingQueryListener.on***` from `query*` to `event` ## How was this patch tested? Existing unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15530 from tdas/SPARK-17731-1. Project: http:

spark git commit: [SPARK-18124] Observed delay based Event Time Watermarks

2016-11-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master bd85603ba -> c07187823 [SPARK-18124] Observed delay based Event Time Watermarks This PR adds a new method `withWatermark` to the `Dataset` API, which can be used to specify an _event time watermark_. An event time watermark allows the
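Conceptually, the watermark trails the maximum event time observed so far by the user-specified delay, and rows older than it can be treated as late. A toy, non-Spark sketch of that bookkeeping (the `run_with_watermark` helper is hypothetical):

```python
def run_with_watermark(batches, delay):
    """Yield (watermark, kept_rows) per batch; rows below the watermark are late."""
    max_event_time = 0
    for batch in batches:
        watermark = max_event_time - delay  # based on *previously* seen data
        kept = [t for t in batch if t >= watermark]
        max_event_time = max([max_event_time] + batch)
        yield watermark, kept

# Event times per micro-batch; with delay=5, the row at t=9 in the last
# batch arrives after the watermark has advanced to 15 and is dropped.
batches = [[10, 12], [11, 20], [9, 21]]
print(list(run_with_watermark(batches, delay=5)))
# [(-5, [10, 12]), (7, [11, 20]), (15, [21])]
```

In Spark the same threshold also bounds how long aggregation state must be kept, which is what makes unbounded streaming aggregations feasible.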

spark git commit: [SPARK-18124] Observed delay based Event Time Watermarks

2016-11-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 ae66799fe -> 27999b366 [SPARK-18124] Observed delay based Event Time Watermarks This PR adds a new method `withWatermark` to the `Dataset` API, which can be used to specify an _event time watermark_. An event time watermark allows the

spark git commit: [SPARK-18510][SQL] Follow up to address comments in #15951

2016-11-23 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 15d2cf264 -> 27d81d000 [SPARK-18510][SQL] Follow up to address comments in #15951 ## What changes were proposed in this pull request? This PR addressed the rest comments in #15951. ## How was this patch tested? Jenkins Author:

spark git commit: [SPARK-18510][SQL] Follow up to address comments in #15951

2016-11-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0d1bf2b6c -> 223fa218e [SPARK-18510][SQL] Follow up to address comments in #15951 ## What changes were proposed in this pull request? This PR addressed the rest comments in #15951. ## How was this patch tested? Jenkins Author: Shixiong

spark git commit: [SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work with Spark jobs

2016-11-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 fb2ea54a6 -> bd338f60d [SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work with Spark jobs ## What changes were proposed in this pull request? This PR adds `CachedKafkaConsumer.getAndIgnoreLostData` to handle corner

spark git commit: [SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work with Spark jobs

2016-11-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master bdc8153e8 -> 2fd101b2f [SPARK-18373][SPARK-18529][SS][KAFKA] Make failOnDataLoss=false work with Spark jobs ## What changes were proposed in this pull request? This PR adds `CachedKafkaConsumer.getAndIgnoreLostData` to handle corner

spark git commit: [SPARK-18339][SPARK-18513][SQL] Don't push down current_timestamp for filters in StructuredStreaming and persist batch and watermark timestamps to offset log.

2016-11-28 Thread tdas
the offset log so that these values are consistent across multiple executions of the same batch. brkyvz zsxwing tdas ## How was this patch tested? A test was added to StreamingAggregationSuite ensuring the above use case is handled. The test injects a stream of time values (in seconds) to a

spark git commit: [SPARK-18339][SPARK-18513][SQL] Don't push down current_timestamp for filters in StructuredStreaming and persist batch and watermark timestamps to offset log.

2016-11-28 Thread tdas
to the offset log so that these values are consistent across multiple executions of the same batch. brkyvz zsxwing tdas ## How was this patch tested? A test was added to StreamingAggregationSuite ensuring the above use case is handled. The test injects a stream of time values (in seconds) to a

spark git commit: [SPARK-18510] Fix data corruption from inferred partition column dataTypes

2016-11-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master f129ebcd3 -> 0d1bf2b6c [SPARK-18510] Fix data corruption from inferred partition column dataTypes ## What changes were proposed in this pull request? ### The Issue If I specify my schema when doing ```scala spark.read

spark git commit: [SPARK-18510] Fix data corruption from inferred partition column dataTypes

2016-11-23 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 835f03f34 -> 15d2cf264 [SPARK-18510] Fix data corruption from inferred partition column dataTypes ## What changes were proposed in this pull request? ### The Issue If I specify my schema when doing ```scala spark.read

spark git commit: [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints

2016-11-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 5f7a9af66 -> f13a33b47 [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints ## What changes were proposed in this pull request? It would be nice if memory sinks can also recover from checkpoints. For

spark git commit: [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints

2016-11-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 de545e7c8 -> e2452c632 [SPARK-18337] Complete mode memory sinks should be able to recover from checkpoints ## What changes were proposed in this pull request? It would be nice if memory sinks can also recover from checkpoints. For

spark git commit: [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog

2016-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 6717981e4 -> 136f687c6 [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog ## What changes were proposed in this pull request? HDFS `write` may just hang until timeout if some network error happens. It's better to enable

spark git commit: [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog

2016-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 da9d51661 -> 9dad3a7b0 [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog ## What changes were proposed in this pull request? HDFS `write` may just hang until timeout if some network error happens. It's better to enable

spark git commit: [SPARK-18497][SS] Make ForeachSink support watermark

2016-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 6f7ff7509 -> 2a40de408 [SPARK-18497][SS] Make ForeachSink support watermark ## What changes were proposed in this pull request? The issue in ForeachSink is the new created DataSet still uses the old QueryExecution. When

spark git commit: [SPARK-18497][SS] Make ForeachSink support watermark

2016-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 4b1df0e89 -> b4bad04c5 [SPARK-18497][SS] Make ForeachSink support watermark ## What changes were proposed in this pull request? The issue in ForeachSink is the new created DataSet still uses the old QueryExecution. When

spark git commit: [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog

2016-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 40d59ff5e -> e5f5c29e0 [SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog ## What changes were proposed in this pull request? HDFS `write` may just hang until timeout if some network error happens. It's better to enable

spark git commit: [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server

2016-10-31 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7c3786929 -> d2923f173 [SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid breaking history server ## What changes were proposed in this pull request? Because of the refactoring work in Structured Streaming, the event logs

spark git commit: [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite

2016-10-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 ff9f5bbf1 -> a6b5e1dcc [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite ## What changes were proposed in this pull request? A follow-up PR for SPARK-17346 to fix flaky

spark git commit: [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite

2016-10-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master c8c090640 -> 75b9e3514 [SPARK-17346][SQL][TESTS] Fix the flaky topic deletion in KafkaSourceStressSuite ## What changes were proposed in this pull request? A follow-up PR for SPARK-17346 to fix flaky

[1/2] spark git commit: [SPARK-17731][SQL][STREAMING] Metrics for structured streaming

2016-10-13 Thread tdas
Repository: spark Updated Branches: refs/heads/master 08eac3560 -> 7106866c2 http://git-wip-us.apache.org/repos/asf/spark/blob/7106866c/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala

[2/2] spark git commit: [SPARK-17731][SQL][STREAMING] Metrics for structured streaming

2016-10-13 Thread tdas
for making sure input rows are counted are in source-specific test suites. - Additional tests to test minor additions in LocalTableScanExec, StateStore, etc. Metrics also manually tested using Ganglia sink Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15307 from tdas/SPARK

spark git commit: [TEST] Ignore flaky test in StreamingQueryListenerSuite

2016-10-14 Thread tdas
est-sbt-hadoop-2.7/1736/testReport/junit/org.apache.spark.sql.streaming/StreamingQueryListenerSuite/single_listener__check_trigger_statuses/ Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #15491 from tdas/metrics-flaky-test. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 76e1f1651 -> e9b3afac9 [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite ## What changes were proposed in this pull request? Fixed the following failures:

spark git commit: [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master bb94f61a7 -> edc87e189 [SPARK-18588][TESTS] Fix flaky test: KafkaSourceStressForDontFailOnDataLossSuite ## What changes were proposed in this pull request? Fixed the following failures:

spark git commit: [SPARK-18796][SS] StreamingQueryManager should not block when starting a query

2016-12-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master bc59951ba -> 417e45c58 [SPARK-18796][SS] StreamingQueryManager should not block when starting a query ## What changes were proposed in this pull request? Major change in this PR: - Add `pendingQueryNames` and `pendingQueryIds` to track
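The entry above sketches the non-blocking start: reserve the query name in a short critical section, then run the (potentially slow) startup without holding the manager-wide lock. A rough illustration (this `QueryManager` is hypothetical, not Spark's `StreamingQueryManager`):

```python
import threading

class QueryManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()  # names of queries currently starting
        self._active = {}

    def start(self, name, run):
        with self._lock:  # brief critical section: just reserve the name
            if name in self._pending or name in self._active:
                raise ValueError(f"Query {name!r} is already active")
            self._pending.add(name)
        try:
            query = run()  # possibly slow; no lock held, other starts proceed
            with self._lock:
                self._active[name] = query
            return query
        finally:
            with self._lock:
                self._pending.discard(name)

manager = QueryManager()
print(manager.start("q1", lambda: "handle"))
```

Tracking pending names preserves the duplicate-name check while letting independent queries start concurrently.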

spark git commit: [SPARK-18796][SS] StreamingQueryManager should not block when starting a query

2016-12-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 1aeb7f427 -> 9dc5fa5f7 [SPARK-18796][SS] StreamingQueryManager should not block when starting a query ## What changes were proposed in this pull request? Major change in this PR: - Add `pendingQueryNames` and `pendingQueryIds` to

spark git commit: [SPARK-18834][SS] Expose event time stats through StreamingQueryProgress

2016-12-13 Thread tdas
QueryProgress.triggerTimestamp` to differentiate from `queryTimestamps`. It has the timestamp of when the trigger was started. ## How was this patch tested? Updated tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16258 from tdas/SPARK-18834. (cherry picked from commit c68fb42

spark git commit: [SPARK-18834][SS] Expose event time stats through StreamingQueryProgress

2016-12-13 Thread tdas
QueryProgress.triggerTimestamp` to differentiate from `queryTimestamps`. It has the timestamp of when the trigger was started. ## How was this patch tested? Updated tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16258 from tdas/SPARK-18834. Project: http://git-wip-us.apache.or

spark git commit: [SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets

2016-12-15 Thread tdas
ons with isDistinct = true. ## How was this patch tested? Added unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16289 from tdas/SPARK-18870. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4f7292c8 T

spark git commit: [SPARK-18870] Disallowed Distinct Aggregations on Streaming Datasets

2016-12-15 Thread tdas
ons with isDistinct = true. ## How was this patch tested? Added unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16289 from tdas/SPARK-18870. (cherry picked from commit 4f7292c87512a7da3542998d0e5aa21c27a511e9) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.co

spark git commit: [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4f7292c87 -> 68a6dc974 [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource ## What changes were proposed in this pull request? When starting a stream with a lot of backfill and maxFilesPerTrigger, the user could often want to
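The option amounts to reversing the file-selection order before the `maxFilesPerTrigger` cap is applied, so the newest backfill files are processed first. An illustrative sketch (the `select_files` helper is hypothetical, not FileStreamSource code):

```python
def select_files(files, max_files_per_trigger, latest_first=False):
    """Pick the next batch from (path, mod_time) pairs, oldest first by default."""
    ordered = sorted(files, key=lambda f: f[1], reverse=latest_first)
    return [path for path, _ in ordered[:max_files_per_trigger]]

files = [("old.log", 100), ("mid.log", 200), ("new.log", 300)]
print(select_files(files, 2))                     # ['old.log', 'mid.log']
print(select_files(files, 2, latest_first=True))  # ['new.log', 'mid.log']
```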

spark git commit: [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e430915fa -> 900ce558a [SPARK-18826][SS] Add 'latestFirst' option to FileStreamSource ## What changes were proposed in this pull request? When starting a stream with a lot of backfill and maxFilesPerTrigger, the user could often want

spark git commit: [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 900ce558a -> b6a81f472 [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined ## What changes were proposed in this pull request? `_to_seq` wasn't imported. ## How was this patch tested? Added

spark git commit: [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined

2016-12-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master 68a6dc974 -> 0917c8ee0 [SPARK-18888] partitionBy in DataStreamWriter in Python throws _to_seq not defined ## What changes were proposed in this pull request? `_to_seq` wasn't imported. ## How was this patch tested? Added partitionBy to

spark git commit: [SPARK-18850][SS] Make StreamExecution and progress classes serializable

2016-12-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master 78062b852 -> d7f3058e1 [SPARK-18850][SS] Make StreamExecution and progress classes serializable ## What changes were proposed in this pull request? This PR adds StreamingQueryWrapper to make StreamExecution and progress classes

spark git commit: [SPARK-18850][SS] Make StreamExecution and progress classes serializable

2016-12-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 d8548c8a7 -> a73201daf [SPARK-18850][SS] Make StreamExecution and progress classes serializable ## What changes were proposed in this pull request? This PR adds StreamingQueryWrapper to make StreamExecution and progress classes

spark git commit: [SPARK-18671][SS][TEST-MAVEN] Follow up PR to fix test for Maven

2016-12-06 Thread tdas
kafka-0-10-sql tests. So moved the kafka-source-offset-version-2.1.0 from sql test resources to kafka-0-10-sql test resources. ## How was this patch tested? Manually ran maven test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16183 from tdas/SPARK-18671-1. (cherry picked fr

spark git commit: [SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery should be sent only to the listeners in the same session as the query

2016-12-07 Thread tdas
tested? Updated test harness code to use the correct session, and added new unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16186 from tdas/SPARK-18758. (cherry picked from commit 9ab725eabbb4ad515a663b395bd2f91bb5853a23) Signed-off-by: Tathagata Das <tathagata.das1.

spark git commit: [SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery should be sent only to the listeners in the same session as the query

2016-12-07 Thread tdas
tested? Updated test harness code to use the correct session, and added new unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16186 from tdas/SPARK-18758. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9ab7

spark git commit: [SPARK-18776][SS] Make Offset for FileStreamSource corrected formatted in json

2016-12-08 Thread tdas
ted? Updated unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16205 from tdas/SPARK-18776. (cherry picked from commit 458fa3325e5f8c21c50e406ac8059d6236f93a9c) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-18776][SS] Make Offset for FileStreamSource corrected formatted in json

2016-12-08 Thread tdas
ted unit test. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16205 from tdas/SPARK-18776. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/458fa332 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-18754][SS] Rename recentProgresses to recentProgress

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master edc87e189 -> 70b2bf717 [SPARK-18754][SS] Rename recentProgresses to recentProgress Based on an informal survey, users find this option easier to understand / remember. Author: Michael Armbrust Closes #16182 from

spark git commit: [SPARK-18754][SS] Rename recentProgresses to recentProgress

2016-12-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 e9b3afac9 -> 1c6419718 [SPARK-18754][SS] Rename recentProgresses to recentProgress Based on an informal survey, users find this option easier to understand / remember. Author: Michael Armbrust Closes #16182

spark git commit: [SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuery.status to Python

2016-11-29 Thread tdas
oid unnecessarily exposing implicit object StreamingQueryStatus, consistent with StreamingQueryProgress) - Add StreamingQuery.status to Python - Fix post-termination status ## How was this patch tested? New unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16075 from tdas/SPAR

spark git commit: [SPARK-18516][STRUCTURED STREAMING] Follow up PR to add StreamingQuery.status to Python

2016-11-29 Thread tdas
oid unnecessarily exposing implicit object StreamingQueryStatus, consistent with StreamingQueryProgress) - Add StreamingQuery.status to Python - Fix post-termination status ## How was this patch tested? New unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16075 from t

spark git commit: [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 410b78986 -> 246012859 [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException ## What changes were proposed in this pull request? - Add StreamingQuery.explain and exception to Python. - Fix
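Exposing a background failure through an `exception()` accessor, as this change does for the Python API, follows a common pattern: capture the error on the query's thread and surface it later on demand. The sketch below is a minimal stand-in in plain Python, not PySpark's actual implementation; `failing_batch` is a hypothetical workload.

```python
import threading

class StreamingQuery:
    """Minimal stand-in: runs work on a thread and records any failure."""
    def __init__(self, target):
        self._exception = None
        def run():
            try:
                target()
            except Exception as e:  # capture for later inspection
                self._exception = e
        self._thread = threading.Thread(target=run)
        self._thread.start()

    def await_termination(self):
        self._thread.join()

    def exception(self):
        # Mirrors the API shape: None while healthy, else the error.
        return self._exception

def failing_batch():
    raise ValueError("bad record")

q = StreamingQuery(failing_batch)
q.await_termination()
print(type(q.exception()).__name__)  # ValueError
```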

spark git commit: [SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across restart and not auto-generate StreamingQuery.name

2016-12-05 Thread tdas
tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16113 from tdas/SPARK-18657. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bb57bfe9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bb57bfe9 D

spark git commit: [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 fecd23d2c -> 6c4c33684 [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink ## What changes were proposed in this pull request? Move DataFrame.collect out of synchronized block so that we can query content

spark git commit: [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3ba69b648 -> 1b2785c3d [SPARK-18729][SS] Move DataFrame.collect out of synchronized block in MemorySink ## What changes were proposed in this pull request? Move DataFrame.collect out of synchronized block so that we can query content in
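The fix above applies a standard concurrency pattern: take a cheap snapshot under the lock, then do the expensive materialization (the `collect`) outside it so readers do not block writers. A minimal sketch of the pattern in plain Python, with a hypothetical `MemorySink` standing in for Spark's Scala class:

```python
import threading

class MemorySink:
    """Toy sink: batches arrive under a lock, reads avoid holding it long."""
    def __init__(self):
        self._lock = threading.Lock()
        self._batches = []

    def add_batch(self, rows):
        with self._lock:
            self._batches.append(list(rows))

    def all_data(self):
        # Cheap snapshot of the batch list under the lock...
        with self._lock:
            snapshot = list(self._batches)
        # ...then the expensive flatten ("collect") outside it.
        return [row for batch in snapshot for row in batch]

sink = MemorySink()
sink.add_batch([1, 2])
sink.add_batch([3])
print(sink.all_data())  # [1, 2, 3]
```

Because `snapshot` is a new list referencing immutable-per-batch data, concurrent `add_batch` calls can proceed while the flatten runs.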

spark git commit: [SPARK-18657][SPARK-18668] Make StreamingQuery.id persists across restart and not auto-generate StreamingQuery.name

2016-12-05 Thread tdas
Test handling of name=null in json generation of StreamingQueryProgress - [x] Test handling of name=null in json generation of StreamingQueryListener events - [x] Test python API of runId Updated unit tests and new unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16113 fr

spark git commit: [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 1946854ab -> d4588165e [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter ## What changes were proposed in this pull request? Move no data rate limit from StreamExecution to ProgressReporter to make

spark git commit: [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 508de38c9 -> 4af142f55 [SPARK-18722][SS] Move no data rate limit from StreamExecution to ProgressReporter ## What changes were proposed in this pull request? Move no data rate limit from StreamExecution to ProgressReporter to make
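The no-data rate limit being moved here is, in essence, a throttle: always report batches that processed rows, but cap how often empty batches produce a progress update. A self-contained sketch of that policy (names and interval are illustrative, not Spark's actual configuration):

```python
import time

class ProgressReporter:
    """Report every batch with data; throttle reports for empty batches."""
    def __init__(self, no_data_interval_s=10.0, clock=time.monotonic):
        self._interval = no_data_interval_s
        self._clock = clock                 # injectable for testing
        self._last_no_data = float("-inf")  # so the first empty batch reports

    def should_report(self, num_input_rows):
        if num_input_rows > 0:
            return True
        now = self._clock()
        if now - self._last_no_data >= self._interval:
            self._last_no_data = now
            return True
        return False

fake_now = [0.0]
r = ProgressReporter(no_data_interval_s=10.0, clock=lambda: fake_now[0])
print(r.should_report(0))   # True  (first empty batch reported)
fake_now[0] = 5.0
print(r.should_report(0))   # False (throttled)
print(r.should_report(42))  # True  (data is always reported)
fake_now[0] = 12.0
print(r.should_report(0))   # True  (interval elapsed)
```

Keeping the policy in the reporter rather than the execution loop keeps the throttling decision next to the code that emits the events.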

spark git commit: [SPARK-18721][SS] Fix ForeachSink with watermark + append

2016-12-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master b8c7b8d31 -> 7863c6237 [SPARK-18721][SS] Fix ForeachSink with watermark + append ## What changes were proposed in this pull request? Right now ForeachSink creates a new physical plan, so StreamExecution cannot retrieval metrics and

spark git commit: [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data

2016-12-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master a985dd8e9 -> 56a503df5 [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data ## What changes were proposed in this pull request? This PR adds a sql conf

spark git commit: [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data

2016-12-02 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 f915f8128 -> f53763275 [SPARK-18670][SS] Limit the number of StreamingQueryListener.StreamProgressEvent when there is no data ## What changes were proposed in this pull request? This PR adds a sql conf

spark git commit: [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

2017-01-06 Thread tdas
ntent.com/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png) ![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png) Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16468 from tdas/SPARK-19074. (cherry pic

spark git commit: [SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options

2017-01-06 Thread tdas
om/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png) ![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png) Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16468 from tdas/SPARK-19074. Project: http://git-wi

spark git commit: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance

2016-12-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 3857d5ba8 -> 063a98e52 [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance ## What changes were proposed in this pull request? It was pretty flaky before 10 days ago.

spark git commit: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance

2016-12-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 047a9d92c -> b2dd8ec6b [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance ## What changes were proposed in this pull request? It was pretty flaky before 10 days ago.

spark git commit: [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 607a1e63d -> ccfe60a83 [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality ## What changes were proposed in this pull request? The failure is because in `test("basic functionality")`, it doesn't block
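Flaky tests of this shape are typically fixed by blocking on the observable condition instead of assuming the asynchronous work has finished. A minimal, self-contained sketch of the wait-for-condition idiom (the helper name and the simulated listener are illustrative, not the suite's actual code):

```python
import threading
import time

def wait_until(condition, timeout_s=5.0, poll_s=0.01):
    """Poll until condition() is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_s)
    return condition()

events = []
# Simulate an asynchronous listener delivering an event shortly after start.
threading.Timer(0.05, lambda: events.append("executorAdded")).start()

# Block until the event arrives rather than asserting immediately.
assert wait_until(lambda: len(events) == 1)
print(events)  # ['executorAdded']
```

An unconditional `sleep` either wastes time or still races on slow machines; polling with a deadline fails only when the event genuinely never arrives.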
