spark git commit: [SPARK-22733] Split StreamExecution into MicroBatchExecution and StreamExecution.

2017-12-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 2fe16333d -> 59daf91b7 [SPARK-22733] Split StreamExecution into MicroBatchExecution and StreamExecution. ## What changes were proposed in this pull request? StreamExecution is now an abstract base class, which MicroBatchExecution (the

[1/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1e44dd004 -> f8c7c1f21 http://git-wip-us.apache.org/repos/asf/spark/blob/f8c7c1f2/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala -- diff

[2/2] spark git commit: [SPARK-22732] Add Structured Streaming APIs to DataSourceV2

2017-12-13 Thread zsxwing
[SPARK-22732] Add Structured Streaming APIs to DataSourceV2 ## What changes were proposed in this pull request? This PR provides DataSourceV2 API support for structured streaming, including new pieces needed to support continuous processing [SPARK-20928]. High level summary: - DataSourceV2

spark git commit: [SPARK-22781][SS] Support creating streaming dataset with ORC files

2017-12-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 13268a58f -> 9962390af [SPARK-22781][SS] Support creating streaming dataset with ORC files ## What changes were proposed in this pull request? Like `Parquet`, users can use `ORC` with Apache Spark structured streaming. This PR adds
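A minimal sketch of the ORC streaming source this change adds, assuming Spark 2.3+; the schema and paths below are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

val spark = SparkSession.builder.appName("OrcStreamSketch").getOrCreate()

// Streaming file sources need an explicit schema.
val schema = new StructType().add("name", StringType).add("age", IntegerType)

// Read a directory of ORC files as a streaming DataFrame (path is a placeholder).
val people = spark.readStream.schema(schema).orc("/data/people-orc")

val query = people.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/orc-stream-checkpoint") // placeholder
  .start()
```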

spark git commit: [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0568f289d -> eb49c3244 [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery ## What changes were proposed in this pull request? the previous [PR](https://github.com/apache/spark/pull/19469) is

spark git commit: [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master b70aa9e08 -> 5ebdcd185 [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint ## What changes were proposed in this pull request? It seems that recovering from a checkpoint can replace the old driver and executor

spark git commit: [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 eb49c3244 -> 371be22b1 [SPARK-22294][DEPLOY] Reset spark.driver.bindAddress when starting a Checkpoint ## What changes were proposed in this pull request? It seems that recovering from a checkpoint can replace the old driver and

spark git commit: [SPARK-19644][SQL] Clean up Scala reflection garbage after creating Encoder

2017-11-10 Thread zsxwing
;zsxw...@gmail.com> Closes #19687 from zsxwing/SPARK-19644. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24ea781c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24ea781c Diff: http://git-wip-us.apache.org

spark git commit: [SPARK-19644][SQL] Clean up Scala reflection garbage after creating Encoder (branch-2.2)

2017-11-10 Thread zsxwing
nce is `cleanUpReflectionObjects` is protected by `ScalaReflectionLock.synchronized` in this PR for Scala 2.10. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19718 from zsxwing/SPARK-19644-2.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-21667][STREAMING] ConsoleSink should not fail streaming query with checkpointLocation option

2017-11-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f2da738c7 -> 808e886b9 [SPARK-21667][STREAMING] ConsoleSink should not fail streaming query with checkpointLocation option ## What changes were proposed in this pull request? Fix to allow recovery on console, avoid checkpoint exception

spark git commit: [SPARK-22535][PYSPARK] Sleep before killing the python worker in PythRunner.MonitorThread (branch-2.2)

2017-11-16 Thread zsxwing
ted? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19768 from zsxwing/SPARK-22535-2.2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be68f86e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be68f86e D

spark git commit: [SPARK-22544][SS] FileStreamSource should use its own hadoop conf to call globPathIfNecessary

2017-11-17 Thread zsxwing
onf into `globPathIfNecessary` so that it can pick up user's hadoop configurations, such as credentials. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19771 from zsxwing/fix-file-stream-conf. (cherry picked from commit bf0c0ae2dcc7fd1ce92cd0fb4809bb3

spark git commit: [SPARK-22544][SS] FileStreamSource should use its own hadoop conf to call globPathIfNecessary

2017-11-17 Thread zsxwing
onf into `globPathIfNecessary` so that it can pick up user's hadoop configurations, such as credentials. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19771 from zsxwing/fix-file-stream-conf. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery

2017-11-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e3f67a97f -> 882079f5c [SPARK-22243][DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery ## What changes were proposed in this pull request? the previous [PR](https://github.com/apache/spark/pull/19469) is deleted

spark git commit: [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example

2017-11-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0e97c8eef -> ede0e1a98 [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example ## What changes were proposed in this pull request? When run in YARN cluster mode, the StructuredKafkaWordCount

spark git commit: [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example

2017-11-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9eb7096c4 -> 11c402104 [SPARK-22403][SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example ## What changes were proposed in this pull request? When run in YARN cluster mode, the StructuredKafkaWordCount example
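For readers looking at the example itself, the pattern the new optional argument feeds into looks roughly like the sketch below; the broker, topic, and checkpoint path are placeholders, not the example's actual defaults:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("StructuredKafkaWordCountSketch").getOrCreate()
import spark.implicits._

val lines = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder
  .option("subscribe", "words")                      // placeholder
  .load()
  .selectExpr("CAST(value AS STRING)")
  .as[String]

val wordCounts = lines.flatMap(_.split(" ")).groupBy("value").count()

val query = wordCounts.writeStream
  .outputMode("complete")
  .format("console")
  .option("checkpointLocation", "/user/me/wordcount-checkpoint") // the optional argument this change exposes
  .start()

query.awaitTermination()
```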

spark git commit: [SPARK-22187][SS][REVERT] Revert change in state row format for mapGroupsWithState

2017-12-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0ba8f4b21 -> b11869bc3 [SPARK-22187][SS][REVERT] Revert change in state row format for mapGroupsWithState ## What changes were proposed in this pull request? #19416 changed the format in which rows were encoded in the state store.

spark git commit: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap non-recursively

2017-10-31 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7986cc09b -> 73231860b [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap non-recursively ## What changes were proposed in this pull request? Write HDFSBackedStateStoreProvider.loadMap non-recursively. This prevents stack overflow

spark git commit: [SPARK-24214][SS] Fix toJSON for StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation

2018-05-09 Thread zsxwing
t; to provide the SparkSession parameter otherwise TreeNode.toJSON cannot get the full constructor parameter list. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #21275 from zsxwing/SPARK-24214. (cherry

spark git commit: [SPARK-24214][SS] Fix toJSON for StreamingRelationV2/StreamingExecutionRelation/ContinuousExecutionRelation

2018-05-09 Thread zsxwing
t; to provide the SparkSession parameter otherwise TreeNode.toJSON cannot get the full constructor parameter list. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #21275 from zsxwing/SPARK-24214. Project: http://git-wip-us.apache.org/repos/as

spark git commit: [SPARK-23565][SS] New error message for structured streaming sources assertion

2018-04-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 109935fc5 -> 2824f12b8 [SPARK-23565][SS] New error message for structured streaming sources assertion ## What changes were proposed in this pull request? A more informative message to tell you why a structured streaming query cannot

spark git commit: [SPARK-24040][SS] Support single partition aggregates in continuous processing.

2018-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d610d2a3f -> 3fabbc576 [SPARK-24040][SS] Support single partition aggregates in continuous processing. ## What changes were proposed in this pull request? Support aggregates with exactly 1 partition in continuous processing. A few small

spark git commit: [SPARK-24332][SS][MESOS] Fix places reading 'spark.network.timeout' as milliseconds

2018-05-24 Thread zsxwing
sue that reading "spark.network.timeout" using a wrong time unit when the user doesn't specify a time out. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #21382 from zsxwing/fix-network-timeout-conf. Project: http://git-wip-us.apache.org/r

spark git commit: [SPARK-20538][SQL] Wrap Dataset.reduce with withNewRddExecutionId.

2018-05-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0cf59fcbe -> 7696b9de0 [SPARK-20538][SQL] Wrap Dataset.reduce with withNewRddExecutionId. ## What changes were proposed in this pull request? Wrap Dataset.reduce with `withNewExecutionId`. Author: Soham Aurangabadkar

spark git commit: [SPARK-24159][SS] Enable no-data micro batches for streaming mapGroupswithState

2018-05-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7696b9de0 -> 807ba44cb [SPARK-24159][SS] Enable no-data micro batches for streaming mapGroupswithState ## What changes were proposed in this pull request? Enabled no-data batches in flatMapGroupsWithState in following two cases. - When

spark git commit: [SPARK-24235][SS] Implement continuous shuffle writer for single reader partition.

2018-06-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 299d297e2 -> 1b46f41c5 [SPARK-24235][SS] Implement continuous shuffle writer for single reader partition. ## What changes were proposed in this pull request?

spark git commit: [SPARK-24565][SS] Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame

2018-06-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 13092d733 -> 2cb976355 [SPARK-24565][SS] Add API for in Structured Streaming for exposing output rows of each microbatch as a DataFrame ## What changes were proposed in this pull request? Currently, the micro-batches in the
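The API in question shipped in Spark 2.4 as `DataStreamWriter.foreachBatch`; a minimal sketch using the built-in rate source, with output paths as placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.appName("ForeachBatchSketch").getOrCreate()

// The built-in "rate" source stands in for a real stream.
val streamingDF = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

val query = streamingDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Each micro-batch's output rows arrive here as an ordinary DataFrame.
    batchDF.write.mode("append").parquet("/tmp/foreach-batch-out") // placeholder path
  }
  .option("checkpointLocation", "/tmp/foreach-batch-checkpoint") // placeholder
  .start()
```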

spark git commit: [SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode

2018-06-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 98909c398 -> 6039b1323 [SPARK-24351][SS] offsetLog/commitLog purge thresholdBatchId should be computed with current committed epoch but not currentBatchId in CP mode ## What changes were proposed in this pull request? Compute the

spark git commit: [SPARK-24566][CORE] Fix spark.storage.blockManagerSlaveTimeoutMs default config

2018-06-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f6e6899a8 -> f71e8da5e [SPARK-24566][CORE] Fix spark.storage.blockManagerSlaveTimeoutMs default config This PR uses spark.network.timeout in place of spark.storage.blockManagerSlaveTimeoutMs when it is not configured, as configuration doc
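In practice the fallback means the umbrella timeout is the only knob that needs setting; a minimal sketch with an illustrative value:

```scala
import org.apache.spark.SparkConf

// With this fix, spark.storage.blockManagerSlaveTimeoutMs defaults to the
// umbrella setting below whenever it is not configured explicitly.
val conf = new SparkConf()
  .setAppName("TimeoutDefaults")
  .set("spark.network.timeout", "120s") // illustrative value
```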

spark git commit: [SPARK-24578][CORE] Cap sub-region's size of returned nio buffer

2018-06-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 d687d97b1 -> 8928de3cd [SPARK-24578][CORE] Cap sub-region's size of returned nio buffer ## What changes were proposed in this pull request? This PR tries to fix the performance regression introduced by SPARK-21517. In our production

spark git commit: [SPARK-24578][CORE] Cap sub-region's size of returned nio buffer

2018-06-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c5a0d1132 -> 3f4bda728 [SPARK-24578][CORE] Cap sub-region's size of returned nio buffer ## What changes were proposed in this pull request? This PR tries to fix the performance regression introduced by SPARK-21517. In our production job,

spark git commit: [SPARK-24061][SS] Add TypedFilter support for continuous processing

2018-05-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master b857fb549 -> 7bbec0dce [SPARK-24061][SS] Add TypedFilter support for continuous processing ## What changes were proposed in this pull request? Add TypedFilter support for continuous processing application. ## How was this patch tested?

spark git commit: [SPARK-22366] Support ignoring missing files

2017-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 5415963d2 -> 8e9863531 [SPARK-22366] Support ignoring missing files ## What changes were proposed in this pull request? Add a flag "spark.sql.files.ignoreMissingFiles" to parallel the existing flag "spark.sql.files.ignoreCorruptFiles".
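A minimal sketch of the new flag next to its existing sibling; the read path is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("IgnoreMissingFilesSketch").getOrCreate()

// New flag: skip files that disappear between listing and reading instead of failing the job.
spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

// Existing sibling flag for unreadable or corrupt files.
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

val df = spark.read.parquet("/data/events") // placeholder path
```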

spark git commit: [SPARK-21475][Core]Revert "[SPARK-21475][CORE] Use NIO's Files API to replace FileInputStream/FileOutputStream in some critical paths"

2017-12-29 Thread zsxwing
the default `InputStream.skip` which just consumes and discards data. This causes a huge performance regression when reading shuffle files. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20119 from zsxwing/revert-SPARK-21475. Project: http://git-wip-

spark git commit: [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service

2018-01-04 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 bcfeef5a9 -> cd92913f3 [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't

spark git commit: [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service

2018-01-04 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6f68316e9 -> 93f92c0ed [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't

[1/2] spark git commit: [SPARK-22789] Map-only continuous processing execution

2017-12-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d23dc5b8e -> 8941a4abc http://git-wip-us.apache.org/repos/asf/spark/blob/8941a4ab/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala --

[2/2] spark git commit: [SPARK-22789] Map-only continuous processing execution

2017-12-22 Thread zsxwing
[SPARK-22789] Map-only continuous processing execution ## What changes were proposed in this pull request? Basic continuous execution, supporting map/flatMap/filter, with commits and advancement through RPC. ## How was this patch tested? new unit-ish tests (exercising execution end to end)

spark git commit: [SPARK-22956][SS] Bug fix for 2 streams union failover scenario

2018-01-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c7572b79d -> 07ae39d0e [SPARK-22956][SS] Bug fix for 2 streams union failover scenario ## What changes were proposed in this pull request? This problem was reported by yanlin-Lynn, ivoson and LiangchangZ. Thanks! When we union 2 streams from

spark git commit: [SPARK-22956][SS] Bug fix for 2 streams union failover scenario

2018-01-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 e2ffb9781 -> e58c4a929 [SPARK-22956][SS] Bug fix for 2 streams union failover scenario ## What changes were proposed in this pull request? This problem was reported by yanlin-Lynn, ivoson and LiangchangZ. Thanks! When we union 2 streams

spark git commit: [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 30272c668 -> 500c94434 [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution ## What changes were proposed in this pull request? Currently,

spark git commit: [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution

2018-01-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0e178e152 -> bc9641d90 [SPARK-23198][SS][TEST] Fix KafkaContinuousSourceStressForDontFailOnDataLossSuite to test ContinuousExecution ## What changes were proposed in this pull request? Currently,

spark git commit: Fix merge between 07ae39d0ec and 1667057851

2018-01-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 50345a2aa -> a963980a6 Fix merge between 07ae39d0ec and 1667057851 ## What changes were proposed in this pull request? The first commit added a new test, and the second refactored the class the test was in. The automatic merge put the

spark git commit: [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query.

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 86a845031 -> e946c63dd [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query. ## What changes were proposed in this pull request? Keep the run ID static, using a different ID for the epoch coordinator to

spark git commit: [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query.

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 dbd2a5566 -> 79ccd0cad [SPARK-23093][SS] Don't change run id when reconfiguring a continuous processing query. ## What changes were proposed in this pull request? Keep the run ID static, using a different ID for the epoch coordinator

spark git commit: [SPARK-23119][SS] Minor fixes to V2 streaming APIs

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7823d43ec -> bac0d661a [SPARK-23119][SS] Minor fixes to V2 streaming APIs ## What changes were proposed in this pull request? - Added `InterfaceStability.Evolving` annotations - Improved docs. ## How was this patch tested? Existing

spark git commit: [SPARK-23119][SS] Minor fixes to V2 streaming APIs

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 b84c2a306 -> 9783aea2c [SPARK-23119][SS] Minor fixes to V2 streaming APIs ## What changes were proposed in this pull request? - Added `InterfaceStability.Evolving` annotations - Improved docs. ## How was this patch tested? Existing

spark git commit: [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bac0d661a -> 1002bd6b2 [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins ## What changes were proposed in this pull request? Added documentation for stream-stream joins

spark git commit: [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 9783aea2c -> 050c1e24e [SPARK-23064][DOCS][SS] Added documentation for stream-stream joins ## What changes were proposed in this pull request? Added documentation for stream-stream joins

spark git commit: [SPARK-21996][SQL] read files with space in name for streaming

2018-01-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1002bd6b2 -> 021947020 [SPARK-21996][SQL] read files with space in name for streaming ## What changes were proposed in this pull request? Structured streaming is now able to read files with space in file name (previously it would skip

spark git commit: [SPARK-23242][SS][TESTS] Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread zsxwing
run. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20412 from zsxwing/SPARK-23242. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/07374498 Tree: http://git-wip-us.apache.org/

spark git commit: [SPARK-23242][SS][TESTS] Don't run tests in KafkaSourceSuiteBase twice

2018-01-26 Thread zsxwing
lso run. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20412 from zsxwing/SPARK-23242. (cherry picked from commit 073744985f439ca90afb9bd0bbc1332c53f7b4bb) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apach

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7bd14cfd4 -> 54277398a [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress for

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 20eea20c7 -> 105ae8680 [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress

spark git commit: [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported

2018-01-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 db27a9365 -> 02176f4c2 [SPARK-22975][SS] MetricsReporter should not throw exception when there was no progress reported ## What changes were proposed in this pull request? `MetricsReporter` assumes that there has been some progress

spark git commit: [SPARK-23245][SS][TESTS] Don't access `lastExecution.executedPlan` in StreamTest

2018-01-26 Thread zsxwing
hor: Jose Torres <j...@databricks.com> Closes #20413 from zsxwing/SPARK-23245. (cherry picked from commit 6328868e524121bd00595959d6d059f74e038a6b) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apach

spark git commit: [SPARK-23245][SS][TESTS] Don't access `lastExecution.executedPlan` in StreamTest

2018-01-26 Thread zsxwing
ose Torres <j...@databricks.com> Closes #20413 from zsxwing/SPARK-23245. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6328868e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6328868e Diff: http://git-wip-us.a

spark git commit: [SPARK-23400][SQL] Add a constructors for ScalaUDF

2018-02-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d58fe2883 -> 2ee76c22b [SPARK-23400][SQL] Add a constructors for ScalaUDF ## What changes were proposed in this pull request? In this upcoming 2.3 release, we changed the interface of `ScalaUDF`. Unfortunately, some Spark packages (e.g.,

spark git commit: [SPARK-23400][SQL] Add a constructors for ScalaUDF

2018-02-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 320ffb130 -> 4f6a457d4 [SPARK-23400][SQL] Add a constructors for ScalaUDF ## What changes were proposed in this pull request? In this upcoming 2.3 release, we changed the interface of `ScalaUDF`. Unfortunately, some Spark packages

spark git commit: [SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS file path

2018-02-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 83c008762 -> 3e48f3b9e [SPARK-23434][SQL] Spark should not warn `metadata directory` for a HDFS file path ## What changes were proposed in this pull request? In a kerberized cluster, when Spark reads a file path (e.g. `people.json`), it

spark git commit: [SPARK-23481][WEBUI] lastStageAttempt should fail when a stage doesn't exist

2018-02-21 Thread zsxwing
ext available stage in the store when a stage doesn't exist. This PR adds `last(stageId)` to ensure it returns a correct `StageData` ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20654 from zsxwing/SPARK-23481. (cherry picked fr

spark git commit: [SPARK-23481][WEBUI] lastStageAttempt should fail when a stage doesn't exist

2018-02-21 Thread zsxwing
ble stage in the store when a stage doesn't exist. This PR adds `last(stageId)` to ensure it returns a correct `StageData` ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #20654 from zsxwing/SPARK-23481. Project: http://git-wip-us.apache.

spark git commit: [SPARK-22824] Restore old offset for binary compatibility

2017-12-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7570eab6b -> 7798c9e6e [SPARK-22824] Restore old offset for binary compatibility ## What changes were proposed in this pull request? Some users depend on source compatibility with the org.apache.spark.sql.execution.streaming.Offset

spark git commit: [SPARK-24896][SQL] Uuid should produce different values for each execution in streaming query

2018-08-02 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master efef55388 -> d0bc3ed67 [SPARK-24896][SQL] Uuid should produce different values for each execution in streaming query ## What changes were proposed in this pull request? `Uuid`'s results depend on random seed given during analysis. Thus

spark git commit: [SPARK-18057][SS] Update Kafka client version from 0.10.0.1 to 2.0.0

2018-07-31 Thread zsxwing
wip-us.apache.org/repos/asf/spark/diff/e82784d1 Branch: refs/heads/master Commit: e82784d13fac7d45164dfadb00d3fa43e64e0bde Parents: 1223a20 Author: tedyu Authored: Tue Jul 31 13:14:14 2018 -0700 Committer: zsxwing Committed: Tue Jul 31 13:14:14 2018 -0

spark git commit: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version from 0.10.0.1 to 2.0.0

2018-08-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 8c14276c3 -> 4c27663cb [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version from 0.10.0.1 to 2.0.0 ## What changes were proposed in this pull request? Increase ZK timeout and harmonize configs across Kafka tests to resolve

spark git commit: [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page

2018-08-10 Thread zsxwing
Array` to fix the issue. ## How was this patch tested? The new unit test will make JVM crash without the fix. Closes #22062 from zsxwing/SPARK-25081. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apa

spark git commit: [SPARK-25081][CORE] Nested spill in ShuffleExternalSorter should not access released memory page

2018-08-10 Thread zsxwing
allocateArray` to fix the issue. ## How was this patch tested? The new unit test will make JVM crash without the fix. Closes #22062 from zsxwing/SPARK-25081. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu (cherry picked from commit f5aba657396bd4e2e03dd06491a2d169a99592a7) Signed-off-by:

spark git commit: [SPARK-24161][SS] Enable debug package feature on structured streaming

2018-08-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3c96937c7 -> 87ca7396c [SPARK-24161][SS] Enable debug package feature on structured streaming ## What changes were proposed in this pull request? Currently, debug package has an implicit class "DebugQuery" which matches Dataset to provide

spark git commit: [SPARK-18057][FOLLOW-UP] Use 127.0.0.1 to avoid zookeeper picking up an ipv6 address

2018-08-14 Thread zsxwing
ost` to make sure zookeeper will never use an ipv6 address. ## How was this patch tested? Jenkins Closes #22097 from zsxwing/fix-zookeeper-connect. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.

spark git commit: [SPARK-25214][SS] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`

2018-08-24 Thread zsxwing
may return duplicated records when `failOnDataLoss=false` because it doesn't skip missing offsets. This PR fixes the issue and also adds regression tests for all Kafka readers. ## How was this patch tested? New tests. Closes #22207 from zsxwing/SPARK-25214. Authored-by: Shixiong Zhu Signed-

spark git commit: [SPARK-25214][SS][FOLLOWUP] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`

2018-08-25 Thread zsxwing
fix a potential flaky test. `processAllAvailable` doesn't work for continuous processing so we should not use it for a continuous query. ## How was this patch tested? Jenkins. Closes #22230 from zsxwing/SPARK-25214-2. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://

spark git commit: [SPARK-25005][SS] Support non-consecutive offsets for Kafka

2018-08-28 Thread zsxwing
tch They are all covered by the new unit tests. ## How was this patch tested? The new unit tests. Closes #22042 from zsxwing/kafka-transaction-read. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.
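Non-consecutive offsets arise when a topic is written transactionally, since aborted records and transaction markers leave gaps. A hedged sketch of consuming only committed records, assuming the consumer's `isolation.level` is passed through via the `kafka.`-prefixed options; broker and topic names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaReadCommittedSketch").getOrCreate()

// Read only committed transactional records; the gaps left by aborted
// transactions and markers are the non-consecutive offsets handled here.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
  .option("subscribe", "transactional-topic")          // placeholder
  .option("kafka.isolation.level", "read_committed")   // consumer isolation.level, passed through (assumption)
  .load()
```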

spark git commit: [SPARK-25218][CORE] Fix potential resource leaks in TransportServer and SocketAuthHelper

2018-08-28 Thread zsxwing
ces for all types of errors. ## How was this patch tested? Jenkins Closes #22210 from zsxwing/SPARK-25218. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/592e3a42 Tree: h

spark git commit: [SPARK-25116][TESTS] Fix the Kafka cluster leak and clean up cached producers

2018-08-17 Thread zsxwing
ion to node 0 could not be established. Broker may not be available. ``` I also reverted https://github.com/apache/spark/pull/22097/commits/b5eb54244ed573c8046f5abf7bf087f5f08dba58 introduced by #22097 since it doesn't help. ## How was this patch tested? Jenkins Closes #22106 from zsxwing/SP

spark git commit: [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kafka tests.

2018-08-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 381a967a7 -> 810d59ce4 [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kafka tests. ## What changes were proposed in this pull request? Fix flaky synchronization in Kafka tests - we need to use the scan config that was persisted

spark git commit: [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection.ExternalAppendOnlyMapSuiteCheck

2018-08-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 310632498 -> 49a1993b1 [SPARK-25163][SQL] Fix flaky test: o.a.s.util.collection.ExternalAppendOnlyMapSuiteCheck ## What changes were proposed in this pull request? `ExternalAppendOnlyMapSuiteCheck` test is flaky. We use a

spark git commit: [SPARK-25181][CORE] Limit Thread Pool size in BlockManager Master and Slave endpoints

2018-08-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 2381953ab -> 68ec4d641 [SPARK-25181][CORE] Limit Thread Pool size in BlockManager Master and Slave endpoints ## What changes were proposed in this pull request? Limit Thread Pool size in BlockManager Master and Slave endpoints.

spark git commit: [SPARK-25288][TESTS] Fix flaky Kafka transaction tests

2018-08-31 Thread zsxwing
can see a specified offset before checking the result. ## How was this patch tested? Jenkins Closes #22293 from zsxwing/SPARK-25288. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-23533][SS] Add support for changing ContinuousDataReader's startOffset

2018-03-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4f5bad615 -> 7c3e8995f [SPARK-23533][SS] Add support for changing ContinuousDataReader's startOffset ## What changes were proposed in this pull request? As discussion in #20675, we need add a new interface `ContinuousDataReaderFactory`

spark git commit: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 85ab72b59 -> 6b5f9c374 [SPARK-23788][SS] Fix race in StreamingQuerySuite ## What changes were proposed in this pull request? The serializability test uses the same MemoryStream instance for 3 different queries. If any of those

spark git commit: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 ea44783ad -> 523fcafc5 [SPARK-23788][SS] Fix race in StreamingQuerySuite ## What changes were proposed in this pull request? The serializability test uses the same MemoryStream instance for 3 different queries. If any of those

spark git commit: [SPARK-23788][SS] Fix race in StreamingQuerySuite

2018-03-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a33655348 -> 816a5496b [SPARK-23788][SS] Fix race in StreamingQuerySuite ## What changes were proposed in this pull request? The serializability test uses the same MemoryStream instance for 3 different queries. If any of those queries

spark git commit: [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer

2018-03-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9945b0227 -> bd201bf61 [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? CachedKafkaConsumer in the project `kafka-0-10-sql` is designed to maintain a

spark git commit: [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer (branch-2.3)

2018-03-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.3 21b6de459 -> 6937571ab [SPARK-23623][SS] Avoid concurrent use of cached consumers in CachedKafkaConsumer (branch-2.3) This is a backport of #20767 to branch 2.3 ## What changes were proposed in this pull request? CachedKafkaConsumer

spark git commit: [SPARK-25644][SS] Fix java foreachBatch in DataStreamWriter

2018-10-05 Thread zsxwing
ong`. ## How was this patch tested? New java test. Closes #22633 from zsxwing/fix-java-foreachbatch. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7dcc90fb Tree: h

spark git commit: [SPARK-25644][SS] Fix java foreachBatch in DataStreamWriter

2018-10-05 Thread zsxwing
her `scala.Long`. ## How was this patch tested? New java test. Closes #22633 from zsxwing/fix-java-foreachbatch. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0a70afdc Tree: h

spark git commit: [SPARK-25771][PYSPARK] Fix improper synchronization in PythonWorkerFactory

2018-10-22 Thread zsxwing
ock. 2. `createSimpleWorker` misses `synchronized` when updating `simpleWorkers`. Other changes are just to improve the code style to make the thread-safe contract clear. ## How was this patch tested? Jenkins Closes #22770 from zsxwing/pwf. Authored-by: Shixiong Zhu Signed-off-by: Shixiong

spark git commit: [SPARK-25773][CORE] Cancel zombie tasks in a result stage when the job finishes

2018-10-30 Thread zsxwing
lso fixes two minor issues while I'm touching DAGScheduler: - Invalid spark.job.interruptOnCancel should not crash DAGScheduler. - Non fatal errors should not crash DAGScheduler. ## How was this patch tested? The new unit tests. Closes #22771 from zsxwing/SPARK-25773. Lead-authored-by: Shixiong

spark git commit: [SPARK-26042][SS][TESTS] Fix a potential hang in KafkaContinuousSourceTopicDeletionSuite

2018-11-14 Thread zsxwing
ing to initialize `executedPlan` when `isRDD` is running, this thread will hang forever. This PR just materializes `executedPlan` so that accessing it when `toRdd` is running doesn't need to wait for a lock ## How was this patch tested? Jenkins Closes #23023 from zsxwing/SPARK-26042. Autho

spark git commit: [SPARK-26042][SS][TESTS] Fix a potential hang in KafkaContinuousSourceTopicDeletionSuite

2018-11-14 Thread zsxwing
ing to initialize `executedPlan` when `isRDD` is running, this thread will hang forever. This PR just materializes `executedPlan` so that accessing it when `toRdd` is running doesn't need to wait for a lock ## How was this patch tested? Jenkins Closes #23023 from zsxwing/SPARK-26042. Autho

spark git commit: [SPARK-25449][CORE] Heartbeat shouldn't include accumulators for zero metrics

2018-09-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a28146568 -> 9362c5cc2 [SPARK-25449][CORE] Heartbeat shouldn't include accumulators for zero metrics ## What changes were proposed in this pull request? Heartbeat shouldn't include accumulators for zero metrics. Heartbeats sent from

spark git commit: [SPARK-25495][SS] FetchedData.reset should reset all fields

2018-09-25 Thread zsxwing
ise it will cause inconsistent cached data and may make Kafka connector return wrong results. ## How was this patch tested? The new unit test. Closes #22507 from zsxwing/fix-kafka-reset. Lead-authored-by: Shixiong Zhu Co-authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu Project: http://git-

spark git commit: [SPARK-25495][SS] FetchedData.reset should reset all fields

2018-09-25 Thread zsxwing
ise it will cause inconsistent cached data and may make Kafka connector return wrong results. ## How was this patch tested? The new unit test. Closes #22507 from zsxwing/fix-kafka-reset. Lead-authored-by: Shixiong Zhu Co-authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu (cherry picked f

spark git commit: [SPARK-26069][TESTS] Fix flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures

2018-11-16 Thread zsxwing
mon/network-common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java#L217 This PR fixes the above issue and also improves the test failure messages of `assertErrorAndClosed`. ## How was this patch tested? Jenkins Closes #23041 from zsxwing/SPARK-26069. Authored-by: Shixiong

spark git commit: [SPARK-26069][TESTS] Fix flaky test: RpcIntegrationSuite.sendRpcWithStreamFailures

2018-11-16 Thread zsxwing
32e/common/network-common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java#L217 This PR fixes the above issue and also improves the test failure messages of `assertErrorAndClosed`. ## How was this patch tested? Jenkins Closes #23041 from zsxwing/SPARK-26069. Authored-by: Shixiong

[spark] branch branch-2.3 updated: [SPARK-26629][SS] Fixed error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new 5a50ae3 [SPARK-26629][SS] Fixed error

[spark] branch master updated: [SPARK-26629][SS] Fixed error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 06d5b17 [SPARK-26629][SS] Fixed error

[spark] branch branch-2.3 updated: Revert "[SPARK-26629][SS] Fixed error with multiple file stream in a query + restart on a batch that has no data for one file stream"

2019-01-16 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch branch-2.3 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.3 by this push: new c0fc6d0 Revert "[SPARK-2662

[spark] branch branch-2.4 updated: [SPARK-26629][SS] Fixed error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread zsxwing
This is an automated email from the ASF dual-hosted git repository. zsxwing pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 1843c16 [SPARK-26629][SS] Fixed error
