[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-657321460 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] SparkQA removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-657271683 **[Test build #125727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125727/testReport)** for PR 27649 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-657321460

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28746: [SPARK-31922][CORE] logDebug "RpcEnv already stopped" error on LocalSparkCluster shutdown

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-657320976

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-657320994

[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-07-12 Thread GitBox
SparkQA commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-657321132 **[Test build #125727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125727/testReport)** for PR 27649 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-657320994

[GitHub] [spark] AmplabJenkins commented on pull request #28746: [SPARK-31922][CORE] logDebug "RpcEnv already stopped" error on LocalSparkCluster shutdown

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #28746: URL: https://github.com/apache/spark/pull/28746#issuecomment-657320976

[GitHub] [spark] SparkQA removed a comment on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-657271658 **[Test build #125729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125729/testReport)** for PR 27333 at commit

[GitHub] [spark] SparkQA commented on pull request #27333: [SPARK-29438][SS][FOLLOWUP] Add regression tests for Streaming Aggregation and flatMapGroupsWithState

2020-07-12 Thread GitBox
SparkQA commented on pull request #27333: URL: https://github.com/apache/spark/pull/27333#issuecomment-657320554 **[Test build #125729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125729/testReport)** for PR 27333 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-657319488

[GitHub] [spark] AmplabJenkins commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-657319488

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-657318840 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] SparkQA removed a comment on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-657271700 **[Test build #125733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125733/testReport)** for PR 24173 at commit

[GitHub] [spark] SparkQA commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-07-12 Thread GitBox
SparkQA commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-657318989 **[Test build #125733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125733/testReport)** for PR 24173 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-657318834 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-657318834

[GitHub] [spark] SparkQA removed a comment on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-657271684 **[Test build #125724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125724/testReport)** for PR 28422 at commit

[GitHub] [spark] Ngone51 commented on pull request #28979: [SPARK-32154][SQL] Use ExpressionEncoder for the return type of ScalaUDF to convert to catalyst type

2020-07-12 Thread GitBox
Ngone51 commented on pull request #28979: URL: https://github.com/apache/spark/pull/28979#issuecomment-657318644 thanks all!

[GitHub] [spark] Ngone51 commented on pull request #29050: [SPARK-32238][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in ScalaUDF

2020-07-12 Thread GitBox
Ngone51 commented on pull request #29050: URL: https://github.com/apache/spark/pull/29050#issuecomment-657318492 Thanks all!

[GitHub] [spark] SparkQA commented on pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-07-12 Thread GitBox
SparkQA commented on pull request #28422: URL: https://github.com/apache/spark/pull/28422#issuecomment-657318411 **[Test build #125724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125724/testReport)** for PR 28422 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657316227

[GitHub] [spark] AmplabJenkins commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657316227

[GitHub] [spark] SparkQA commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
SparkQA commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657316041 **[Test build #125742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125742/testReport)** for PR 29078 at commit

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-12 Thread GitBox
AngersZh commented on a change in pull request #29075: URL: https://github.com/apache/spark/pull/29075#discussion_r453394363 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -54,9 +55,15 @@ private[sql] class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657313962 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657313959 Merged build finished. Test FAILed.

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-12 Thread GitBox
AngersZh commented on a change in pull request #29075: URL: https://github.com/apache/spark/pull/29075#discussion_r453392851 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -27,6 +27,7 @@ import

[GitHub] [spark] AmplabJenkins commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657313959

[GitHub] [spark] SparkQA removed a comment on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657271639 **[Test build #125725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125725/testReport)** for PR 28363 at commit

[GitHub] [spark] SparkQA commented on pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-07-12 Thread GitBox
SparkQA commented on pull request #28363: URL: https://github.com/apache/spark/pull/28363#issuecomment-657313470 **[Test build #125725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125725/testReport)** for PR 28363 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-657311591

[GitHub] [spark] maropu commented on a change in pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29075: URL: https://github.com/apache/spark/pull/29075#discussion_r453391118 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala ## @@ -53,11 +53,17 @@ private[sql]

[GitHub] [spark] AmplabJenkins commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-657311591

[GitHub] [spark] SparkQA commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-07-12 Thread GitBox
SparkQA commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-657311315 **[Test build #125741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125741/testReport)** for PR 28953 at commit

[GitHub] [spark] dongjoon-hyun commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-12 Thread GitBox
dongjoon-hyun commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657311528 Thanks!

[GitHub] [spark] moomindani commented on pull request #28953: [SPARK-32013][SQL] Support query execution before reading DataFrame and before/after writing DataFrame over JDBC

2020-07-12 Thread GitBox
moomindani commented on pull request #28953: URL: https://github.com/apache/spark/pull/28953#issuecomment-657310963 retest this please

[GitHub] [spark] maropu commented on a change in pull request #29075: [SPARK-32284][SQL] Avoid expanding too many CNF predicates in partition pruning

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29075: URL: https://github.com/apache/spark/pull/29075#discussion_r453388615 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ## @@ -103,7 +110,7 @@ private[sql] class

[GitHub] [spark] HyukjinKwon commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-12 Thread GitBox
HyukjinKwon commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657305140 Sounds good. We should fix the flakiness. I will share it a bit later when we're good.

[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-12 Thread GitBox
c21 commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-657304351 @viirya, I see your point for coalescing reduces parallelism to cause more OOM on build side. I agree this can happen. All in all, this is a disable-by-default feature, and user

[GitHub] [spark] dongjoon-hyun commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-12 Thread GitBox
dongjoon-hyun commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657303211 Ya. It's always good to share our direction. BTW, @gatorsmile and @HyukjinKwon . I'd like to recommend to hold on a little bit to avoid bad impression. We are not

[GitHub] [spark] maropu commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
maropu commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657303033 also cc: @viirya

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657302478

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [WIP][SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-657302154 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657302478

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453384603 ## File path: docs/sql-ref-syntax-qry-select-lateral-view.md ## @@ -0,0 +1,123 @@ +--- +layout: global +title: LATERAL VIEW Clause +displayTitle: LATERAL

[GitHub] [spark] SparkQA commented on pull request #24990: [SPARK-28191][SS] New data source - state - reader part

2020-07-12 Thread GitBox
SparkQA commented on pull request #24990: URL: https://github.com/apache/spark/pull/24990#issuecomment-657302246 **[Test build #125740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125740/testReport)** for PR 24990 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29080: [WIP][SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-657302373 Can one of the admins verify this patch?

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453384526 ## File path: docs/sql-ref-syntax-qry-select.md ## @@ -159,3 +159,6 @@ SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] } *

[GitHub] [spark] AmplabJenkins commented on pull request #29080: [WIP][SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-657302154 Can one of the admins verify this patch?

[GitHub] [spark] viirya commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-12 Thread GitBox
viirya commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-657302116 Assume you are joining two tables with 512 and 256 buckets. Without coalescing table, two tables might be shuffled to 1024 or more partitions. Building hash map is okay. When

[GitHub] [spark] SparkQA commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-12 Thread GitBox
SparkQA commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657302232 **[Test build #125739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125739/testReport)** for PR 28957 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #29057: [SPARK-32245][INFRA] Run Spark tests in Github Actions

2020-07-12 Thread GitBox
HyukjinKwon commented on pull request #29057: URL: https://github.com/apache/spark/pull/29057#issuecomment-657301895 Actually, we should fix to make it easier to read at SPARK-32253. I will do it a bit later to document how to read but sure I will still share some contexts in dev mailing

[GitHub] [spark] adjordan opened a new pull request #29080: [WIP][SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-12 Thread GitBox
adjordan opened a new pull request #29080: URL: https://github.com/apache/spark/pull/29080 ### What changes were proposed in this pull request? I have changed the `fit` method on `CrossValidator.scala` to run training in parallel not only across models but also across

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453383682 ## File path: docs/sql-ref-syntax-qry-select-ignore-nulls.md ## @@ -0,0 +1,66 @@ +--- +layout: global +title: IGNORE NULLS +displayTitle: IGNORE NULLS

[GitHub] [spark] github-actions[bot] closed pull request #26674: [SPARK-30059][CORE]Stop AsyncEventQueue when interrupted in dispatch

2020-07-12 Thread GitBox
github-actions[bot] closed pull request #26674: URL: https://github.com/apache/spark/pull/26674

[GitHub] [spark] c21 commented on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-12 Thread GitBox
c21 commented on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-657300530 > We build hash map for each bucket on other side and it also sounds to OOM easily. This feature is disabled by a config by default, so it may be okay. But we should be careful not

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657299923 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657299919 Merged build finished. Test FAILed.

[GitHub] [spark] SparkQA removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657298556 **[Test build #125738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125738/testReport)** for PR 29078 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657299919

[GitHub] [spark] SparkQA commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
SparkQA commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657299910 **[Test build #125738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125738/testReport)** for PR 29078 at commit

[GitHub] [spark] c21 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-07-12 Thread GitBox
c21 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-657299848 > Here are some numbers when I joined two tables (store_sales from TPC-DS - 100 SF) and did `count` on it. It's run on 8 executors (8 cores each) and generates about 47GB of

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453382619 ## File path: docs/sql-ref-syntax-qry-select-case.md ## @@ -0,0 +1,112 @@ +--- +layout: global +title: CASE Clause +displayTitle: CASE Clause +license: |

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657298699

[GitHub] [spark] AmplabJenkins commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657298699

[GitHub] [spark] SparkQA commented on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
SparkQA commented on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657298556 **[Test build #125738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125738/testReport)** for PR 29078 at commit

[GitHub] [spark] HyukjinKwon commented on pull request #28957: [SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-12 Thread GitBox
HyukjinKwon commented on pull request #28957: URL: https://github.com/apache/spark/pull/28957#issuecomment-657297964 Sure, thanks @dongjoon-hyun

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453381085 ## File path: docs/sql-ref-syntax-qry-select-case.md ## @@ -0,0 +1,112 @@ +--- +layout: global +title: CASE Clause +displayTitle: CASE Clause +license: |

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380967 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -117,6 +145,21 @@ CREATE TABLE student (id INT, name STRING) CREATE TABLE student

[GitHub] [spark] imback82 commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
imback82 commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657297227 cc: @maropu @cloud-fan

[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for sort merge join if applicable

2020-07-12 Thread GitBox
imback82 commented on pull request #28123: URL: https://github.com/apache/spark/pull/28123#issuecomment-657297080 Thanks @c21! > Re POC - I feel overall approach looks good to me. But IMO I think we should do the coalesce/divide in physical plan rule, but not logical plan rule.

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380731 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380511 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380479 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380378 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380444 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453380134 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453379208 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -57,9 +63,31 @@ as any order. For example, you can write COMMENT table_comment

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657294470

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-657294431

[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-12 Thread GitBox
HeartSaVioR commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-657294528 @zsxwing Would you mind if I ask your opinion on https://github.com/apache/spark/pull/27694#issuecomment-651479578 ?

[GitHub] [spark] AmplabJenkins commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657294470

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453378825 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -31,7 +31,13 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [

[GitHub] [spark] SparkQA removed a comment on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657256615 **[Test build #125722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125722/testReport)** for PR 29074 at commit

[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-12 Thread GitBox
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-657294330 **[Test build #125737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125737/testReport)** for PR 27694 at commit

[GitHub] [spark] maropu commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS]Add missing keywords

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r453378690 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -31,7 +31,13 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [

[GitHub] [spark] AmplabJenkins commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-12 Thread GitBox
AmplabJenkins commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-657294431

[GitHub] [spark] HeartSaVioR commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-12 Thread GitBox
HeartSaVioR commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-657294339 cc. @tdas @zsxwing @gaborgsomogyi @xuanyuanking @uncleGen

[GitHub] [spark] SparkQA commented on pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection

2020-07-12 Thread GitBox
SparkQA commented on pull request #29074: URL: https://github.com/apache/spark/pull/29074#issuecomment-657294282 **[Test build #125722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125722/testReport)** for PR 29074 at commit

[GitHub] [spark] HeartSaVioR commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-12 Thread GitBox
HeartSaVioR commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-657293698 retest this, please

[GitHub] [spark] HyukjinKwon commented on pull request #29063: [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding

2020-07-12 Thread GitBox
HyukjinKwon commented on pull request #29063: URL: https://github.com/apache/spark/pull/29063#issuecomment-657293521 Thanks @srowen and @dongjoon-hyun

[GitHub] [spark] maropu commented on a change in pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-12 Thread GitBox
maropu commented on a change in pull request #29045: URL: https://github.com/apache/spark/pull/29045#discussion_r453376631 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala ## @@ -116,47 +116,53 @@ object OrcUtils extends

[GitHub] [spark] maropu edited a comment on pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-12 Thread GitBox
maropu edited a comment on pull request #29079: URL: https://github.com/apache/spark/pull/29079#issuecomment-657289124 Could you show us performance numbers in the PR description, first? I think we need to check the trade-off between #parallelism and shuffle I/O.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657289387 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
AmplabJenkins removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657289386 Merged build finished. Test FAILed.

[GitHub] [spark] SparkQA removed a comment on pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-12 Thread GitBox
SparkQA removed a comment on pull request #29078: URL: https://github.com/apache/spark/pull/29078#issuecomment-657287127 **[Test build #125735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125735/testReport)** for PR 29078 at commit
