[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-06-15 Thread GitBox
SparkQA removed a comment on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-643862941 **[Test build #124023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124023/testReport)** for PR 28781 at commit

[GitHub] [spark] MaxGekk commented on pull request #28829: [SPARK-31992][SQL] Benchmark the EXCEPTION rebase mode

2020-06-15 Thread GitBox
MaxGekk commented on pull request #28829: URL: https://github.com/apache/spark/pull/28829#issuecomment-643930188 @cloud-fan Please, review the PR This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

2020-06-15 Thread GitBox
SparkQA commented on pull request #28781: URL: https://github.com/apache/spark/pull/28781#issuecomment-643929926 **[Test build #124023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124023/testReport)** for PR 28781 at commit

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-15 Thread GitBox
dongjoon-hyun edited a comment on pull request #28830: URL: https://github.com/apache/spark/pull/28830#issuecomment-643929047

[GitHub] [spark] dongjoon-hyun commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-15 Thread GitBox
dongjoon-hyun commented on pull request #28830: URL: https://github.com/apache/spark/pull/28830#issuecomment-643929047 The last commit tries to preserve the previous behavior (whatever it was) since Apache Spark 2.2.0, although there is no guarantee whether it is safe or not. We will

[GitHub] [spark] HyukjinKwon commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-15 Thread GitBox
HyukjinKwon commented on pull request #28830: URL: https://github.com/apache/spark/pull/28830#issuecomment-643927909 I am okay to revert it for now but I couldn't fully follow why we expect an explicit order from a set. Has it been ever guaranteed somewhere? Using `distinct`, we can
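The ordering question raised in this thread can be illustrated with plain Scala collections. This is a sketch outside the thread, not Spark's `Dataset.dropDuplicates` code, and the column names are hypothetical: `distinct` keeps the first occurrence of each element in input order, while `toSet.toSeq` passes through a `Set` whose iteration order is not specified.

```scala
object DistinctVsToSet {
  // Hypothetical column names, used only for illustration.
  val colNames: Seq[String] = Seq("e", "d", "c", "b", "a", "e")

  // `distinct` drops later duplicates and keeps first-occurrence order.
  val viaDistinct: Seq[String] = colNames.distinct

  // `toSet.toSeq` goes through an immutable Set; with more than four
  // elements the default Set is hash-based, so the order is unspecified.
  val viaSet: Seq[String] = colNames.toSet.toSeq

  def main(args: Array[String]): Unit = {
    println(viaDistinct) // List(e, d, c, b, a)
    println(viaSet)      // same elements, order depends on hashing
  }
}
```

Both approaches remove the same duplicates; only `distinct` makes a promise about the resulting order.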

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext

2020-06-15 Thread GitBox
AmplabJenkins removed a comment on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-643926776

[GitHub] [spark] AmplabJenkins commented on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext

2020-06-15 Thread GitBox
AmplabJenkins commented on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-643926776

[GitHub] [spark] SparkQA commented on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext

2020-06-15 Thread GitBox
SparkQA commented on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-643926432 **[Test build #124035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124035/testReport)** for PR 28784 at commit

[GitHub] [spark] dilipbiswal commented on pull request #28032: [SPARK-31264][SQL] Repartition by dynamic partition columns before insert partition table

2020-06-15 Thread GitBox
dilipbiswal commented on pull request #28032: URL: https://github.com/apache/spark/pull/28032#issuecomment-643926524 @wangyum Thanks for your response. If the incoming data is not evenly distributed by the repartitioning key, wouldn't this strategy create issues when there is skew in the

[GitHub] [spark] yaooqinn commented on pull request #28784: [SPARK-31957][SQL][test-maven] Cleanup hive scratch dir for the developer api startWithContext

2020-06-15 Thread GitBox
yaooqinn commented on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-643926043 retest this please

[GitHub] [spark] yaooqinn commented on pull request #28784: [SPARK-31957][SQL] Cleanup hive scratch dir for the developer api startWithContext

2020-06-15 Thread GitBox
yaooqinn commented on pull request #28784: URL: https://github.com/apache/spark/pull/28784#issuecomment-643925671 Thanks @HyukjinKwon and @juliuszsompolski, I was waiting for https://github.com/apache/spark/pull/28797 to be merged and then ping you guys. Now it's been done. The

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28828: [SPARK-24634][SS][FOLLOWUP] Rename the variable from "numLateInputs" to "numDropppedRowsByWatermark"

2020-06-15 Thread GitBox
dongjoon-hyun commented on a change in pull request #28828: URL: https://github.com/apache/spark/pull/28828#discussion_r439949163 ## File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ## @@ -43,7 +43,7 @@ class StateOperatorProgress private[sql](

[GitHub] [spark] cloud-fan commented on pull request #28797: [SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber

2020-06-15 Thread GitBox
cloud-fan commented on pull request #28797: URL: https://github.com/apache/spark/pull/28797#issuecomment-643923811 thanks, merging to master/3.0!

[GitHub] [spark] cloud-fan closed pull request #28797: [SPARK-31926][SQL][TEST-HIVE1.2][test-maven] Fix concurrency issue for ThriftCLIService to getPortNumber

2020-06-15 Thread GitBox
cloud-fan closed pull request #28797: URL: https://github.com/apache/spark/pull/28797

[GitHub] [spark] MaxGekk commented on pull request #28809: [SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset

2020-06-15 Thread GitBox
MaxGekk commented on pull request #28809: URL: https://github.com/apache/spark/pull/28809#issuecomment-643922594 I am going to skip the test checks if JDK tzdb is outdated and Asia/Hong_Kong doesn't have timestamps overlapping in 1945 at all.
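A check like the one described above could be sketched with the standard `java.time` API. This is an illustration only, not the code from PR #28809; the object and method names are hypothetical. It asks the JVM's bundled tzdb whether `Asia/Hong_Kong` has any overlapping transition (clocks set back) during 1945, which is the condition the comment says an outdated tzdb may lack.

```scala
import java.time.ZoneId
import scala.collection.JavaConverters._

// Hypothetical helper: inspects the JVM's tzdb, not Spark internals.
object HongKong1945Overlap {
  // True if the tzdb lists at least one overlap transition for
  // Asia/Hong_Kong whose local time before the change falls in 1945.
  def hasOverlapIn1945(): Boolean = {
    val rules = ZoneId.of("Asia/Hong_Kong").getRules
    rules.getTransitions.asScala.exists { t =>
      t.isOverlap && t.getDateTimeBefore.getYear == 1945
    }
  }

  def main(args: Array[String]): Unit = {
    // The result depends on the tzdb version shipped with the JDK.
    println(s"Overlap in 1945: ${hasOverlapIn1945()}")
  }
}
```

In a ScalaTest suite, such a predicate could be passed to `assume(...)` so the test is canceled rather than failed on a JDK with an older tzdb.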

[GitHub] [spark] MaxGekk commented on pull request #28809: [SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset

2020-06-15 Thread GitBox
MaxGekk commented on pull request #28809: URL: https://github.com/apache/spark/pull/28809#issuecomment-643920058 > It might be Amplap Jenkins host issue (Java version or environment). It uses JDK w/ outdated time zone database (not clear from log which version): ```
