[GitHub] [spark] huaxingao commented on pull request #29119: Update RandomForestClassifierExample.scala

2020-07-14 Thread GitBox
huaxingao commented on pull request #29119: URL: https://github.com/apache/spark/pull/29119#issuecomment-658582560 @kevinyu1949 Thanks for submitting a PR. Actually we intentionally changed ```labelIndexer.labels``` to ```labelIndexer.labelsArray(0)``` because ```StringIndexerModel.labels`

[GitHub] [spark] HyukjinKwon commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox
HyukjinKwon commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-658581488 @HeartSaVioR, no big deal but let's make sure to mention which branch this PR went through as a comment. Th

[GitHub] [spark] HyukjinKwon edited a comment on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox
HyukjinKwon edited a comment on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-658581488 @HeartSaVioR, no big deal but let's make sure to leave a comment to mention which branch this PR went through.

[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
adjordan edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658579068 Yes, I know the difference between the two. I just assumed that `MLUtils.kFold` was doing the splits according to the k-fold method, given the name, and not the random

[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
adjordan commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658579068 Yes, I know the difference between the two. I just assumed that `MLUtils.kFold` was doing the splits according to the k-fold method, not the random sub-sampling method. But I s

[GitHub] [spark] maropu commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r454827741 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2645,21 +2645,22 @@ object SQLConf { .booleanConf

[GitHub] [spark] Fokko commented on pull request #29109: [SPARK-32311][PYSPARK][TESTS] Remove duplicate import

2020-07-14 Thread GitBox
Fokko commented on pull request #29109: URL: https://github.com/apache/spark/pull/29109#issuecomment-658578504 These PR's are a bit small indeed, but there are a few coming up that are much bigger. I would like to split them a bit to make it easier to digest for the reviewers/committers.

[GitHub] [spark] maropu commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29079: URL: https://github.com/apache/spark/pull/29079#discussion_r454827484 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2645,21 +2645,22 @@ object SQLConf { .booleanConf

[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-14 Thread GitBox
HyukjinKwon commented on pull request #29117: URL: https://github.com/apache/spark/pull/29117#issuecomment-658574515 retest this please This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [spark] maropu commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29118: URL: https://github.com/apache/spark/pull/29118#discussion_r454819744 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala ## @@ -284,6 +284,15 @@ class EliminateSortsSu

[GitHub] [spark] yaooqinn commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-14 Thread GitBox
yaooqinn commented on pull request #29064: URL: https://github.com/apache/spark/pull/29064#issuecomment-658569092 cc @maropu @cloud-fan @huaxingao. Please check the reference doc for set tz command.

[GitHub] [spark] kevinyu1949 opened a new pull request #29119: Update RandomForestClassifierExample.scala

2020-07-14 Thread GitBox
kevinyu1949 opened a new pull request #29119: URL: https://github.com/apache/spark/pull/29119 Refine wrong code. ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing chan

[GitHub] [spark] viirya commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
viirya commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658567012 okay, sounds good.

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525 Actually, the file size check test cases are very ~flaky~ fragile. We hit many issues before when we added `Spark Version` metadata on Parquet/ORC/Avro. > Do you

[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525 Actually, the file size check test cases are very flaky. We hit many issues before when we add `Spark Version` metadata on Parquet/ORC/Avro. > Do you think it is easy to

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525

[GitHub] [spark] viirya commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
viirya commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658565157 Do you think it is easy to add a test that checks file size like in the description? Or current one is enough? T

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox
HyukjinKwon commented on a change in pull request #29088: URL: https://github.com/apache/spark/pull/29088#discussion_r454812360 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -2353,6 +2354,43 @@ abstract class CSVSuite

[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658565064 Thank you, @maropu and @viirya . Yes. The commit log and JIRA will explain the situation. I made the test case minimally.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox
HyukjinKwon commented on a change in pull request #29088: URL: https://github.com/apache/spark/pull/29088#discussion_r454812272 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -2353,6 +2354,43 @@ abstract class CSVSuite

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun commented on a change in pull request #29118: URL: https://github.com/apache/spark/pull/29118#discussion_r454811733 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala ## @@ -284,6 +284,15 @@ class Eliminate

[GitHub] [spark] viirya edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
viirya edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658563157 Yeah, because of the different data distribution, the physical encoding of the data could result in a different size; that is what I meant.

[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
viirya commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658563157 Yeah, because of the different data distribution, the physical encoding of the data could result in a different size.

[GitHub] [spark] maropu commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29118: URL: https://github.com/apache/spark/pull/29118#discussion_r454810395 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala ## @@ -284,6 +284,15 @@ class EliminateSortsSu

[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-14 Thread GitBox
SaurabhChawla100 commented on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486 Retest this please

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629 Also, cc @cloud-fan , @HyukjinKwon , @maropu

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339 Could you review this, @viirya ? This will protect us from the future regression. This part is tricky.

[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629 Also, cc @cloud-fan and @HyukjinKwon .

[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29118: URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339 Could you review this, @viirya ?

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 The biggest factor is the file format rather than the Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 No~ It depends on file formats instead of Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the input

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706 No~ It depends on file formats instead of Spark side. For example, in the above example, ORC files are small because it supports a special encoding when the data is sort

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658558813 I made a PR to add a test coverage for the above case. - https://github.com/apache/spark/pull/29118 Thi

[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
viirya commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658558946 Oh, this is interesting. I know removing `Sort` before `Repartition` will result in different data distribution because `Repartition` uses `RoundRobinPartitioning`. Because I thi

[GitHub] [spark] dongjoon-hyun opened a new pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox
dongjoon-hyun opened a new pull request #29118: URL: https://github.com/apache/spark/pull/29118 ### What changes were proposed in this pull request? This PR aims to add a test case to EliminateSortsSuite to protect a valid use case which is using ORDER BY in DISTRIBUTE BY statement.

[GitHub] [spark] viirya commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
viirya commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814 Do you read the above too links? The current approach is repeated random sub-sampling validation, this PR changes to k-fold cross-validation.

[GitHub] [spark] viirya edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
viirya edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814 Do you read the above two links? The current approach is repeated random sub-sampling validation, this PR changes to k-fold cross-validation.

[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox
SparkQA commented on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658555806 **[Test build #125876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://github.co

[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox
SparkQA removed a comment on pull request #27694: URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508 **[Test build #125876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)** for PR 27694 at commit [`86131af`](https://gi

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220 Hi, @gatorsmile . Technically, this only handles `host/port` parsing inside `core` module. I'm sure that this is a meaningful step inside Spark. However, we didn't

[GitHub] [spark] dongjoon-hyun commented on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220 Hi, @gatorsmile . Technically, this only handles `host/port` parsing inside `core` module only. I'm sure that this is a meaningful step inside Spark. However, we didn't te

[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
adjordan edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes the technique, it just allows models from multiple folds to be run in parallel. `MLUtils.kFold` is doing k-fol

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658550248 Very sorry, guys. Due to the above regression, I'll revert this commit urgently. We can rethink about this PR.

[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r454795948 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apac

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658549984 **AFTER SPARK-32276** ``` scala> scala.util.Random.shuffle((1 to 10).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t") scala

[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox
maropu commented on a change in pull request #29085: URL: https://github.com/apache/spark/pull/29085#discussion_r454780673 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala ## @@ -87,17 +90,60 @@ trait BaseScriptTransformat

[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
adjordan edited a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes the technique, it just allows models from multiple folds to be run in parallel.

[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
adjordan commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236 @viirya Sorry, can you explain? I don't see how it changes anything, it just allows models from multiple folds to be run in parallel.

[GitHub] [spark] srowen commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox
srowen commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454792607 ## File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ## @@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage

[GitHub] [spark] srowen commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox
srowen commented on pull request #29111: URL: https://github.com/apache/spark/pull/29111#issuecomment-658546568 I think I understand the last test failures, will fix too.

[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-14 Thread GitBox
MaxGekk commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-658546141 @cloud-fan Anything else should I do in the PR to be merged?

[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox
stczwd commented on a change in pull request #29088: URL: https://github.com/apache/spark/pull/29088#discussion_r454791986 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala ## @@ -39,6 +39,10 @@ class CsvOutputWriter(

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we? This may cause a regression on the size of output storage.

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we? This PR may cause a regression on the size of output storage.

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun edited a comment on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small final Parquet/ORC files, we do the above tricks, don't we?

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475 To generate small Parquet/ORC files, we do the above tricks, don't we?

[GitHub] [spark] warrenzhu25 edited a comment on pull request #29044: [WIP][SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0

2020-07-14 Thread GitBox
warrenzhu25 edited a comment on pull request #29044: URL: https://github.com/apache/spark/pull/29044#issuecomment-656771107 > It's directly relevant to this PR because your patch is changing `environment` variable. > > * Please see this for the detail (https://github.com/cdarlint/win

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658543717 Oops. Sorry, guys. It seems that I missed something during testing. For the following case, we should not remove `Sort`. **BEFORE THIS PR** ```scala scala> Seq

[GitHub] [spark] warrenzhu25 commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-07-14 Thread GitBox
warrenzhu25 commented on pull request #28942: URL: https://github.com/apache/spark/pull/28942#issuecomment-658543670 @gengliangwang Tests passed, could you help merge this?

[GitHub] [spark] HyukjinKwon opened a new pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-14 Thread GitBox
HyukjinKwon opened a new pull request #29117: URL: https://github.com/apache/spark/pull/29117 ### What changes were proposed in this pull request? TBD ### Why are the changes needed? TBD ### Does this PR introduce _any_ user-facing change? TBD ### Ho

[GitHub] [spark] HeartSaVioR closed pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox
HeartSaVioR closed pull request #29077: URL: https://github.com/apache/spark/pull/29077

[GitHub] [spark] HeartSaVioR commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox
HeartSaVioR commented on pull request #29077: URL: https://github.com/apache/spark/pull/29077#issuecomment-658539797 Thanks for reviewing and for the kind words :) I'll deal with merging.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox
dongjoon-hyun commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454784921 ## File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala ## @@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends Pipeline

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox
dongjoon-hyun commented on a change in pull request #29111: URL: https://github.com/apache/spark/pull/29111#discussion_r454784282 ## File path: examples/src/main/scala/org/apache/spark/examples/SparkKMeans.scala ## @@ -102,5 +102,10 @@ object SparkKMeans { kPoints.foreach(

[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
aokolnychyi commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658538432 Thanks, everyone!

[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun commented on pull request #29089: URL: https://github.com/apache/spark/pull/29089#issuecomment-658538140 Also, cc @gatorsmile and @cloud-fan

[GitHub] [spark] SparkQA removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
SparkQA removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469 **[Test build #125874 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536762 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125

[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135 Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658537137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536619

[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox
SparkQA commented on pull request #29080: URL: https://github.com/apache/spark/pull/29080#issuecomment-658536994 **[Test build #125874 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)** for PR 29080 at commit [`6dd0a4d`](https://github.co

[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536758 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
SparkQA removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658491516 **[Test build #125865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://gi

[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536691 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To r

[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658536417 **[Test build #125865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)** for PR 29114 at commit [`5630999`](https://github.co

[GitHub] [spark] dongjoon-hyun closed pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox
dongjoon-hyun closed pull request #29089: URL: https://github.com/apache/spark/pull/29089 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658535423 **[Test build #125878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125878/testReport)** for PR 29114 at commit [`465fd8a`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534819 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125

[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To r

[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
SparkQA removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658493500 **[Test build #125867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658503907 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8

[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox
SparkQA commented on pull request #28708: URL: https://github.com/apache/spark/pull/28708#issuecomment-658534225 **[Test build #125867 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)** for PR 28708 at commit [`fe5ba7b`](https://github.co

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533186 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125

[GitHub] [spark] HyukjinKwon commented on pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions

2020-07-14 Thread GitBox
HyukjinKwon commented on pull request #29116: URL: https://github.com/apache/spark/pull/29116#issuecomment-658533425 Thanks, @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is l

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To r

[GitHub] [spark] SparkQA removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox
SparkQA removed a comment on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658485359 **[Test build #125863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://gi

[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox
SparkQA commented on pull request #28848: URL: https://github.com/apache/spark/pull/28848#issuecomment-658532861 **[Test build #125863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.co

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins removed a comment on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
AmplabJenkins commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox
SparkQA commented on pull request #29114: URL: https://github.com/apache/spark/pull/29114#issuecomment-658529122 **[Test build #125877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125877/testReport)** for PR 29114 at commit [`bdf31a8`](https://github.com
