[GitHub] [spark] msamirkhan edited a comment on pull request #29354: [SPARK-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan edited a comment on pull request #29354: URL: https://github.com/apache/spark/pull/29354#issuecomment-669519533 The [pdf attached to the PR](https://github.com/apache/spark/files/5025167/AvroBenchmarks.pdf) contains the read and write time improvements with the commits. I have a

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669517201 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] stczwd commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
stczwd commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r466011087 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed

[GitHub] [spark] AmplabJenkins commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669517201 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
SparkQA removed a comment on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669390805 **[Test build #127101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127101/testReport)** for PR 29211 at commit [`e81c3fc`](https://gi

[GitHub] [spark] SparkQA commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
SparkQA commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669515934 **[Test build #127101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127101/testReport)** for PR 29211 at commit [`e81c3fc`](https://github.co

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [SPARK-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465980330 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriterFactory.scala ## @@ -40,6 +40,8 @@ private[sql] class AvroOutputW

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [SPARK-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r466004477 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumWriter.scala ## @@ -125,42 +125,42 @@ class SparkAvroDatumWriter[D]

[GitHub] [spark] rdblue commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
rdblue commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r466002905 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465983317 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ## @@ -182,19 +182,21 @@ class AvroDeserializer(

[GitHub] [spark] stczwd commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
stczwd commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r465996986 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r466001347 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -421,12 +418,10 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r466000797 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -638,90 +628,57 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r466000481 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -638,90 +628,57 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465999766 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -88,17 +87,10 @@ class SparkAvroDatumReader[T](

[GitHub] [spark] agrawaldevesh commented on a change in pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
agrawaldevesh commented on a change in pull request #29211: URL: https://github.com/apache/spark/pull/29211#discussion_r465999198 ## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala ## @@ -266,18 +266,17 @@ class BlockManag

[GitHub] [spark] stczwd commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
stczwd commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r465999241 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala ## @@ -62,4 +74,31 @@ object DataSourc

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465998634 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -809,51 +735,111 @@ class SparkAvroDatumReader[T

[GitHub] [spark] squito commented on a change in pull request #28885: [SPARK-29375][SPARK-28940][SPARK-32041][SQL] Whole plan exchange and subquery reuse

2020-08-05 Thread GitBox
squito commented on a change in pull request #28885: URL: https://github.com/apache/spark/pull/28885#discussion_r465997277 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala ## @@ -156,4 +158,46 @@ class ExchangeSuite extends SparkPlanTest

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465997315 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -809,51 +735,111 @@ class SparkAvroDatumReader[T

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465996100 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -809,51 +735,111 @@ class SparkAvroDatumReader[T

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465995487 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -420,10 +426,27 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465994643 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -452,71 +452,73 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465994403 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -452,71 +452,73 @@ class SparkAvroDatumReader[T]

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465994024 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java ## @@ -356,6 +356,17 @@ public void writeTo(By

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465991846 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumWriter.scala ## @@ -0,0 +1,419 @@ +/* + * Licensed to the Apache So

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465989873 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/SparkAvroDatumReader.scala ## @@ -0,0 +1,811 @@ +/* + * Licensed to the Apache So

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465983050 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ## @@ -367,15 +372,45 @@ class AvroDeserializer( }

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465982616 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ## @@ -367,15 +372,45 @@ class AvroDeserializer( }

[GitHub] [spark] msamirkhan commented on a change in pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29354: URL: https://github.com/apache/spark/pull/29354#discussion_r465980330 ## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriterFactory.scala ## @@ -40,6 +40,8 @@ private[sql] class AvroOutputW

[GitHub] [spark] rdblue commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
rdblue commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r465974060 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala ## @@ -0,0 +1,48 @@ +/* + * Licensed

[GitHub] [spark] SparkQA commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
SparkQA commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-669465889 **[Test build #127104 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127104/testReport)** for PR 29353 at commit [`b360fb6`](https://github.com

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-669461208 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-669461208 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
SparkQA commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-669460508 **[Test build #127103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127103/testReport)** for PR 29353 at commit [`182d6dc`](https://github.com

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465964580 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificInternalRow.scala ## @@ -192,24 +192,41 @@ final class Mut

[GitHub] [spark] msamirkhan commented on pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on pull request #29353: URL: https://github.com/apache/spark/pull/29353#issuecomment-669459288 These are the changes referred to in https://github.com/apache/spark/pull/29353#discussion_r465955120 I'll put up a separate PR with this. [SpecificInternalRow.txt](http

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465963022 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -72,137 +74,191 @@ class OrcDeseriali

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465963114 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -72,137 +74,191 @@ class OrcDeseriali

[GitHub] [spark] rdblue commented on a change in pull request #29339: [Spark-32512][SQL] add alter table add/drop partition command for datasourcev2

2020-08-05 Thread GitBox
rdblue commented on a change in pull request #29339: URL: https://github.com/apache/spark/pull/29339#discussion_r465956261 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala ## @@ -62,4 +74,31 @@ object DataSourc

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465860164 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcSerializer.scala ## @@ -150,69 +156,110 @@ class OrcSerializer

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465955120 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -73,136 +74,180 @@ class OrcDeseriali

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669414888 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669414888 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669296821 **[Test build #127100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127100/testReport)** for PR 29357 at commit [`92f3d6f`](https://gi

[GitHub] [spark] SparkQA commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669413253 **[Test build #127100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127100/testReport)** for PR 29357 at commit [`92f3d6f`](https://github.co

[GitHub] [spark] n8shadow edited a comment on pull request #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2020-08-05 Thread GitBox
n8shadow edited a comment on pull request #25012: URL: https://github.com/apache/spark/pull/25012#issuecomment-669396767 Ok, so you think the arrow backend on databricks is corrupted? I also tried changing R package version (0.17.1 and 1.0.20200804) which doesn't seem to make any differenc

[GitHub] [spark] skambha commented on pull request #29125: [SPARK-32018][SQL][3.0] UnsafeRow.setDecimal should set null with overflowed value

2020-08-05 Thread GitBox
skambha commented on pull request #29125: URL: https://github.com/apache/spark/pull/29125#issuecomment-669396932 >How about this: we force to enable ANSI for decimal sum, so that the behavior is the same without fixing the UnsafeRow >bug? It's not an ideal fix but should be safer to ba

[GitHub] [spark] SparkQA commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-05 Thread GitBox
SparkQA commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669398172 **[Test build #127102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127102/testReport)** for PR 28761 at commit [`0747fcd`](https://github.com

[GitHub] [spark] n8shadow edited a comment on pull request #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2020-08-05 Thread GitBox
n8shadow edited a comment on pull request #25012: URL: https://github.com/apache/spark/pull/25012#issuecomment-669396767 Ok, so you think the arrow backend on databricks is corrupted? I also tried going back to another R package version which doesn't seem to make any difference and still g

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669396514 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669396514 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] n8shadow commented on pull request #25012: [SPARK-28215][SQL][R] as_tibble was removed from Arrow R API

2020-08-05 Thread GitBox
n8shadow commented on pull request #25012: URL: https://github.com/apache/spark/pull/25012#issuecomment-669396767 Ok, so you think the arrow backend on databricks is defective? Because I also tried going back to another R package version which doesn't seem to make any difference and still

[GitHub] [spark] SparkQA removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669283777 **[Test build #127098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127098/testReport)** for PR 29357 at commit [`b08aee6`](https://gi

[GitHub] [spark] SparkQA commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669394162 **[Test build #127098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127098/testReport)** for PR 29357 at commit [`b08aee6`](https://github.co

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669391659 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669391659 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669391555 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669391555 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
SparkQA commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669390805 **[Test build #127101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127101/testReport)** for PR 29211 at commit [`e81c3fc`](https://github.com

[GitHub] [spark] viirya commented on pull request #28761: [SPARK-25557][SQL][test-hive2.3] Nested column predicate pushdown for ORC

2020-08-05 Thread GitBox
viirya commented on pull request #28761: URL: https://github.com/apache/spark/pull/28761#issuecomment-669390642 retest this please This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [spark] holdenk commented on pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
holdenk commented on pull request #29211: URL: https://github.com/apache/spark/pull/29211#issuecomment-669389554 The only outstanding point of discussion is a test timeout length, which I don't believe is critical to address. If CI passes I intend to merge this PR and rebase the next PR wh

[GitHub] [spark] holdenk commented on a change in pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
holdenk commented on a change in pull request #29211: URL: https://github.com/apache/spark/pull/29211#discussion_r465926745 ## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala ## @@ -266,18 +266,17 @@ class BlockManagerDeco

[GitHub] [spark] AmplabJenkins commented on pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669387755 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669387755 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] holdenk commented on a change in pull request #29211: [SPARK-31197][CORE] Shutdown executor once we are done decommissioning

2020-08-05 Thread GitBox
holdenk commented on a change in pull request #29211: URL: https://github.com/apache/spark/pull/29211#discussion_r465925949 ## File path: core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionUnitSuite.scala ## @@ -54,6 +57,106 @@ class BlockManagerDecommissionU

[GitHub] [spark] SparkQA removed a comment on pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
SparkQA removed a comment on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669296643 **[Test build #127099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127099/testReport)** for PR 29363 at commit [`9236cc6`](https://gi

[GitHub] [spark] SparkQA commented on pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
SparkQA commented on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669384734 **[Test build #127099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127099/testReport)** for PR 29363 at commit [`9236cc6`](https://github.co

[GitHub] [spark] srowen commented on a change in pull request #27369: [SPARK-30654] Bootstrap4 docs upgrade

2020-08-05 Thread GitBox
srowen commented on a change in pull request #27369: URL: https://github.com/apache/spark/pull/27369#discussion_r465921331 ## File path: docs/_layouts/global.html ## @@ -20,12 +20,15 @@

[GitHub] [spark] MaxGekk commented on pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
MaxGekk commented on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669371973 > What do you mean by the following, @MaxGekk ? The existing test suite already has been passing without this PR. @dongjoon-hyun I mean that all modified lines by me are c

[GitHub] [spark] dongjoon-hyun commented on pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-05 Thread GitBox
dongjoon-hyun commented on pull request #29350: URL: https://github.com/apache/spark/pull/29350#issuecomment-669346107 @yanxiaole . You are added to the Apache Spark contributor group and SPARK-32529 is assigned to you. Welcome.

[GitHub] [spark] dongjoon-hyun closed pull request #29350: [SPARK-32529][CORE] Fix Historyserver log scan aborted by application status change

2020-08-05 Thread GitBox
dongjoon-hyun closed pull request #29350: URL: https://github.com/apache/spark/pull/29350 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
dongjoon-hyun commented on a change in pull request #29357: URL: https://github.com/apache/spark/pull/29357#discussion_r465896405 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala ## @@ -165,7 +165,7 @@

[GitHub] [spark] karuppayya commented on pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-08-05 Thread GitBox
karuppayya commented on pull request #28804: URL: https://github.com/apache/spark/pull/28804#issuecomment-669329149 @cloud-fan @maropu Can you help review these changes? This is an automated message from the Apache Git Se

[GitHub] [spark] dongjoon-hyun closed pull request #29361: [SPARK-32543][R] Remove arrow::as_tibble usage in SparkR

2020-08-05 Thread GitBox
dongjoon-hyun closed pull request #29361: URL: https://github.com/apache/spark/pull/29361 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-05 Thread GitBox
dongjoon-hyun commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-669327614 Thank you, @holdenk ! This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] dongjoon-hyun commented on pull request #29347: [SPARK-32492][SQL][FOLLOWUP][test-maven] Fix jenkins maven jobs

2020-08-05 Thread GitBox
dongjoon-hyun commented on pull request #29347: URL: https://github.com/apache/spark/pull/29347#issuecomment-669327897 Thank you, @yaooqinn and @cloud-fan ! This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669313635 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669313619 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To r

[GitHub] [spark] SparkQA removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669242777 **[Test build #127097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127097/testReport)** for PR 29357 at commit [`edb1114`](https://gi

[GitHub] [spark] AmplabJenkins commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669313619 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669312681 **[Test build #127097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127097/testReport)** for PR 29357 at commit [`edb1114`](https://github.co

[GitHub] [spark] gatorsmile commented on a change in pull request #29363: [SPARK-32546][SQL] Get table names directly from Hive tables

2020-08-05 Thread GitBox
gatorsmile commented on a change in pull request #29363: URL: https://github.com/apache/spark/pull/29363#discussion_r465870857 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ## @@ -759,15 +759,17 @@ private[hive] class HiveClientImp

[GitHub] [spark] BryanCutler commented on pull request #29320: [SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation

2020-08-05 Thread GitBox
BryanCutler commented on pull request #29320: URL: https://github.com/apache/spark/pull/29320#issuecomment-669310451 Looks great, I think it will be very helpful for PySpark to have its own main page. Thanks @HyukjinKwon !

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465870192 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -73,136 +74,180 @@ class OrcDeseriali

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465863127 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala ## @@ -72,137 +74,191 @@ class OrcDeseriali

[GitHub] [spark] msamirkhan commented on a change in pull request #29353: [SPARK-32532][SQL] Improve ORC read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on a change in pull request #29353: URL: https://github.com/apache/spark/pull/29353#discussion_r465860164 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcSerializer.scala ## @@ -150,69 +156,110 @@ class OrcSerializer

[GitHub] [spark] holdenk commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-05 Thread GitBox
holdenk commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-669299003 I think here the motivation was to try and deal with a workload with a lot of failures on the executors and avoiding a lot of recomputes more than the locality. --

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-05 Thread GitBox
dongjoon-hyun edited a comment on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-669297999 > I get the value of 3x replication for persistent data; this is in theory persistence for data that is already recreateable right? Right, @srowen . >

[GitHub] [spark] msamirkhan commented on pull request #29354: [WIP][Spark-32533][SQL] Improve Avro read/write performance on nested structs and array of structs

2020-08-05 Thread GitBox
msamirkhan commented on pull request #29354: URL: https://github.com/apache/spark/pull/29354#issuecomment-669298464 > It might be great if we can elaborate how it improves performance. We can focus on the fix only instead of mixing refactoring here. Have a couple of bugs to fix, tha

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669297685 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29363: [SPARK-32546][SQL] Gets table names directly from Hive tables

2020-08-05 Thread GitBox
AmplabJenkins removed a comment on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669297556 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on pull request #29331: [SPARK-32517][CORE] Add StorageLevel.DISK_ONLY_3

2020-08-05 Thread GitBox
dongjoon-hyun commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-669297999 > I get the value of 3x replication for persistent data; this is in theory persistence for data that is already recreateable right? Right, @srowen . > cached

[GitHub] [spark] AmplabJenkins commented on pull request #29363: [SPARK-32546][SQL] Gets table names directly from Hive tables

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669297556 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] AmplabJenkins commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
AmplabJenkins commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669297685 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] SparkQA commented on pull request #29363: [SPARK-32546][SQL] Gets table names directly from Hive tables

2020-08-05 Thread GitBox
SparkQA commented on pull request #29363: URL: https://github.com/apache/spark/pull/29363#issuecomment-669296643 **[Test build #127099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127099/testReport)** for PR 29363 at commit [`9236cc6`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
SparkQA commented on pull request #29357: URL: https://github.com/apache/spark/pull/29357#issuecomment-669296821 **[Test build #127100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127100/testReport)** for PR 29357 at commit [`92f3d6f`](https://github.com

[GitHub] [spark] gengliangwang commented on a change in pull request #29357: [SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in style check by default

2020-08-05 Thread GitBox
gengliangwang commented on a change in pull request #29357: URL: https://github.com/apache/spark/pull/29357#discussion_r465852807 ## File path: scalastyle-config.xml ## @@ -264,6 +264,19 @@ This file is divided into 3 sections: of Commons Lang 2 (package org.apache.commons
