[GitHub] [spark] mridulm commented on a change in pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to

2020-07-03 Thread GitBox
mridulm commented on a change in pull request #28287: URL: https://github.com/apache/spark/pull/28287#discussion_r449746678 ## File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ## @@ -289,13 +290,23 @@ private[spark] class ExecutorAllocationManager

[GitHub] [spark] mridulm commented on a change in pull request #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to

2020-07-03 Thread GitBox
mridulm commented on a change in pull request #28287: URL: https://github.com/apache/spark/pull/28287#discussion_r447233292 ## File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ## @@ -696,7 +713,9 @@ private[spark] class ExecutorAllocationManager(

[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-03 Thread GitBox
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r449740822 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -118,7 +118,63 @@ import org.apa

[GitHub] [spark] beliefer commented on a change in pull request #27428: [SPARK-30276][SQL] Support Filter expression allows simultaneous use of DISTINCT

2020-07-03 Thread GitBox
beliefer commented on a change in pull request #27428: URL: https://github.com/apache/spark/pull/27428#discussion_r449360982 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala ## @@ -118,7 +118,63 @@ import org.apa

[GitHub] [spark] AngersZhuuuu commented on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
AngersZhuuuu commented on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653718537 > org.apache.spark.sql.execution.ProjectionOverSchema.scala. A Scala extractor that projects an expression over a given schema. I think this code also needs to have matchin

[GitHub] [spark] imback82 commented on pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-03 Thread GitBox
imback82 commented on pull request #28676: URL: https://github.com/apache/spark/pull/28676#issuecomment-653710418 > It's kind of replacing one or more columns from the streaming side partitioning with the corresponding join keys, and get all the combinations. Makes sense. Thanks for

[GitHub] [spark] imback82 commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-03 Thread GitBox
imback82 commented on a change in pull request #28676: URL: https://github.com/apache/spark/pull/28676#discussion_r449730894 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala ## @@ -60,6 +60,26 @@ case class BroadcastHashJoi

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
HeartSaVioR commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449729609 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ cla

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
HeartSaVioR commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449729309 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ cla

[GitHub] [spark] github-actions[bot] closed pull request #27976: [SPARK-31213][SQL] Arrange the sequence of the config of Spark SQL.

2020-07-03 Thread GitBox
github-actions[bot] closed pull request #27976: URL: https://github.com/apache/spark/pull/27976 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] erikerlandson edited a comment on pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
erikerlandson edited a comment on pull request #28983: URL: https://github.com/apache/spark/pull/28983#issuecomment-653697145 when trying to refer to either ScalaAggregator or Aggregator over in catalyst, I'm running into some scoping problems, which are all similar to: ```scala [erro

[GitHub] [spark] erikerlandson commented on pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
erikerlandson commented on pull request #28983: URL: https://github.com/apache/spark/pull/28983#issuecomment-653697145 when trying to refer to either ScalaAggregator or Aggregator over in catalyst, I'm running into some scoping problems, which are all similar to: ```scala [error] /ho

[GitHub] [spark] maropu commented on a change in pull request #28808: [SPARK-31975][SQL] Throw user facing error when use WindowFunction directly

2020-07-03 Thread GitBox
maropu commented on a change in pull request #28808: URL: https://github.com/apache/spark/pull/28808#discussion_r449714381 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -158,6 +158,11 @@ trait CheckAnalysis extends P

[GitHub] [spark] maropu commented on a change in pull request #28804: [SPARK-31973][SQL] Add ability to disable Sort,Spill in Partial aggregation

2020-07-03 Thread GitBox
maropu commented on a change in pull request #28804: URL: https://github.com/apache/spark/pull/28804#discussion_r447512443 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -2196,6 +2196,13 @@ object SQLConf { .checkValue(bit =>

[GitHub] [spark] maropu commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
maropu commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449712832 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -349,6 +357,17 @@ pri

[GitHub] [spark] maropu commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
maropu commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449712600 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -349,6 +357,17 @@ pri

[GitHub] [spark] viirya commented on a change in pull request #28962: [SPARK-32136][SQL] NormalizeFloatingNumbers should work on null struct

2020-07-03 Thread GitBox
viirya commented on a change in pull request #28962: URL: https://github.com/apache/spark/pull/28962#discussion_r449712074 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala ## @@ -123,7 +123,8 @@ object NormalizeFl

[GitHub] [spark] juliuszsompolski commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
juliuszsompolski commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449595381 ## File path: sql/hive-thriftserver/v1.2/src/main/java/org/apache/hive/service/cli/OperationState.java ## @@ -32,7 +32,8 @@ CLOSED(TOperation

[GitHub] [spark] rdblue commented on pull request #28993: [SPARK-32168][SQL] Fix hidden partitioning correctness bug in SQL overwrite

2020-07-03 Thread GitBox
rdblue commented on pull request #28993: URL: https://github.com/apache/spark/pull/28993#issuecomment-653675926 @cloud-fan, @brkyvz, @aokolnychyi, @dbtsai, @dongjoon-hyun, you may be interested in this PR. This fixes a correctness bug in SQL INSERT INTO with v2 tables. It only affects hidd

[GitHub] [spark] rdblue opened a new pull request #28993: [SPARK-32168][SQL] Fix hidden partitioning correctness bug in SQL overwrite

2020-07-03 Thread GitBox
rdblue opened a new pull request #28993: URL: https://github.com/apache/spark/pull/28993 ### What changes were proposed in this pull request? When converting an `INSERT OVERWRITE` query to a v2 overwrite plan, Spark attempts to detect when a dynamic overwrite and a static overwrite w

[GitHub] [spark] viirya commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
viirya commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449689477 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ class Sy

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
HeartSaVioR commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449685443 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ cla

[GitHub] [spark] dongjoon-hyun commented on pull request #28931: [SPARK-32103][YARN] Handle IPv6 host/port split in YarnRMClient

2020-07-03 Thread GitBox
dongjoon-hyun commented on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-653629276 GitHub Action CI is okay, but the UCB AMPLab Jenkins CI seems to be completely down now. In this case, there is nothing we can do. Let's wait for a while.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF

2020-07-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #25024: URL: https://github.com/apache/spark/pull/25024#discussion_r449664743 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala ## @@ -450,3 +454,63 @@ case class ScalaUDAF( overri

[GitHub] [spark] erikerlandson commented on pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
erikerlandson commented on pull request #28983: URL: https://github.com/apache/spark/pull/28983#issuecomment-653606349 @cloud-fan thanks, I will try adding such a rule for ScalaAggregator

[GitHub] [spark] wangyum commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
wangyum commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449639878 ## File path: sql/hive-thriftserver/v1.2/src/main/java/org/apache/hive/service/cli/OperationState.java ## @@ -32,7 +32,8 @@ CLOSED(TOperationState.CLO

[GitHub] [spark] viirya commented on pull request #28979: [SPARK-32154][SQL] Use ExpressionEncoder for the return type of ScalaUDF to convert to catalyst type

2020-07-03 Thread GitBox
viirya commented on pull request #28979: URL: https://github.com/apache/spark/pull/28979#issuecomment-653601544 I see. Make sense to me.

[GitHub] [spark] xuanyuanking commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
xuanyuanking commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449638973 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ cl

[GitHub] [spark] viirya commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF

2020-07-03 Thread GitBox
viirya commented on a change in pull request #25024: URL: https://github.com/apache/spark/pull/25024#discussion_r449638696 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala ## @@ -450,3 +454,63 @@ case class ScalaUDAF( override def

[GitHub] [spark] karuppayya commented on pull request #28715: [SPARK-31897][SQL] Enable codegen for GenerateExec

2020-07-03 Thread GitBox
karuppayya commented on pull request #28715: URL: https://github.com/apache/spark/pull/28715#issuecomment-653599420 @maropu @cloud-fan @HyukjinKwon Do you have any other comments on the PR? What else can I do to get it merged?

[GitHub] [spark] juliuszsompolski commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
juliuszsompolski commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449595381 ## File path: sql/hive-thriftserver/v1.2/src/main/java/org/apache/hive/service/cli/OperationState.java ## @@ -32,7 +32,8 @@ CLOSED(TOperation

[GitHub] [spark] wangyum commented on pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
wangyum commented on pull request #28991: URL: https://github.com/apache/spark/pull/28991#issuecomment-653592436 ok to test.

[GitHub] [spark] wangyum removed a comment on pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
wangyum removed a comment on pull request #28991: URL: https://github.com/apache/spark/pull/28991#issuecomment-653456264 ok to test.

[GitHub] [spark] cloud-fan commented on pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28983: URL: https://github.com/apache/spark/pull/28983#issuecomment-653579714 > does that avoid having to resolve these UnresolvedMapObject? on the executor side? Yes. Encoder is a container of expression. If the expression is resolved, then when

[GitHub] [spark] yaooqinn commented on pull request #28963: [SPARK-32145][SQL][test-hive1.2] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
yaooqinn commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653555222

[GitHub] [spark] PavithraRamachandran commented on pull request #28931: [SPARK-32103][YARN] Handle IPv6 host/port split in YarnRMClient

2020-07-03 Thread GitBox
PavithraRamachandran commented on pull request #28931: URL: https://github.com/apache/spark/pull/28931#issuecomment-653555033 @dongjoon-hyun I am getting a 502 proxy error when I try to access the CI to check for failures. Could you help me?

[GitHub] [spark] guiyanakuang commented on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
guiyanakuang commented on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653544529 org.apache.spark.sql.execution.ProjectionOverSchema.scala. A Scala extractor that projects an expression over a given schema. I think this code also needs to have matching

[GitHub] [spark] guiyanakuang removed a comment on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
guiyanakuang removed a comment on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653541220 A Scala extractor that projects an expression over a given schema.

[GitHub] [spark] guiyanakuang commented on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
guiyanakuang commented on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653541220 A Scala extractor that projects an expression over a given schema.

[GitHub] [spark] guiyanakuang commented on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
guiyanakuang commented on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653540953 ProjectionOverSchema.scala. A Scala extractor that projects an expression over a given schema. I think ProjectionOverSchema.scala must add matching rules as well.

[GitHub] [spark] guiyanakuang removed a comment on pull request #28860: [SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct

2020-07-03 Thread GitBox
guiyanakuang removed a comment on pull request #28860: URL: https://github.com/apache/spark/pull/28860#issuecomment-653540953 ProjectionOverSchema.scala. A Scala extractor that projects an expression over a given schema. I think ProjectionOverSchema.scala must add matching rules as well

[GitHub] [spark] manuzhang commented on pull request #28954: [SPARK-32083][SQL] Apply CoalesceShufflePartitions when input RDD has 0 partitions with AQE

2020-07-03 Thread GitBox
manuzhang commented on pull request #28954: URL: https://github.com/apache/spark/pull/28954#issuecomment-653540634 @JkSelf I get the same result as you when I set the `numPartitions` of source to 1 (By default, it's 16 on my Mac), i.e. ```scala val df1 = spark.range(0, 10, 1,

[GitHub] [spark] erikerlandson commented on pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
erikerlandson commented on pull request #28983: URL: https://github.com/apache/spark/pull/28983#issuecomment-653540515 > We can follow ResolveEncodersInUDF: add a rule to resolve the encoders in ScalaAggregator at driver side. @cloud-fan if we do this, and resolve these on the driver

[GitHub] [spark] erikerlandson commented on a change in pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
erikerlandson commented on a change in pull request #28983: URL: https://github.com/apache/spark/pull/28983#discussion_r449569671 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -679,7 +679,10 @@ object MapObjects

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-07-03 Thread GitBox
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r449563944 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ## @@ -39,7 +39,18 @@ object SimplifyExtractValueO

[GitHub] [spark] manuzhang edited a comment on pull request #28954: [SPARK-32083][SQL] Apply CoalesceShufflePartitions when input RDD has 0 partitions with AQE

2020-07-03 Thread GitBox
manuzhang edited a comment on pull request #28954: URL: https://github.com/apache/spark/pull/28954#issuecomment-653529076 @JkSelf @viirya Here is the partial SQL UI of running the same example with default number of shuffle partitions. (the binary is built from master branch till June 2

[GitHub] [spark] manuzhang commented on pull request #28954: [SPARK-32083][SQL] Apply CoalesceShufflePartitions when input RDD has 0 partitions with AQE

2020-07-03 Thread GitBox
manuzhang commented on pull request #28954: URL: https://github.com/apache/spark/pull/28954#issuecomment-653529076 @JkSelf @viirya Here is the partial SQL UI of running the same example with default number of shuffle partitions. (the binary is built from master branch till June 29). Yo

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28808: [SPARK-31975][SQL] Throw user facing error when use WindowFunction directly

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28808: URL: https://github.com/apache/spark/pull/28808#discussion_r449551076 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ## @@ -884,4 +884,15 @@ class AnalysisSuite exte

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449544111 ## File path: python/pyspark/sql/dataframe.py ## @@ -2356,16 +2326,16 @@ def _test(): globs['df'] = sc.parallelize([(2, 'Alice'), (5, 'Bob')])\

[GitHub] [spark] cloud-fan commented on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653508864 There are user-facing errors and unexpected internal errors. For example, if a user makes a mistake in the SQL query, we should tell him what the error is (like table not foun

[GitHub] [spark] cloud-fan commented on pull request #28808: [SPARK-31975][SQL] Throw user facing error when use WindowFunction directly

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28808: URL: https://github.com/apache/spark/pull/28808#issuecomment-653506469 retest this please

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28986: [SPARK-32160][CORE][PYSPARK] Disallow to create SparkContext in executors.

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28986: URL: https://github.com/apache/spark/pull/28986#discussion_r449536593 ## File path: python/pyspark/tests/test_context.py ## @@ -267,6 +267,13 @@ def test_resources(self): resources = sc.resources

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28986: [SPARK-32160][CORE][PYSPARK] Disallow to create SparkContext in executors.

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28986: URL: https://github.com/apache/spark/pull/28986#discussion_r449535946 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -2554,6 +2557,19 @@ object SparkContext extends Logging { } } +

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28986: [SPARK-32160][CORE][PYSPARK] Disallow to create SparkContext in executors.

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28986: URL: https://github.com/apache/spark/pull/28986#discussion_r449535492 ## File path: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ## @@ -934,6 +934,18 @@ class SparkContextSuite extends SparkFunSuite with

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28986: [SPARK-32160][CORE][PYSPARK] Disallow to create SparkContext in executors.

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28986: URL: https://github.com/apache/spark/pull/28986#discussion_r449535087 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -2554,6 +2557,19 @@ object SparkContext extends Logging { } } +

[GitHub] [spark] attilapiros commented on pull request #28967: [SPARK-32149][SHUFFLE] Improve file path name normalisation at block resolution within the external shuffle service

2020-07-03 Thread GitBox
attilapiros commented on pull request #28967: URL: https://github.com/apache/spark/pull/28967#issuecomment-653494855 retest this please

[GitHub] [spark] ulysses-you commented on a change in pull request #28808: [SPARK-31975][SQL] Throw user facing error when use WindowFunction directly

2020-07-03 Thread GitBox
ulysses-you commented on a change in pull request #28808: URL: https://github.com/apache/spark/pull/28808#discussion_r449516560 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -158,6 +158,9 @@ trait CheckAnalysis exten

[GitHub] [spark] juliuszsompolski commented on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
juliuszsompolski commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653478528 I would be +1 for printing as much of exception as possible. We often get support tickets from end users without an indication of the timestamp when it happened, and

[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-03 Thread GitBox
MaxGekk commented on pull request #27366: URL: https://github.com/apache/spark/pull/27366#issuecomment-653478426 jenkins, retest this, please

[GitHub] [spark] wangyum commented on pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
wangyum commented on pull request #28991: URL: https://github.com/apache/spark/pull/28991#issuecomment-653456264 ok to test.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449485344 ## File path: python/pyspark/sql/types.py ## @@ -1487,36 +1451,14 @@ class Row(tuple): True """ -# Remove after Python < 3.6 dropped,

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449475633 ## File path: python/pyspark/sql/types.py ## @@ -1487,36 +1451,14 @@ class Row(tuple): True """ -# Remove after Python < 3.6 dropped,

[GitHub] [spark] leoluan2009 commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
leoluan2009 commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449478873 ## File path: sql/hive-thriftserver/v1.2/src/main/java/org/apache/hive/service/cli/OperationState.java ## @@ -32,7 +32,8 @@ CLOSED(TOperationState

[GitHub] [spark] leoluan2009 commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
leoluan2009 commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449477961 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -349,6 +357,17 @

[GitHub] [spark] leoluan2009 commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
leoluan2009 commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449476585 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -204,6 +205,13 @

[GitHub] [spark] leoluan2009 commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
leoluan2009 commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449475945 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ## @@ -874,6 +874,16 @@ class

[GitHub] [spark] leoluan2009 commented on a change in pull request #28991: [SPARK-26533][SQL][test-hive1.2] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
leoluan2009 commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449475753 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -349,6 +357,17 @

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449475633 ## File path: python/pyspark/sql/types.py ## @@ -1487,36 +1451,14 @@ class Row(tuple): True """ -# Remove after Python < 3.6 dropped,

[GitHub] [spark] cloud-fan commented on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653443008 SGTM.

[GitHub] [spark] cloud-fan commented on pull request #28979: [SPARK-32154][SQL] Use ExpressionEncoder for the return type of ScalaUDF to convert to catalyst type

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28979: URL: https://github.com/apache/spark/pull/28979#issuecomment-653442152 > UDF is not controlled by spark.sql.datetime.java8API.enabled UDF shouldn't be. The config is only applicable if users want to get data from Spark, and this config deci

[GitHub] [spark] yaooqinn edited a comment on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
yaooqinn edited a comment on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653442176 The thing is that the actual exception here is hard to define as we capture all `Throwable`s here. Or we can add a configuration that is not to decide which exception

[GitHub] [spark] yaooqinn commented on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
yaooqinn commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653442176 The thing is that the actual exception here is hard to define as we capture all `Throwable`s here. Or we can add a configuration that is not to decide which exception to deal

[GitHub] [spark] cloud-fan commented on a change in pull request #28683: [SPARK-31875][SQL] Provide a option to disable user supplied Hints

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28683: URL: https://github.com/apache/spark/pull/28683#discussion_r449471684 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ## @@ -3521,6 +3521,45 @@ class SQLQuerySuite extends QueryTest with Sha

[GitHub] [spark] cloud-fan commented on a change in pull request #28683: [SPARK-31875][SQL] Provide a option to disable user supplied Hints

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28683: URL: https://github.com/apache/spark/pull/28683#discussion_r449471485 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ## @@ -3183,6 +3192,8 @@ class SQLConf extends Serializable with

[GitHub] [spark] cloud-fan commented on pull request #27983: [SPARK-32105][SQL]Refactor current ScriptTransformationExec code

2020-07-03 Thread GitBox
cloud-fan commented on pull request #27983: URL: https://github.com/apache/spark/pull/27983#issuecomment-653439971 retest this please

[GitHub] [spark] cloud-fan commented on a change in pull request #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #25024: URL: https://github.com/apache/spark/pull/25024#discussion_r449467645 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala ## @@ -450,3 +454,63 @@ case class ScalaUDAF( override d

[GitHub] [spark] cloud-fan commented on a change in pull request #28983: [SPARK-32159][SQL] Fix integration between Aggregator[Array[_], _, _] and UnresolvedMapObjects

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28983: URL: https://github.com/apache/spark/pull/28983#discussion_r449465235 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -679,7 +679,10 @@ object MapObjects {

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28256: [SPARK-31483][PySpark] Use SPARK_PYTHON or 'python' to run find_spark_home.py

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28256: URL: https://github.com/apache/spark/pull/28256#discussion_r449464441 ## File path: bin/find-spark-home ## @@ -34,8 +34,5 @@ elif [ ! -f "$FIND_SPARK_HOME_PYTHON_SCRIPT" ]; then else # We are pip installed, use the P

[GitHub] [spark] cloud-fan commented on a change in pull request #28992: [SPARK-32167][SQL] fix nullability of GetArrayStructFields

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28992: URL: https://github.com/apache/spark/pull/28992#discussion_r449461650 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ComplexTypesSuite.scala ## @@ -106,4 +110,11 @@ class ComplexTypesSuite extends QueryTest wit

[GitHub] [spark] cloud-fan commented on pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28963: URL: https://github.com/apache/spark/pull/28963#issuecomment-653431032 Personally I'd like to see stacktraces as I'm a developer. But I agree with @LantaoJin that it may not be friendly to end-users. If the thriftserver never shows stacktraces so

[GitHub] [spark] rednaxelafx commented on a change in pull request #28992: [SPARK-32167][SQL] fix nullability of GetArrayStructFields

2020-07-03 Thread GitBox
rednaxelafx commented on a change in pull request #28992: URL: https://github.com/apache/spark/pull/28992#discussion_r449459077 ## File path: sql/core/src/test/scala/org/apache/spark/sql/ComplexTypesSuite.scala ## @@ -106,4 +110,11 @@ class ComplexTypesSuite extends QueryTest w

[GitHub] [spark] cloud-fan commented on a change in pull request #28975: [SPARK-32148][SS] Fix stream-stream join issue on missing to copy reused unsafe row

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28975: URL: https://github.com/apache/spark/pull/28975#discussion_r449458290 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala ## @@ -451,10 +451,25 @@ class

[GitHub] [spark] maropu commented on a change in pull request #28991: [SPARK-26533][SQL] Support query auto timeout cancel on thriftserver

2020-07-03 Thread GitBox
maropu commented on a change in pull request #28991: URL: https://github.com/apache/spark/pull/28991#discussion_r449442618 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala ## @@ -874,6 +874,16 @@ class Hive

[GitHub] [spark] cloud-fan commented on a change in pull request #28926: [SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28926: URL: https://github.com/apache/spark/pull/28926#discussion_r449456120 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -2674,11 +2679,24 @@ object Sequen

[GitHub] [spark] cloud-fan commented on pull request #28926: [SPARK-32133][SQL] Forbid time field steps for date start/end in Sequence

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28926: URL: https://github.com/apache/spark/pull/28926#issuecomment-653427435 retest this please

[GitHub] [spark] cloud-fan commented on a change in pull request #28808: [SPARK-31975][SQL] Throw user facing error when use WindowFunction directly

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28808: URL: https://github.com/apache/spark/pull/28808#discussion_r449454344 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala ## @@ -158,6 +158,9 @@ trait CheckAnalysis extends

[GitHub] [spark] cloud-fan commented on pull request #28992: [SPARK-32167][SQL] fix nullability of GetArrayStructFields

2020-07-03 Thread GitBox
cloud-fan commented on pull request #28992: URL: https://github.com/apache/spark/pull/28992#issuecomment-653424050 cc @maropu @viirya @dongjoon-hyun

[GitHub] [spark] cloud-fan opened a new pull request #28992: [SPARK-32167][SQL] fix nullability of GetArrayStructFields

2020-07-03 Thread GitBox
cloud-fan opened a new pull request #28992: URL: https://github.com/apache/spark/pull/28992 ### What changes were proposed in this pull request? Fix nullability of `GetArrayStructFields`. It should consider both the original array's `containsNull` and the inner field's nullab

[GitHub] [spark] dbtsai commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column

2020-07-03 Thread GitBox
dbtsai commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r449426612 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ## @@ -39,7 +39,18 @@ object SimplifyExtractValueOps

[GitHub] [spark] yaooqinn commented on a change in pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
yaooqinn commented on a change in pull request #28963: URL: https://github.com/apache/spark/pull/28963#discussion_r449423638 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -315,16 +311,11 @@

[GitHub] [spark] yaooqinn commented on a change in pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
yaooqinn commented on a change in pull request #28963: URL: https://github.com/apache/spark/pull/28963#discussion_r449421416 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -315,16 +311,11 @@

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449420723 ## File path: docs/rdd-programming-guide.md ## @@ -276,7 +276,7 @@ $ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspar

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28957: [WIP][SPARK-32138] Drop Python 2.7, 3.4 and 3.5

2020-07-03 Thread GitBox
HyukjinKwon commented on a change in pull request #28957: URL: https://github.com/apache/spark/pull/28957#discussion_r449420977 ## File path: docs/rdd-programming-guide.md ## @@ -276,7 +276,7 @@ $ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook ./bin/pyspar

[GitHub] [spark] cloud-fan commented on a change in pull request #28963: [SPARK-32145][SQL] ThriftCLIService.GetOperationStatus should include exception's stack trace to the error message

2020-07-03 Thread GitBox
cloud-fan commented on a change in pull request #28963: URL: https://github.com/apache/spark/pull/28963#discussion_r449419697 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -315,16 +311,11 @@

[GitHub] [spark] Ngone51 edited a comment on pull request #28629: [SPARK-31769][CORE] Add MDC support for driver threads

2020-07-03 Thread GitBox
Ngone51 edited a comment on pull request #28629: URL: https://github.com/apache/spark/pull/28629#issuecomment-653385789 I'm fine to remove the prefix if we want to inherit the MDC properties directly since I agree API consistent is more important. And I think we need to document it clearly
