[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625513585 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] zhangrenhua commented on a change in pull request #32222: [SPARK-35126][SQL] Execute jdbc cancellation method when jdbc load job is interrupted

2021-05-03 Thread GitBox
zhangrenhua commented on a change in pull request #32222: URL: https://github.com/apache/spark/pull/32222#discussion_r625512554 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala ## @@ -301,10 +306,44 @@ private[jdbc] class

[GitHub] [spark] jerqi closed pull request #32426: [SPARK-35297][DOC] modify the comment about the executor

2021-05-03 Thread GitBox
jerqi closed pull request #32426: URL: https://github.com/apache/spark/pull/32426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [spark] gengliangwang commented on a change in pull request #32421: [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer

2021-05-03 Thread GitBox
gengliangwang commented on a change in pull request #32421: URL: https://github.com/apache/spark/pull/32421#discussion_r625505942 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala ## @@ -93,4 +94,12 @@ case class

[GitHub] [spark] jerqi closed pull request #32426: [SPARK-35297][DOC] modify the annotation about the executor

2021-05-03 Thread GitBox
jerqi closed pull request #32426: URL: https://github.com/apache/spark/pull/32426

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
HyukjinKwon commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625503624 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] HyukjinKwon commented on pull request #32429: [SPARK-35303][PYTHON] Enable pinned thread mode by default

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32429: URL: https://github.com/apache/spark/pull/32429#issuecomment-831674289 There are a couple of TODOs, such as updating the migration guide, so I marked it as a draft. I will take a closer look and see if there are potential side effects to warn users about.

[GitHub] [spark] HyukjinKwon opened a new pull request #32429: [SPARK-35303][PYTHON] Enable pinned thread mode by default

2021-05-03 Thread GitBox
HyukjinKwon opened a new pull request #32429: URL: https://github.com/apache/spark/pull/32429 ### What changes were proposed in this pull request? PySpark added pinned thread mode at https://github.com/apache/spark/pull/24898 to sync Python thread to JVM thread. Previously, one JVM

[GitHub] [spark] byungsoo commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
byungsoo commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625498869 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] HyukjinKwon commented on pull request #32428: [SPARK-35302][INFRA] Benchmark workflow should create new files for new benchmarks

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32428: URL: https://github.com/apache/spark/pull/32428#issuecomment-831670930 This is still being tested. I will mark it ready for review when all tests pass properly.

[GitHub] [spark] HyukjinKwon opened a new pull request #32428: [SPARK-35302][INFRA] Benchmark workflow should create new files for new benchmarks

2021-05-03 Thread GitBox
HyukjinKwon opened a new pull request #32428: URL: https://github.com/apache/spark/pull/32428 ### What changes were proposed in this pull request? Currently, it fails at `git diff --name-only` when new benchmarks are added, see

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
HyukjinKwon commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625492946 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] HyukjinKwon commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
HyukjinKwon commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625492422 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] Ngone51 commented on a change in pull request #32007: [SPARK-33350][SHUFFLE] Add support to DiskBlockManager to create merge directory and to get the local shuffle merged data

2021-05-03 Thread GitBox
Ngone51 commented on a change in pull request #32007: URL: https://github.com/apache/spark/pull/32007#discussion_r625477570 ## File path: core/src/main/scala/org/apache/spark/storage/BlockId.scala ## @@ -87,6 +87,29 @@ case class ShufflePushBlockId(shuffleId: Int, mapIndex:

[GitHub] [spark] viirya commented on pull request #32327: [SPARK-35211][PYTHON] Proper NumericType conversion for applySchemaToPythonRDD

2021-05-03 Thread GitBox
viirya commented on pull request #32327: URL: https://github.com/apache/spark/pull/32327#issuecomment-831646835 Hmm, first, I am confused about why there are many PRs for the same JIRA. Are they for the same issue?

[GitHub] [spark] HyukjinKwon commented on pull request #32327: [SPARK-35211][PYTHON] Proper NumericType conversion for applySchemaToPythonRDD

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32327: URL: https://github.com/apache/spark/pull/32327#issuecomment-831644826 cc @ueshin @BryanCutler @viirya too FYI

[GitHub] [spark] HyukjinKwon commented on pull request #32327: [SPARK-35211][PYTHON] Proper NumericType conversion for applySchemaToPythonRDD

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32327: URL: https://github.com/apache/spark/pull/32327#issuecomment-831644758 I am okay with this change but you'll have to update https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L4922-L4940 as well by using the codes

[GitHub] [spark] HyukjinKwon closed pull request #32427: [SPARK-35300][PYTHON][DOCS] Standardize module names in install.rst

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32427: URL: https://github.com/apache/spark/pull/32427

[GitHub] [spark] HyukjinKwon commented on pull request #32427: [SPARK-35300][PYTHON][DOCS] Standardize module names in install.rst

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32427: URL: https://github.com/apache/spark/pull/32427#issuecomment-831642310

[GitHub] [spark] HyukjinKwon commented on pull request #32420: [SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32420: URL: https://github.com/apache/spark/pull/32420#issuecomment-831639869 cc @wangyum too

[GitHub] [spark] xinrong-databricks opened a new pull request #32427: [SPARK-35300][DOC] Standardize module names in install.rst

2021-05-03 Thread GitBox
xinrong-databricks opened a new pull request #32427: URL: https://github.com/apache/spark/pull/32427 ### What changes were proposed in this pull request? Use full names of modules in `install.rst` when specifying dependencies. ### Why are the changes needed?

[GitHub] [spark] github-actions[bot] closed pull request #30716: [SPARK-33747][CORE] Avoid calling unregisterMapOutput when the map stage is being rerunning.

2021-05-03 Thread GitBox
github-actions[bot] closed pull request #30716: URL: https://github.com/apache/spark/pull/30716

[GitHub] [spark] github-actions[bot] closed pull request #30832: [SPARK-33489][PYSPARK][WIP]Add support for converting null from & to Arrow…

2021-05-03 Thread GitBox
github-actions[bot] closed pull request #30832: URL: https://github.com/apache/spark/pull/30832

[GitHub] [spark] HyukjinKwon closed pull request #32386: [SPARK-34887][PYTHON] Port Koalas dependencies into PySpark

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32386: URL: https://github.com/apache/spark/pull/32386

[GitHub] [spark] HyukjinKwon commented on pull request #32386: [SPARK-34887][PYTHON] Port Koalas dependencies into PySpark

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32386: URL: https://github.com/apache/spark/pull/32386#issuecomment-831608883 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #32418: [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32418: URL: https://github.com/apache/spark/pull/32418

[GitHub] [spark] HyukjinKwon commented on pull request #32418: [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32418: URL: https://github.com/apache/spark/pull/32418#issuecomment-831607915 Merged to master.

[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625437195 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r624110505 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] HyukjinKwon closed pull request #32423: [SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32423: URL: https://github.com/apache/spark/pull/32423

[GitHub] [spark] HyukjinKwon commented on pull request #32423: [SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32423: URL: https://github.com/apache/spark/pull/32423#issuecomment-831602653 Thanks guys! Merged to master and branch-3.1.

[GitHub] [spark] maropu commented on pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-03 Thread GitBox
maropu commented on pull request #32424: URL: https://github.com/apache/spark/pull/32424#issuecomment-831601631 cc: @HyukjinKwon @ueshin

[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r625412638 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] viirya commented on a change in pull request #32422: [DO-NOT-MERGE][WIP][SS] Custom stateful task scheduling

2021-05-03 Thread GitBox
viirya commented on a change in pull request #32422: URL: https://github.com/apache/spark/pull/32422#discussion_r625401105 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchedulingPlugin.scala ## @@ -0,0 +1,116 @@ +/* + * Licensed to

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831545290 Right yeah like if there is a comparable implementation in R or sklearn, and it gives a certain answer, that's decent evidence that it's more correct. Could be due to different

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831542712 > It's entirely possible that 93.3 is a more correct log-likelihood. Usually we check some other implementation if possible to verify. "Other implementation" as in

[GitHub] [spark] viirya commented on a change in pull request #32413: [SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

2021-05-03 Thread GitBox
viirya commented on a change in pull request #32413: URL: https://github.com/apache/spark/pull/32413#discussion_r625368691 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -236,7 +236,26 @@ case class

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831519563 It's entirely possible that 93.3 is a more correct log-likelihood. Usually we check some other implementation if possible to verify.

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831514938 The error is the following: ``` File "/__w/spark/spark/python/pyspark/ml/clustering.py", line 276, in __main__.GaussianMixture Failed example:

[GitHub] [spark] sunchao commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32410: URL: https://github.com/apache/spark/pull/32410#discussion_r625335398 ## File path: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java ## @@ -141,7 +141,7 @@ public void open(Map

[GitHub] [spark] dongjoon-hyun commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-03 Thread GitBox
dongjoon-hyun commented on pull request #32410: URL: https://github.com/apache/spark/pull/32410#issuecomment-831485837 cc @sunchao , too.

[GitHub] [spark] npoggi commented on pull request #32243: [SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf

2021-05-03 Thread GitBox
npoggi commented on pull request #32243: URL: https://github.com/apache/spark/pull/32243#issuecomment-831482901 Arriving a bit late. Looks good. We should move the DDL for the tables into resource files at some point. Thanks for the update.

[GitHub] [spark] dongjoon-hyun commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-03 Thread GitBox
dongjoon-hyun commented on pull request #32410: URL: https://github.com/apache/spark/pull/32410#issuecomment-831480082 Got it. Thank you for the clarification, @wangyum . > This patch uses setCurrentSessionState with all versions except Hive 0.12.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState

2021-05-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #32410: URL: https://github.com/apache/spark/pull/32410#discussion_r625313108 ## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala ## @@ -28,8

[GitHub] [spark] AmplabJenkins commented on pull request #32426: [SPARK-35297][DOC] modify the annotation about the executor

2021-05-03 Thread GitBox
AmplabJenkins commented on pull request #32426: URL: https://github.com/apache/spark/pull/32426#issuecomment-831469400 Can one of the admins verify this patch?

[GitHub] [spark] jerqi opened a new pull request #32426: [SPARK-35297][DOC] modify the annotation about the executor

2021-05-03 Thread GitBox
jerqi opened a new pull request #32426: URL: https://github.com/apache/spark/pull/32426 ### What changes were proposed in this pull request? The Spark Executor can now also be used with the Kubernetes scheduler, so we should update the annotation in Executor.scala. ### Why are the

[GitHub] [spark] otterc commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
otterc commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r625287419 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -245,9 +253,21 @@ final class

[GitHub] [spark] xinrong-databricks commented on a change in pull request #32386: [SPARK-34887][PYTHON] Port Koalas dependencies into PySpark

2021-05-03 Thread GitBox
xinrong-databricks commented on a change in pull request #32386: URL: https://github.com/apache/spark/pull/32386#discussion_r625221612 ## File path: python/docs/source/getting_started/install.rst ## @@ -159,6 +159,9 @@ Package Minimum supported version Note `NumPy`

[GitHub] [spark] sigmod opened a new pull request #32425: [WIP][SPARK-35155][SQL] Add rule id pruning to Resolve rules

2021-05-03 Thread GitBox
sigmod opened a new pull request #32425: URL: https://github.com/apache/spark/pull/32425 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL][WIP]In CTAS with LOCATION , should not allow to a non-empty directory.

2021-05-03 Thread GitBox
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r625213745 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ## @@ -166,6 +166,10 @@ case class

[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r625164945 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -245,9 +253,21 @@ final class

[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r625163412 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -708,6 +785,15 @@ final class

[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r625157041 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -317,6 +377,14 @@ final class

[GitHub] [spark] Ngone51 commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
Ngone51 commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r625153377 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -146,6 +148,12 @@ final class

[GitHub] [spark] maropu opened a new pull request #32424: [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

2021-05-03 Thread GitBox
maropu opened a new pull request #32424: URL: https://github.com/apache/spark/pull/32424 ### What changes were proposed in this pull request? To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables`
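Editor's sketch for the entry above, not taken from the PR: the description mentions using a global counter for `LambdaVariables`. The Python below only illustrates the general idea of one shared counter handing out unique variable names, under the assumption that nested lambdas would otherwise reuse the same argument name. `FreshNames`, `fresh`, and `nested_names` are hypothetical names and do not exist in Spark.

```python
import itertools
import threading


class FreshNames:
    """Hypothetical helper: hands out names that are unique per instance."""

    def __init__(self):
        # One shared counter, so ids never repeat across callers.
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def fresh(self, base):
        # Append an increasing id so that two lambdas which both ask
        # for "x" never end up sharing a variable name.
        with self._lock:
            return f"{base}_{next(self._counter)}"


def nested_names(names, depth):
    """Simulates `depth` nested lambdas that all call their argument "x"."""
    return [names.fresh("x") for _ in range(depth)]
```

With a single shared `FreshNames` instance, an inner lambda cannot shadow an outer one by accident, because every request for the same base name yields a different suffix.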

[GitHub] [spark] hezuojiao commented on pull request #32417: [SPARK-35289][SQL] Make CatalogString contains nullable information

2021-05-03 Thread GitBox
hezuojiao commented on pull request #32417: URL: https://github.com/apache/spark/pull/32417#issuecomment-831319031 I plan to fix these failed unit tests later. Any comments on this PR? I think it needs more discussion to help it complete.

[GitHub] [spark] sigmod commented on pull request #32421: [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer

2021-05-03 Thread GitBox
sigmod commented on pull request #32421: URL: https://github.com/apache/spark/pull/32421#issuecomment-831306776 @hvanhovell @gengliangwang @dbaliafroozeh @maryannxue this PR is ready for review. Let me know if you have any questions.

[GitHub] [spark] srowen commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831281894 Jenkins retest this please

[GitHub] [spark] srowen commented on a change in pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
srowen commented on a change in pull request #32415: URL: https://github.com/apache/spark/pull/32415#discussion_r625107651 ## File path: LICENSE-binary ## @@ -456,7 +456,6 @@ org.antlr:ST4 org.antlr:stringtemplate org.antlr:antlr4-runtime antlr:antlr

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32418: [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

2021-05-03 Thread GitBox
AmplabJenkins removed a comment on pull request #32418: URL: https://github.com/apache/spark/pull/32418#issuecomment-830871723 Can one of the admins verify this patch?

[GitHub] [spark] srowen commented on pull request #32418: [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

2021-05-03 Thread GitBox
srowen commented on pull request #32418: URL: https://github.com/apache/spark/pull/32418#issuecomment-831273093 Jenkins retest this please

[GitHub] [spark] maropu commented on pull request #32420: [SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on pull request #32420: URL: https://github.com/apache/spark/pull/32420#issuecomment-831245700 The failures in GA are not related to this PR. cc: @HyukjinKwon @dongjoon-hyun @yaooqinn

[GitHub] [spark] luhenry commented on pull request #32415: [SPARK-35295][ML] Replace fully com.github.fommil.netlib by dev.ludovic.netlib:2.0

2021-05-03 Thread GitBox
luhenry commented on pull request #32415: URL: https://github.com/apache/spark/pull/32415#issuecomment-831212439 /cc @srowen I have released `dev.ludovic.netlib:2.0.0` and I've updated this PR accordingly.

[GitHub] [spark] garawalid commented on pull request #32418: [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

2021-05-03 Thread GitBox
garawalid commented on pull request #32418: URL: https://github.com/apache/spark/pull/32418#issuecomment-831200381 @HyukjinKwon, thanks for pointing this out, it's done.

[GitHub] [spark] HyukjinKwon commented on pull request #32423: [SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32423: URL: https://github.com/apache/spark/pull/32423#issuecomment-831190677 cc @LuciferYang FYI

[GitHub] [spark] HyukjinKwon opened a new pull request #32423: [SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation

2021-05-03 Thread GitBox
HyukjinKwon opened a new pull request #32423: URL: https://github.com/apache/spark/pull/32423 ### What changes were proposed in this pull request? This is rather a followup of https://github.com/apache/spark/pull/30518 that should be ported back to `branch-3.1` too.

[GitHub] [spark] HyukjinKwon closed pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32400: URL: https://github.com/apache/spark/pull/32400

[GitHub] [spark] HyukjinKwon commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831175026 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32394: URL: https://github.com/apache/spark/pull/32394

[GitHub] [spark] HyukjinKwon commented on pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32394: URL: https://github.com/apache/spark/pull/32394#issuecomment-831128765 Thanks for your first contribution and congrats on becoming a contributor!

[GitHub] [spark] HyukjinKwon commented on pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32394: URL: https://github.com/apache/spark/pull/32394#issuecomment-831128152 Merged to master.

[GitHub] [spark] byungsoo-oh commented on a change in pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
byungsoo-oh commented on a change in pull request #32394: URL: https://github.com/apache/spark/pull/32394#discussion_r624944873 ## File path: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ## @@ -49,7 +49,11 @@ abstract class BenchmarkBase { val

[GitHub] [spark] mridulm commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
mridulm commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r624910203 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -862,6 +970,25 @@ private class

[GitHub] [spark] sunchao commented on a change in pull request #32407: [SPARK-35261][SQL] Support static magic method for stateless ScalarFunction

2021-05-03 Thread GitBox
sunchao commented on a change in pull request #32407: URL: https://github.com/apache/spark/pull/32407#discussion_r624907772 ## File path: sql/core/benchmarks/FunctionBenchmark-jdk11-results.txt ## @@ -0,0 +1,32 @@ +OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Mac OS X 10.16

[GitHub] [spark] otterc commented on a change in pull request #32287: [SPARK-27991][CORE] Defer the fetch request on Netty OOM

2021-05-03 Thread GitBox
otterc commented on a change in pull request #32287: URL: https://github.com/apache/spark/pull/32287#discussion_r624904117 ## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ## @@ -708,6 +785,15 @@ final class

[GitHub] [spark] byungsoo-oh commented on a change in pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
byungsoo-oh commented on a change in pull request #32394: URL: https://github.com/apache/spark/pull/32394#discussion_r624902959 ## File path: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ## @@ -49,7 +49,11 @@ abstract class BenchmarkBase { val

[GitHub] [spark] byungsoo-oh commented on pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
byungsoo-oh commented on pull request #32394: URL: https://github.com/apache/spark/pull/32394#issuecomment-831057105 > @byungsoo-oh: > > 1. would you mind checking https://github.com/apache/spark/pull/32394/checks?check_run_id=2464430892 and enable GitHub Actions in your forked

[GitHub] [spark] viirya edited a comment on pull request #32422: [DO-NOT-MERGE][WIP][SS] Custom stateful task scheduling

2021-05-03 Thread GitBox
viirya edited a comment on pull request #32422: URL: https://github.com/apache/spark/pull/32422#issuecomment-831053905 `StateSchedulingPlugin` is for scheduling stateful tasks. `StreamingSymmetricHashJoinHelper` specifies `StateSchedulingPlugin` for custom scheduling. Other diffs

[GitHub] [spark] viirya commented on a change in pull request #32422: [DO-NOT-MERGE][WIP][SS] Custom stateful task scheduling

2021-05-03 Thread GitBox
viirya commented on a change in pull request #32422: URL: https://github.com/apache/spark/pull/32422#discussion_r624900295 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchedulingPlugin.scala ## @@ -0,0 +1,116 @@ +/* + * Licensed to

[GitHub] [spark] viirya commented on pull request #32422: [DO-NOT-MERGE][WIP][SS] Custom stateful task scheduling

2021-05-03 Thread GitBox
viirya commented on pull request #32422: URL: https://github.com/apache/spark/pull/32422#issuecomment-831053905 `StateSchedulingPlugin` is for scheduling stateful tasks. `StreamingSymmetricHashJoinHelper` specifies `StateSchedulingPlugin` for custom scheduling. -- This is an automated

[GitHub] [spark] Dobiasd commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
Dobiasd commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831053536 The changes are squashed into one commit now. :heavy_check_mark: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] viirya commented on pull request #32136: [SPARK-35022][CORE] Task Scheduling Plugin in Spark

2021-05-03 Thread GitBox
viirya commented on pull request #32136: URL: https://github.com/apache/spark/pull/32136#issuecomment-831053315 A reference implementation for custom stateful task scheduling is at #32422. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32394: [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

2021-05-03 Thread GitBox
AmplabJenkins removed a comment on pull request #32394: URL: https://github.com/apache/spark/pull/32394#issuecomment-829008876 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon closed pull request #32368: [SPARK-35176][PYTHON] Standardize input validation error type

2021-05-03 Thread GitBox
HyukjinKwon closed pull request #32368: URL: https://github.com/apache/spark/pull/32368 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[GitHub] [spark] HyukjinKwon commented on pull request #32368: [SPARK-35176][PYTHON] Standardize input validation error type

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32368: URL: https://github.com/apache/spark/pull/32368#issuecomment-831053007 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] viirya opened a new pull request #32422: [DO-NOT-MERGE][WIP][SS] Custom stateful task scheduling

2021-05-03 Thread GitBox
viirya opened a new pull request #32422: URL: https://github.com/apache/spark/pull/32422 ### What changes were proposed in this pull request? This is a reference PR of custom stateful task scheduling for SPARK-35022: Task Scheduling Plugin in Spark. (#32136). ###

[GitHub] [spark] Dobiasd commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
Dobiasd commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831051189 Ah, when going directly to https://github.com/Dobiasd/spark/actions, it showed this: https://i.imgur.com/NA9PgLB.png I've clicked on "I understand my workflows, go ahead

[GitHub] [spark] HyukjinKwon commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831050620 Oh, I can see now: https://github.com/Dobiasd/spark/actions. Would you mind squashing your commits and pushing them into this PR? e.g.) `git rebase upstream/master -i`, mark

[GitHub] [spark] HyukjinKwon commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831049831 Hm, but the tests should pass though before merging it in. Do other repositories of yours run GitHub Actions workflows properly? -- This is an automated message from the

[GitHub] [spark] HyukjinKwon edited a comment on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon edited a comment on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831048626 Thanks @Dobiasd. It's weird that your forked repository doesn't run the tests at https://github.com/Dobiasd/spark/actions .. your branch looks sync'ed with the

[GitHub] [spark] HyukjinKwon commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831048626 Thanks @Dobiasd. Your forked repository doesn't run the tests at https://github.com/Dobiasd/spark/actions .. your branch looks sync'ed with the latest Apache Spark `master`

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
AmplabJenkins removed a comment on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-829296322 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
HyukjinKwon commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831046855 ok to test -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624890341 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q23a.sql.out ## @@ -3,4 +3,4 @@ -- !query schema struct -- !query output -17030.91

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624890670 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q23b.sql.out ## @@ -3,7 +3,4 @@ -- !query schema struct -- !query output -NULL

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624893120 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q91.sql.out ## @@ -3,4 +3,4 @@ -- !query schema struct -- !query output

[GitHub] [spark] Dobiasd commented on pull request #32400: [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

2021-05-03 Thread GitBox
Dobiasd commented on pull request #32400: URL: https://github.com/apache/spark/pull/32400#issuecomment-831045670 @HyukjinKwon Thanks for the help. :+1: - GitHub Actions was already enabled for my forked repo: https://i.imgur.com/CTEkw1R.png - I've rebased my branch on upstream

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624892918 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q54.sql.out ## @@ -3,4 +3,4 @@ -- !query schema struct -- !query output -11860 1

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624892682 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q25.sql.out ## @@ -3,4 +3,4 @@ -- !query schema struct -- !query output

[GitHub] [spark] maropu commented on a change in pull request #32420: [WIP][SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

2021-05-03 Thread GitBox
maropu commented on a change in pull request #32420: URL: https://github.com/apache/spark/pull/32420#discussion_r624892545 ## File path: sql/core/src/test/resources/tpcds-query-results/v1_4/q24b.sql.out ## @@ -3,4 +3,4 @@ -- !query schema struct -- !query output -Griffith
