[GitHub] [spark] AmplabJenkins commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918753385 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47732/

[GitHub] [spark] SparkQA commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
SparkQA commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918753354 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47732/ -- This is an automated message from the

[GitHub] [spark] AmplabJenkins removed a comment on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918751955 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47733/

[GitHub] [spark] LuciferYang edited a comment on pull request #33977: [SPARK-36737][BUILD][CORE][SQL][SS] Upgrade Apache commons-io to 2.11.0 and revert change of SPARK-36456

2021-09-13 Thread GitBox
LuciferYang edited a comment on pull request #33977: URL: https://github.com/apache/spark/pull/33977#issuecomment-918751290 > IOUtils.closeQuietly can also receive a consumer to define the behavior around IOException. (That said, JavaUtils.closeQuietly can be reimplemented as well

[GitHub] [spark] SparkQA commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
SparkQA commented on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918751934 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47733/

[GitHub] [spark] AmplabJenkins commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918751955 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47733/

[GitHub] [spark] LuciferYang commented on pull request #33977: [SPARK-36737][BUILD][CORE][SQL][SS] Upgrade Apache commons-io to 2.11.0 and revert change of SPARK-36456

2021-09-13 Thread GitBox
LuciferYang commented on pull request #33977: URL: https://github.com/apache/spark/pull/33977#issuecomment-918751290 > IOUtils.closeQuietly can also receive a consumer to define the behavior around IOException. (That said, JavaUtils.closeQuietly can be reimplemented as well leveraging

[GitHub] [spark] SparkQA commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
SparkQA commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918749147 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47732/

[GitHub] [spark] mridulm commented on a change in pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
mridulm commented on a change in pull request #33984: URL: https://github.com/apache/spark/pull/33984#discussion_r707865190 ## File path: core/src/main/scala/org/apache/spark/util/Utils.scala ## @@ -2604,18 +2604,19 @@ private[spark] object Utils extends Logging { * -

[GitHub] [spark] SparkQA commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
SparkQA commented on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918748709 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47733/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33930: URL: https://github.com/apache/spark/pull/33930#issuecomment-918747703 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143224/

[GitHub] [spark] AmplabJenkins commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33930: URL: https://github.com/apache/spark/pull/33930#issuecomment-918747703 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143224/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707863591 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -84,11 +87,24 @@ def __repr__(self) -> str: class DataFrameType(object): def

[GitHub] [spark] HyukjinKwon closed pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
HyukjinKwon closed pull request #33982: URL: https://github.com/apache/spark/pull/33982

[GitHub] [spark] HyukjinKwon commented on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918745919 Merged to master.

[GitHub] [spark] SparkQA removed a comment on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
SparkQA removed a comment on pull request #33930: URL: https://github.com/apache/spark/pull/33930#issuecomment-918602348 **[Test build #143224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143224/testReport)** for PR 33930 at commit

[GitHub] [spark] c21 commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-09-13 Thread GitBox
c21 commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-918745075 @cloud-fan - could you help take a look when you have time? Thanks.

[GitHub] [spark] ueshin commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
ueshin commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707859153 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -538,6 +634,165 @@ def infer_return_type(f: Callable) -> Union[SeriesType, DataFrameType,

[GitHub] [spark] SparkQA commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
SparkQA commented on pull request #33930: URL: https://github.com/apache/spark/pull/33930#issuecomment-918742858 **[Test build #143224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143224/testReport)** for PR 33930 at commit

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707855604 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -538,6 +634,165 @@ def infer_return_type(f: Callable) -> Union[SeriesType,

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707855277 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -538,6 +634,165 @@ def infer_return_type(f: Callable) -> Union[SeriesType,

[GitHub] [spark] cfmcgrady commented on pull request #33956: [SPARK-36715][SQL] InferFiltersFromGenerate should not infer filter for udf

2021-09-13 Thread GitBox
cfmcgrady commented on pull request #33956: URL: https://github.com/apache/spark/pull/33956#issuecomment-918735834 Thanks for the review.

[GitHub] [spark] TongWeii commented on pull request #33986: Support sql overwrite a path that is also being read from when partit…

2021-09-13 Thread GitBox
TongWeii commented on pull request #33986: URL: https://github.com/apache/spark/pull/33986#issuecomment-918729534 cc @viirya

[GitHub] [spark] AmplabJenkins commented on pull request #33986: Support sql overwrite a path that is also being read from when partit…

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33986: URL: https://github.com/apache/spark/pull/33986#issuecomment-918726537 Can one of the admins verify this patch?

[GitHub] [spark] TongWeii opened a new pull request #33986: Support sql overwrite a path that is also being read from when partit…

2021-09-13 Thread GitBox
TongWeii opened a new pull request #33986: URL: https://github.com/apache/spark/pull/33986 ### What changes were proposed in this pull request? ``` // non-partitioned table overwrite CREATE TABLE tbl (col1 INT, col2 STRING) USING PARQUET; INSERT OVERWRITE TABLE tbl SELECT 0,1;

[GitHub] [spark] SparkQA commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
SparkQA commented on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918725293 **[Test build #143230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143230/testReport)** for PR 31830 at commit

[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
kazuyukitanimura commented on a change in pull request #33930: URL: https://github.com/apache/spark/pull/33930#discussion_r707843248 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ## @@ -330,4 +330,118 @@

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918710855 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
SparkQA commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918724273 **[Test build #143229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143229/testReport)** for PR 33984 at commit

[GitHub] [spark] YannisSismanis commented on pull request #33985: [WIP][SPARK-36745][SQL] Cleanup pattern ExtractEquiJoinKeys

2021-09-13 Thread GitBox
YannisSismanis commented on pull request #33985: URL: https://github.com/apache/spark/pull/33985#issuecomment-918724093 @cloud-fan @sigmod FYI

[GitHub] [spark] AmplabJenkins commented on pull request #33985: [WIP][SPARK-36745][SQL] Cleanup pattern ExtractEquiJoinKeys

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33985: URL: https://github.com/apache/spark/pull/33985#issuecomment-918724059 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918723344 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47730/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-918723346 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47731/

[GitHub] [spark] AmplabJenkins commented on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918723344 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47730/

[GitHub] [spark] AmplabJenkins commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-918723346 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47731/

[GitHub] [spark] YannisSismanis opened a new pull request #33985: [WIP][SPARK-36745][SQL] Cleanup pattern ExtractEquiJoinKeys

2021-09-13 Thread GitBox
YannisSismanis opened a new pull request #33985: URL: https://github.com/apache/spark/pull/33985 ### What changes were proposed in this pull request? The join condition returned from ExtractEquiJoinKeys does not correspond to the equi-join on the extracted left and right

[GitHub] [spark] SparkQA commented on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
SparkQA commented on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918721943 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47730/

[GitHub] [spark] ulysses-you opened a new pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
ulysses-you opened a new pull request #31830: URL: https://github.com/apache/spark/pull/31830 ### What changes were proposed in this pull request? Add `SQL Properties` in the SQL tab that shows the modified configs for the SQL execution. ### Why are the changes needed?

[GitHub] [spark] rmcyang edited a comment on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
rmcyang edited a comment on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918720727 > nit: Pull `conf.get(PUSH_BASED_SHUFFLE_ENABLED)` into a local variable. Fixed. Thanks @mridulm

[GitHub] [spark] rmcyang commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
rmcyang commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918720727 > nit: Pull `conf.get(PUSH_BASED_SHUFFLE_ENABLED)` into a local variable. Fixed @mridulm

[GitHub] [spark] SparkQA commented on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
SparkQA commented on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918718594 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47730/

[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-09-13 Thread GitBox
SparkQA commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-918717686 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47731/

[GitHub] [spark] ulysses-you commented on pull request #31830: [SPARK-34735][SQL][UI] Add modified configs for SQL execution in UI

2021-09-13 Thread GitBox
ulysses-you commented on pull request #31830: URL: https://github.com/apache/spark/pull/31830#issuecomment-918717110 also cc @HyukjinKwon @sarutak what do you think about this feature?

[GitHub] [spark] mridulm commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
mridulm commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918715421 Ok to test

[GitHub] [spark] rmcyang commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
rmcyang commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918712073 cc @mridulm @Ngone51 @gengliangwang @venkata91 @zhouyejoe Please take a look.

[GitHub] [spark] AmplabJenkins commented on pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33984: URL: https://github.com/apache/spark/pull/33984#issuecomment-918710855 Can one of the admins verify this patch?

[GitHub] [spark] rmcyang opened a new pull request #33984: [SPARK-36705][FOLLOW-UP] Fix unnecessary logWarning when PUSH_BASED_SHUFFLE_ENABLED is set to false

2021-09-13 Thread GitBox
rmcyang opened a new pull request #33984: URL: https://github.com/apache/spark/pull/33984 ### What changes were proposed in this pull request? Only throw logWarning when `PUSH_BASED_SHUFFLE_ENABLED` is set to true and `canDoPushBasedShuffle` is false ### Why are the
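The guard this follow-up describes (warn only when `PUSH_BASED_SHUFFLE_ENABLED` is true and `canDoPushBasedShuffle` is false, with the flag read once into a local variable as mridulm's nit asks) can be sketched in Python. This is a hedged illustration, not Spark's Scala code in `Utils.scala`: the dict-based `conf`, the helper names `should_warn_push_based_shuffle` and `check_push_based_shuffle`, the warning text, and the config key `spark.shuffle.push.enabled` are assumptions made for this sketch.

```python
import logging

logger = logging.getLogger("push_shuffle_sketch")

# Assumed config key for the flag Spark exposes as PUSH_BASED_SHUFFLE_ENABLED.
PUSH_BASED_SHUFFLE_ENABLED = "spark.shuffle.push.enabled"


def should_warn_push_based_shuffle(conf: dict, can_do_push_based_shuffle: bool) -> bool:
    """Hypothetical helper: True only when push-based shuffle was requested
    but the current environment cannot support it."""
    # Read the flag once into a local variable instead of querying conf twice.
    push_enabled = conf.get(PUSH_BASED_SHUFFLE_ENABLED, "false").lower() == "true"
    # If the user never enabled the feature, there is nothing to warn about.
    return push_enabled and not can_do_push_based_shuffle


def check_push_based_shuffle(conf: dict, can_do_push_based_shuffle: bool) -> None:
    """Hypothetical caller: emit the warning only in the enabled-but-unsupported case."""
    if should_warn_push_based_shuffle(conf, can_do_push_based_shuffle):
        logger.warning("Push-based shuffle is enabled but cannot be used in this environment.")
```

With this guard, a job that leaves push-based shuffle disabled produces no warning, which is the unnecessary log noise the follow-up targets.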

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
HyukjinKwon commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707829443 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -538,6 +634,165 @@ def infer_return_type(f: Callable) -> Union[SeriesType,

[GitHub] [spark] ueshin commented on a change in pull request #33954: [SPARK-36709][PYTHON] Support new syntax for specifying index type and name in pandas API on Spark

2021-09-13 Thread GitBox
ueshin commented on a change in pull request #33954: URL: https://github.com/apache/spark/pull/33954#discussion_r707826582 ## File path: python/pyspark/pandas/typedef/typehints.py ## @@ -84,11 +87,24 @@ def __repr__(self) -> str: class DataFrameType(object): def

[GitHub] [spark] viirya commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
viirya commented on a change in pull request #33930: URL: https://github.com/apache/spark/pull/33930#discussion_r707829006 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala ## @@ -330,4 +330,118 @@ class

[GitHub] [spark] kazuyukitanimura commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
kazuyukitanimura commented on a change in pull request #33930: URL: https://github.com/apache/spark/pull/33930#discussion_r707826619 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ## @@ -288,8 +288,24 @@ object OptimizeIn

[GitHub] [spark] viirya commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
viirya commented on a change in pull request #33930: URL: https://github.com/apache/spark/pull/33930#discussion_r707824426 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ## @@ -288,8 +288,24 @@ object OptimizeIn extends

[GitHub] [spark] SparkQA commented on pull request #33432: [SPARK-32709][SQL] Support writing Hive bucketed table (Parquet/ORC format with Hive hash)

2021-09-13 Thread GitBox
SparkQA commented on pull request #33432: URL: https://github.com/apache/spark/pull/33432#issuecomment-918701216 **[Test build #143228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143228/testReport)** for PR 33432 at commit

[GitHub] [spark] xinrong-databricks commented on a change in pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
xinrong-databricks commented on a change in pull request #33929: URL: https://github.com/apache/spark/pull/33929#discussion_r707775701 ## File path: python/pyspark/pandas/frame.py ## @@ -6695,53 +6706,94 @@ def drop( x y z w 0 1 3 5 7 1 2

[GitHub] [spark] SparkQA commented on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
SparkQA commented on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918700640 **[Test build #143227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143227/testReport)** for PR 33969 at commit

[GitHub] [spark] ahshahid commented on pull request #33983: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule to solve the problem of performance & OOM if the query plans have large ex

2021-09-13 Thread GitBox
ahshahid commented on pull request #33983: URL: https://github.com/apache/spark/pull/33983#issuecomment-918696627 @HyukjinKwon, I will be cleaning it up for sure by tomorrow.

[GitHub] [spark] HyukjinKwon commented on pull request #33983: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule to solve the problem of performance & OOM if the query plans have large

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #33983: URL: https://github.com/apache/spark/pull/33983#issuecomment-918695372 @ahshahid, mind using markdown for code for better readability in the PR description, and following the GitHub PR template as is

[GitHub] [spark] sarutak commented on pull request #33979: [SPARK-36739][DOCS][PYTHON] Add apache license headers to makefiles

2021-09-13 Thread GitBox
sarutak commented on pull request #33979: URL: https://github.com/apache/spark/pull/33979#issuecomment-918694161 @holdenk > Since this is a missing license issue, is it a blocker? Yeah, I think so.

[GitHub] [spark] HyukjinKwon commented on pull request #33956: [SPARK-36715][SQL] InferFiltersFromGenerate should not infer filter for udf

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #33956: URL: https://github.com/apache/spark/pull/33956#issuecomment-918690839 Merged to master, branch-3.2 and branch-3.1.

[GitHub] [spark] HyukjinKwon closed pull request #33956: [SPARK-36715][SQL] InferFiltersFromGenerate should not infer filter for udf

2021-09-13 Thread GitBox
HyukjinKwon closed pull request #33956: URL: https://github.com/apache/spark/pull/33956

[GitHub] [spark] HyukjinKwon commented on pull request #33980: [WIP][SPARK-32285]Add PySpark support for nested timestamps with arrow

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #33980: URL: https://github.com/apache/spark/pull/33980#issuecomment-918689130 @pralabhkumar mind keeping the GitHub PR template as is? (https://github.com/apache/spark/blob/master/.github/PULL_REQUEST_TEMPLATE)

[GitHub] [spark] sunchao commented on a change in pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
sunchao commented on a change in pull request #33969: URL: https://github.com/apache/spark/pull/33969#discussion_r707814742 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala ## @@ -855,6 +855,12 @@ class

[GitHub] [spark] HyukjinKwon closed pull request #33979: [SPARK-36739][DOCS][PYTHON] Add apache license headers to makefiles

2021-09-13 Thread GitBox
HyukjinKwon closed pull request #33979: URL: https://github.com/apache/spark/pull/33979

[GitHub] [spark] HyukjinKwon commented on pull request #33979: [SPARK-36739][DOCS][PYTHON] Add apache license headers to makefiles

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #33979: URL: https://github.com/apache/spark/pull/33979#issuecomment-918685023 Merged to master and branch-3.2.

[GitHub] [spark] AmplabJenkins commented on pull request #33983: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule.

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33983: URL: https://github.com/apache/spark/pull/33983#issuecomment-918677850 Can one of the admins verify this patch?

[GitHub] [spark] HyukjinKwon commented on pull request #32749: [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins

2021-09-13 Thread GitBox
HyukjinKwon commented on pull request #32749: URL: https://github.com/apache/spark/pull/32749#issuecomment-918677848 Thanks @shaneknapp. @pingsutw would you mind rebasing your PR? I think we're good to go.

[GitHub] [spark] ahshahid opened a new pull request #33983: [SPARK-33152] [SQL] New algorithm for ConstraintsPropagation rule.

2021-09-13 Thread GitBox
ahshahid opened a new pull request #33983: URL: https://github.com/apache/spark/pull/33983 Re-creating a PR that was closed earlier. This PR proposes new logic to store constraints and track aliases in projections, which eliminates the need for pessimistically generating all the

[GitHub] [spark] dbtsai commented on a change in pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
dbtsai commented on a change in pull request #33969: URL: https://github.com/apache/spark/pull/33969#discussion_r707803510 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala ## @@ -855,6 +855,12 @@ class

[GitHub] [spark] dbtsai commented on pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
dbtsai commented on pull request #33930: URL: https://github.com/apache/spark/pull/33930#issuecomment-918669704 Ping @viirya and @sunchao for another look

[GitHub] [spark] dbtsai commented on a change in pull request #33930: [SPARK-36665][SQL] Add more Not operator simplifications

2021-09-13 Thread GitBox
dbtsai commented on a change in pull request #33930: URL: https://github.com/apache/spark/pull/33930#discussion_r707796357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ## @@ -441,6 +456,25 @@ object BooleanSimplification

[GitHub] [spark] HeartSaVioR edited a comment on pull request #33977: [SPARK-36737][BUILD][CORE][SQL][SS] Upgrade Apache commons-io to 2.11.0 and revert change of SPARK-36456

2021-09-13 Thread GitBox
HeartSaVioR edited a comment on pull request #33977: URL: https://github.com/apache/spark/pull/33977#issuecomment-918668142 IOUtils.closeQuietly can also receive consumer to define the behavior around IOException. (That said, JavaUtils.closeQuietly can be reimplemented as well leveraging

[GitHub] [spark] HeartSaVioR commented on pull request #33977: [SPARK-36737][BUILD][CORE][SQL][SS] Upgrade Apache commons-io to 2.11.0 and revert change of SPARK-36456

2021-09-13 Thread GitBox
HeartSaVioR commented on pull request #33977: URL: https://github.com/apache/spark/pull/33977#issuecomment-918668142 IOUtils.closeQuietly can also receive consumer to define the behavior around IOException. The default behavior is no-op (so no log message).

[GitHub] [spark] HeartSaVioR commented on pull request #33977: [SPARK-36737][BUILD][CORE][SQL][SS] Upgrade Apache commons-io to 2.11.0 and revert change of SPARK-36456

2021-09-13 Thread GitBox
HeartSaVioR commented on pull request #33977: URL: https://github.com/apache/spark/pull/33977#issuecomment-918666240 I can't say for every places we use, but if I understand correctly, in HDFSBackedStateStoreProvider, we simplified the logic via allowing to close stream twice (with

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918665514

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918665515

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32749: [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #32749: URL: https://github.com/apache/spark/pull/32749#issuecomment-918665513 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143223/

[GitHub] [spark] SparkQA commented on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
SparkQA commented on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918665610 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47728/

[GitHub] [spark] AmplabJenkins commented on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918665627 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/47728/

[GitHub] [spark] AmplabJenkins commented on pull request #32749: [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #32749: URL: https://github.com/apache/spark/pull/32749#issuecomment-918665513 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143223/

[GitHub] [spark] AmplabJenkins commented on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918665514 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143225/

[GitHub] [spark] AmplabJenkins commented on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918665516

[GitHub] [spark] AmplabJenkins commented on pull request #33979: [SPARK-36739][DOCS][PYTHON] Add apache license headers to makefiles

2021-09-13 Thread GitBox
AmplabJenkins commented on pull request #33979: URL: https://github.com/apache/spark/pull/33979#issuecomment-918665547 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA removed a comment on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
SparkQA removed a comment on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918637378 **[Test build #143225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143225/testReport)** for PR 33929 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
SparkQA removed a comment on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918644562 **[Test build #143226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143226/testReport)** for PR 33982 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #32749: [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins

2021-09-13 Thread GitBox
SparkQA removed a comment on pull request #32749: URL: https://github.com/apache/spark/pull/32749#issuecomment-918577735 **[Test build #143223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143223/testReport)** for PR 32749 at commit

[GitHub] [spark] SparkQA commented on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
SparkQA commented on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918662688 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47729/

[GitHub] [spark] SparkQA commented on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
SparkQA commented on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918661894 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47728/

[GitHub] [spark] SparkQA commented on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
SparkQA commented on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918659275 **[Test build #143226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143226/testReport)** for PR 33982 at commit

[GitHub] [spark] SparkQA commented on pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
SparkQA commented on pull request #33929: URL: https://github.com/apache/spark/pull/33929#issuecomment-918652763 **[Test build #143225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143225/testReport)** for PR 33929 at commit

[GitHub] [spark] SparkQA commented on pull request #32749: [SPARK-34943][BUILD] Upgrade flake8 to 3.8.0 or above in Jenkins

2021-09-13 Thread GitBox
SparkQA commented on pull request #32749: URL: https://github.com/apache/spark/pull/32749#issuecomment-918652119 **[Test build #143223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143223/testReport)** for PR 32749 at commit

[GitHub] [spark] xinrong-databricks commented on pull request #33964: [WIP][SPARK-36746][PYTHON] Refactor `_select_rows_by_iterable` in `iLocIndexer` to use `Column.isin`

2021-09-13 Thread GitBox
xinrong-databricks commented on pull request #33964: URL: https://github.com/apache/spark/pull/33964#issuecomment-918645242 To adjust according to https://github.com/apache/spark/pull/33982.

[GitHub] [spark] ueshin commented on a change in pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
ueshin commented on a change in pull request #33929: URL: https://github.com/apache/spark/pull/33929#discussion_r707775397 ## File path: python/pyspark/pandas/frame.py ## @@ -6695,53 +6706,94 @@ def drop( x y z w 0 1 3 5 7 1 2 4 6 8 -

[GitHub] [spark] SparkQA commented on pull request #33982: [SPARK-36748][PYTHON] Introduce the 'compute.isin_limit' option

2021-09-13 Thread GitBox
SparkQA commented on pull request #33982: URL: https://github.com/apache/spark/pull/33982#issuecomment-918644562 **[Test build #143226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143226/testReport)** for PR 33982 at commit

[GitHub] [spark] xinrong-databricks commented on a change in pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
xinrong-databricks commented on a change in pull request #33929: URL: https://github.com/apache/spark/pull/33929#discussion_r707776125 ## File path: python/pyspark/pandas/frame.py ## @@ -6695,53 +6706,94 @@ def drop( x y z w 0 1 3 5 7 1 2

[GitHub] [spark] xinrong-databricks commented on a change in pull request #33929: [SPARK-36618][PYTHON] Support dropping rows of a single-indexed DataFrame

2021-09-13 Thread GitBox
xinrong-databricks commented on a change in pull request #33929: URL: https://github.com/apache/spark/pull/33929#discussion_r707775701 ## File path: python/pyspark/pandas/frame.py ## @@ -6695,53 +6706,94 @@ def drop( x y z w 0 1 3 5 7 1 2

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33969: [SPARK-36726] Upgrade Parquet to 1.12.1

2021-09-13 Thread GitBox
AmplabJenkins removed a comment on pull request #33969: URL: https://github.com/apache/spark/pull/33969#issuecomment-918643187 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143222/
