[GitHub] [spark] jerrypeng opened a new pull request, #38517: [WIP][SPARK-39591][SS] Async Progress Tracking

2022-11-04 Thread GitBox
jerrypeng opened a new pull request, #38517: URL: https://github.com/apache/spark/pull/38517 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox
wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304400795 @cloud-fan @AngersZh Could you help review this PR? Another PR https://github.com/apache/spark/pull/38496 depends on this. -- This is an automated message from the Apache Git

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox
wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304399983 Retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014560227 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559977 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559938 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014550696 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014549689 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] attilapiros commented on pull request #38516: [SPARK-32380][SQL] Fixing access of HBase table via Hive

2022-11-04 Thread GitBox
attilapiros commented on PR #38516: URL: https://github.com/apache/spark/pull/38516#issuecomment-1304373701 cc @dongjoon-hyun, @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] attilapiros opened a new pull request, #38516: Initial version

2022-11-04 Thread GitBox
attilapiros opened a new pull request, #38516: URL: https://github.com/apache/spark/pull/38516 ### What changes were proposed in this pull request? This is an update of https://github.com/apache/spark/pull/29178 which was closed because the root cause of the error was just vaguely

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014536719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] github-actions[bot] commented on pull request #34637: [SPARK-37349][SQL] add SQL Rest API parsing logic

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #34637: URL: https://github.com/apache/spark/pull/34637#issuecomment-1304354583 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables URL: https://github.com/apache/spark/pull/37083 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function.

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function. URL: https://github.com/apache/spark/pull/37226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] github-actions[bot] commented on pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37009: URL: https://github.com/apache/spark/pull/37009#issuecomment-1304354576 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project

2022-11-04 Thread GitBox
github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project URL: https://github.com/apache/spark/pull/37239 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] github-actions[bot] commented on pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the `spark.sql.execution.topKSortMaxRowsThreshold`

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37104: URL: https://github.com/apache/spark/pull/37104#issuecomment-1304354561 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37309: [SPARK-39871][CORE] Jmx http interface supported for SparkHistoryServer

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37309: URL: https://github.com/apache/spark/pull/37309#issuecomment-1304354542 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37315: [SPARK-39892][SQL] Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)

2022-11-04 Thread GitBox
github-actions[bot] commented on PR #37315: URL: https://github.com/apache/spark/pull/37315#issuecomment-1304354536 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014532124 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] SandishKumarHN commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox
SandishKumarHN commented on PR #38515: URL: https://github.com/apache/spark/pull/38515#issuecomment-1304351416 @rangadi Because some random numbers do not convert to catalyst type, a null check for the data generator is required. -- This is an automated message from the Apache Git

[GitHub] [spark] SandishKumarHN opened a new pull request, #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox
SandishKumarHN opened a new pull request, #38515: URL: https://github.com/apache/spark/pull/38515 ### What changes were proposed in this pull request? null check for data generator after type conversion NA ### Why are the changes needed? NA ### Does this PR

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014530432 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014526120 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014525886 ## docs/sql-performance-tuning.md: ## @@ -77,8 +77,8 @@ that these options will be deprecated in future release as more optimizations ar

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014524565 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] swamirishi commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
swamirishi commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014524002 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014523963 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522636 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522309 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014515719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014507922 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014506227 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox
xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014502881 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] liuzqt commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox
liuzqt commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304306588 @mridulm I got an error when running that command locally ``` [error] /Users/ziqi.liu/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:51:

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox
mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304281819 Looks like the doc build is failing and so failing the build ... Can you run `build/sbt -Phadoop-3 -Pyarn -Pdocker-integration-tests -Pspark-ganglia-lgpl -Phive -Pmesos -Phive-thriftserver

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014476080 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014475066 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] AmplabJenkins commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1304270266 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38505: [SPARK-40622][WIP]do not merge(try to fix build error)

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38505: URL: https://github.com/apache/spark/pull/38505#issuecomment-1304270304 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] mridulm commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
mridulm commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014469439 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging {

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014428293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +172,11 @@ object

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014425174 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +193,11 @@ object

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014424662 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014421466 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304103037 > OK, any other related files you want to check while you're here? I am doing some studying, so I am not sure what other docs I will read and when. -- This is an automated message

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014419552 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014417907 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics

[GitHub] [spark] mridulm commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox
mridulm commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1304076366 Merged to master. Thanks for working on this @eejbyfeldt ! Thanks for the reviews @srowen, @dongjoon-hyun, @LuciferYang :-) -- This is an automated message from the Apache Git

[GitHub] [spark] asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox
asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13 URL: https://github.com/apache/spark/pull/38427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] AmplabJenkins commented on pull request #38509: [SPARK-41014][PySpark][DOC] Improve documentation and typing of groupby and cogroup applyInPandas

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38509: URL: https://github.com/apache/spark/pull/38509#issuecomment-1304060587 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
AmplabJenkins commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304060535 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-04 Thread GitBox
mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1014392620 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -1780,7 +1802,19 @@ private[spark] object JsonProtocolSuite extends Assertions { |

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014391666 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -507,15 +507,13 @@ class UnsupportedOperationsSuite extends

[GitHub] [spark] aokolnychyi commented on pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-11-04 Thread GitBox
aokolnychyi commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1304020160 Still remember about following up on this and another PR. Slowly getting there. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] MaxGekk opened a new pull request, #38514: [WIP][SQL] Provide a query context to `failAnalysis()`

2022-11-04 Thread GitBox
MaxGekk opened a new pull request, #38514: URL: https://github.com/apache/spark/pull/38514 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] gengliangwang commented on pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox
gengliangwang commented on PR #38513: URL: https://github.com/apache/spark/pull/38513#issuecomment-1304002389 cc @cloud-fan @srielau @ulysses-you @peter-toth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] gengliangwang opened a new pull request, #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox
gengliangwang opened a new pull request, #38513: URL: https://github.com/apache/spark/pull/38513 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/38379. On second thought, if the canonicalized `Add` has a

[GitHub] [spark] amaliujia commented on pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-04 Thread GitBox
amaliujia commented on PR #38488: URL: https://github.com/apache/spark/pull/38488#issuecomment-1303988901 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] amaliujia commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox
amaliujia commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1303988661 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox
gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary URL: https://github.com/apache/spark/pull/38479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] gengliangwang commented on pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox
gengliangwang commented on PR #38479: URL: https://github.com/apache/spark/pull/38479#issuecomment-1303969098 Thanks for fixing it. Merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox
MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes URL: https://github.com/apache/spark/pull/38498 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk commented on pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox
MaxGekk commented on PR #38498: URL: https://github.com/apache/spark/pull/38498#issuecomment-1303953617 +1, LGTM. Merging to master. Thank you, @LuciferYang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014302073 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker

[GitHub] [spark] ueshin commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-04 Thread GitBox
ueshin commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1014300546 ## python/pyspark/worker.py: ## @@ -159,27 +226,13 @@ def wrapped(left_key_series, left_value_series, right_key_series, right_value_se key_series =

[GitHub] [spark] MaxGekk commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox
MaxGekk commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014297862 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package

[GitHub] [spark] jerrypeng commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-04 Thread GitBox
jerrypeng commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1014297681 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -277,10 +295,34 @@ class HDFSMetadataLog[T <: AnyRef :

[GitHub] [spark] anchovYu commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox
anchovYu commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014283621 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox
WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014281945 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014280302 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014279677 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014270879 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox
hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014269647 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #37972: [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf

2022-11-04 Thread GitBox
SandishKumarHN commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r1014260137 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the

[GitHub] [spark] swamirishi commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox
swamirishi commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1303865771 > Two points: > > * spark.driver.log.dfsDir is typically expected to be a path to hdfs - so resolving it relative to current working directory does not make sense > * If

[GitHub] [spark] FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox
FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source URL: https://github.com/apache/spark/pull/38512 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] FouadApp opened a new pull request, #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox
FouadApp opened a new pull request, #38512: URL: https://github.com/apache/spark/pull/38512 ### What changes were proposed in this pull request? This change adds support for reading source files of partitioned Hive tables with subdirectories. ### Why are the changes needed? While use

[GitHub] [spark] LuciferYang commented on pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox
LuciferYang commented on PR #38498: URL: https://github.com/apache/spark/pull/38498#issuecomment-1303797442 GA passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] FouadApp commented on pull request #32679: [SPARK-28098][SQL]Support read hive table while LeafDir had multi-level paths

2022-11-04 Thread GitBox
FouadApp commented on PR #32679: URL: https://github.com/apache/spark/pull/32679#issuecomment-1303780636 I have the same problem: With the TEZ engine writing data in the presence of union all: part_date=/HIVE_UNION_SUBDIR_1/part_000 (parquet)

[GitHub] [spark] FouadApp commented on pull request #32679: [SPARK-28098][SQL]Support read hive table while LeafDir had multi-level paths

2022-11-04 Thread GitBox
FouadApp commented on PR #32679: URL: https://github.com/apache/spark/pull/32679#issuecomment-1303772792 > Any chance of this getting picked up again? I saw it was merged in a fork: [lyft#40](https://github.com/lyft/spark/pull/40) but it would be great to have it upstream but, it's

[GitHub] [spark] cloud-fan opened a new pull request, #38511: [WIP][SPARK-41017][SQL] Do not push Filter through reference-only Project

2022-11-04 Thread GitBox
cloud-fan opened a new pull request, #38511: URL: https://github.com/apache/spark/pull/38511 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] HyukjinKwon closed pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox
HyukjinKwon closed pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect URL: https://github.com/apache/spark/pull/38462 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox
HyukjinKwon commented on PR #38462: URL: https://github.com/apache/spark/pull/38462#issuecomment-1303686102 Merged to master. Let's address complete types in a followup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-04 Thread GitBox
HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client URL: https://github.com/apache/spark/pull/38485 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1303640935 @itholic I was going over another topic and made some updates on sql performance tuning as well. I added a screenshot of the markdown. Is this how you want it? -- This is an

[GitHub] [spark] dwsmith1983 opened a new pull request, #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox
dwsmith1983 opened a new pull request, #38510: URL: https://github.com/apache/spark/pull/38510 ### What changes were proposed in this pull request? I made some small grammar fixes related to dependent clauses followed by independent clauses, starting a sentence with an

[GitHub] [spark] srielau commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-04 Thread GitBox
srielau commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1014078710 ## core/src/main/resources/error/error-classes.json: ## @@ -668,6 +668,24 @@ } } }, + "LOCATION_ALREADY_EXISTS" : { +"message" : [ + "Cannot

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013973661 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013972337 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013971238 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-04 Thread GitBox
gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1013969510 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator( //

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox
WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013960675 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return
