date:20221104

[GitHub] [spark] jerrypeng opened a new pull request, #38517: [WIP][SPARK-39591][SS] Async Progress Tracking

2022-11-04 Thread GitBox

jerrypeng opened a new pull request, #38517: URL: https://github.com/apache/spark/pull/38517 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox

wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304400795 @cloud-fan @AngersZh Could you help review this PR? Another PR https://github.com/apache/spark/pull/38496 depends on this. -- This is an automated message from the Apache Git S

[GitHub] [spark] wankunde commented on pull request #38495: [SPARK-35531][SQL] Update hive table stats without unnecessary convert

2022-11-04 Thread GitBox

wankunde commented on PR #38495: URL: https://github.com/apache/spark/pull/38495#issuecomment-1304399983 Retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014560227 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(r

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559977 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(r

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

zhengruifeng commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014559938 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(r

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014550696 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014549689 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] attilapiros commented on pull request #38516: [SPARK-32380][SQL] Fixing access of HBase table via Hive

2022-11-04 Thread GitBox

attilapiros commented on PR #38516: URL: https://github.com/apache/spark/pull/38516#issuecomment-1304373701 cc @dongjoon-hyun, @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [spark] attilapiros opened a new pull request, #38516: Initial version

2022-11-04 Thread GitBox

attilapiros opened a new pull request, #38516: URL: https://github.com/apache/spark/pull/38516 ### What changes were proposed in this pull request? This is an update of https://github.com/apache/spark/pull/29178 which was closed because the root cause of the error was just vaguely def

[GitHub] [spark] ljfgem commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

ljfgem commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014536719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] github-actions[bot] commented on pull request #34637: [SPARK-37349][SQL] add SQL Rest API parsing logic

2022-11-04 Thread GitBox

github-actions[bot] commented on PR #34637: URL: https://github.com/apache/spark/pull/34637#issuecomment-1304354583 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-11-04 Thread GitBox

github-actions[bot] closed pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables URL: https://github.com/apache/spark/pull/37083 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function.

2022-11-04 Thread GitBox

github-actions[bot] closed pull request #37226: [MINOR][SQL] Simplify the description of built-in function. URL: https://github.com/apache/spark/pull/37226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] github-actions[bot] commented on pull request #37009: [SPARK-38292][PYTHON]Support na_filter for pyspark.pandas.read_csv

2022-11-04 Thread GitBox

github-actions[bot] commented on PR #37009: URL: https://github.com/apache/spark/pull/37009#issuecomment-1304354576 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project

2022-11-04 Thread GitBox

github-actions[bot] closed pull request #37239: [SPARK-39825][SQL] Fix PushDownLeftSemiAntiJoin push through project URL: https://github.com/apache/spark/pull/37239 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] github-actions[bot] commented on pull request #37104: [SPARK-39698][SQL] Use `TakeOrderedAndProject` if maxRows below the `spark.sql.execution.topKSortMaxRowsThreshold`

2022-11-04 Thread GitBox

github-actions[bot] commented on PR #37104: URL: https://github.com/apache/spark/pull/37104#issuecomment-1304354561 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37309: [SPARK-39871][CORE] Jmx http interface supported for SparkHistoryServer

2022-11-04 Thread GitBox

github-actions[bot] commented on PR #37309: URL: https://github.com/apache/spark/pull/37309#issuecomment-1304354542 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37315: [SPARK-39892][SQL] Use ArrowType.Decimal(precision, scale, bitWidth) instead of ArrowType.Decimal(precision, scale)

2022-11-04 Thread GitBox

github-actions[bot] commented on PR #37315: URL: https://github.com/apache/spark/pull/37315#issuecomment-1304354536 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014532124 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] SandishKumarHN commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox

SandishKumarHN commented on PR #38515: URL: https://github.com/apache/spark/pull/38515#issuecomment-1304351416 @rangadi Because some random numbers do not convert to catalyst type, a null check for the data generator is required. -- This is an automated message from the Apache Git Service

[GitHub] [spark] SandishKumarHN opened a new pull request, #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

2022-11-04 Thread GitBox

SandishKumarHN opened a new pull request, #38515: URL: https://github.com/apache/spark/pull/38515 ### What changes were proposed in this pull request? null check for data generator after type conversion NA ### Why are the changes needed? NA ### Does this PR intr

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014530432 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014526120 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014525886 ## docs/sql-performance-tuning.md: ## @@ -77,8 +77,8 @@ that these options will be deprecated in future release as more optimizations ar spark.sql.files.openCostIn

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014524565 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [spark] swamirishi commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox

swamirishi commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014524002 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging {

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014523963 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522636 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014522309 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014515719 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014507922 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [spark] jzhuge commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

jzhuge commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014506227 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] [spark] xkrogen commented on a diff in pull request #37556: [SPARK-39799][SQL] DataSourceV2: View catalog interface

2022-11-04 Thread GitBox

xkrogen commented on code in PR #37556: URL: https://github.com/apache/spark/pull/37556#discussion_r1014502881 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/View.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[GitHub] [spark] liuzqt commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox

liuzqt commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304306588 @mridulm I got an error when running that command locally ``` [error] /Users/ziqi.liu/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:51: F

[GitHub] [spark] mridulm commented on pull request #38064: [SPARK-40622][SQL][CORE]Result of a single task in collect() must fit in 2GB

2022-11-04 Thread GitBox

mridulm commented on PR #38064: URL: https://github.com/apache/spark/pull/38064#issuecomment-1304281819 Looks like doc build is failing and so failing build ... Can you run `build/sbt -Phadoop-3 -Pyarn -Pdocker-integration-tests -Pspark-ganglia-lgpl -Phive -Pmesos -Phive-thriftserver -Pk

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014476080 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.array

[GitHub] [spark] leewyang commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

leewyang commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1014475066 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.array

[GitHub] [spark] AmplabJenkins commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox

AmplabJenkins commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1304270266 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38505: [SPARK-40622][WIP]do not merge(try to fix build error)

2022-11-04 Thread GitBox

AmplabJenkins commented on PR #38505: URL: https://github.com/apache/spark/pull/38505#issuecomment-1304270304 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] mridulm commented on a diff in pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox

mridulm commented on code in PR #38377: URL: https://github.com/apache/spark/pull/38377#discussion_r1014469439 ## core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala: ## @@ -142,7 +142,7 @@ private[spark] class DriverLogger(conf: SparkConf) extends Logging {

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox

alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014428293 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +172,11 @@ object UnsupportedOperationChecke

[GitHub] [spark] alex-balikov commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox

alex-balikov commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014425174 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -157,10 +193,11 @@ object UnsupportedOperationChecke

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014424662 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics spark.sql.adaptiv

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014421466 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics spark.sql.adaptive.aut

[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304103037 > OK, any other related files you want to check while you're here? I am doing some studying so not sure what other docs I will read and when. -- This is an automated message

[GitHub] [spark] dwsmith1983 commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

dwsmith1983 commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014419552 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics spark.sql.adaptiv

[GitHub] [spark] srowen commented on a diff in pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

srowen commented on code in PR #38510: URL: https://github.com/apache/spark/pull/38510#discussion_r1014417907 ## docs/sql-performance-tuning.md: ## @@ -295,7 +294,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics spark.sql.adaptive.aut

[GitHub] [spark] mridulm commented on pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox

mridulm commented on PR #38427: URL: https://github.com/apache/spark/pull/38427#issuecomment-1304076366 Merged to master. Thanks for working on this @eejbyfeldt ! Thanks for the reviews @srowen, @dongjoon-hyun, @LuciferYang :-) -- This is an automated message from the Apache Git Serv

[GitHub] [spark] asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13

2022-11-04 Thread GitBox

asfgit closed pull request #38427: [SPARK-40950][CORE] Fix isRemoteAddressMaxedOut performance overhead on scala 2.13 URL: https://github.com/apache/spark/pull/38427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] AmplabJenkins commented on pull request #38509: [SPARK-41014][PySpark][DOC] Improve documentation and typing of groupby and cogroup applyInPandas

2022-11-04 Thread GitBox

AmplabJenkins commented on PR #38509: URL: https://github.com/apache/spark/pull/38509#issuecomment-1304060587 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

AmplabJenkins commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1304060535 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] mridulm commented on a diff in pull request #36165: [SPARK-36620][SHUFFLE] Add Push Based Shuffle client side metrics

2022-11-04 Thread GitBox

mridulm commented on code in PR #36165: URL: https://github.com/apache/spark/pull/36165#discussion_r1014392620 ## core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala: ## @@ -1780,7 +1802,19 @@ private[spark] object JsonProtocolSuite extends Assertions { |

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox

WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014391666 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala: ## @@ -507,15 +507,13 @@ class UnsupportedOperationsSuite extends

[GitHub] [spark] aokolnychyi commented on pull request #36304: [SPARK-38959][SQL] DS V2: Support runtime group filtering in row-level commands

2022-11-04 Thread GitBox

aokolnychyi commented on PR #36304: URL: https://github.com/apache/spark/pull/36304#issuecomment-1304020160 Still remember about following up on this and another PR. Slowly getting there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] MaxGekk opened a new pull request, #38514: [WIP][SQL] Provide a query context to `failAnalysis()`

2022-11-04 Thread GitBox

MaxGekk opened a new pull request, #38514: URL: https://github.com/apache/spark/pull/38514 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] gengliangwang commented on pull request #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox

gengliangwang commented on PR #38513: URL: https://github.com/apache/spark/pull/38513#issuecomment-1304002389 cc @cloud-fan @srielau @ulysses-you @peter-toth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] gengliangwang opened a new pull request, #38513: [SPARK-40903][SQL][FOLLOWUP] Cast canonicalized Add as its original data type if necessary

2022-11-04 Thread GitBox

gengliangwang opened a new pull request, #38513: URL: https://github.com/apache/spark/pull/38513 ### What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/38379. On second thought, if the canonicalized `Add` has a differen

[GitHub] [spark] amaliujia commented on pull request #38488: [SPARK-41002][CONNECT][PYTHON] Compatible `take`, `head` and `first` API in Python client

2022-11-04 Thread GitBox

amaliujia commented on PR #38488: URL: https://github.com/apache/spark/pull/38488#issuecomment-1303988901 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] amaliujia commented on pull request #38506: [SPARK-41010][CONNECT][PYTHON] Complete Support for Except and Intersect in Python client

2022-11-04 Thread GitBox

amaliujia commented on PR #38506: URL: https://github.com/apache/spark/pull/38506#issuecomment-1303988661 Ok added short description for the new test cases. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox

gengliangwang closed pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary URL: https://github.com/apache/spark/pull/38479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] gengliangwang commented on pull request #38479: [SPARK-40697][SQL][FOLLOWUP] Read-side char padding should only be applied if necessary

2022-11-04 Thread GitBox

gengliangwang commented on PR #38479: URL: https://github.com/apache/spark/pull/38479#issuecomment-1303969098 Thanks for fixing it. Merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox

MaxGekk closed pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes URL: https://github.com/apache/spark/pull/38498 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk commented on pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox

MaxGekk commented on PR #38498: URL: https://github.com/apache/spark/pull/38498#issuecomment-1303953617 +1, LGTM. Merging to master. Thank you, @LuciferYang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox

WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014302073 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker extends

[GitHub] [spark] ueshin commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-04 Thread GitBox

ueshin commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1014300546 ## python/pyspark/worker.py: ## @@ -159,27 +226,13 @@ def wrapped(left_key_series, left_value_series, right_key_series, right_value_se key_series = left_ke

[GitHub] [spark] MaxGekk commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox

MaxGekk commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014297862 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package org.apache.spark.sql.catalyst.analysi

[GitHub] [spark] jerrypeng commented on a diff in pull request #38430: [SPARK-40957] Add in memory cache in HDFSMetadataLog

2022-11-04 Thread GitBox

jerrypeng commented on code in PR #38430: URL: https://github.com/apache/spark/pull/38430#discussion_r1014297681 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala: ## @@ -277,10 +295,34 @@ class HDFSMetadataLog[T <: AnyRef : ClassTag](spa

[GitHub] [spark] anchovYu commented on a diff in pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-11-04 Thread GitBox

anchovYu commented on code in PR #37887: URL: https://github.com/apache/spark/pull/37887#discussion_r1014283621 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala: ## @@ -20,66 +20,112 @@ package org.apache.spark.sql.catalyst.analys

[GitHub] [spark] WweiL commented on a diff in pull request #38503: [SPARK-40940] Remove Multi-stateful operator checkers for streaming queries.

2022-11-04 Thread GitBox

WweiL commented on code in PR #38503: URL: https://github.com/apache/spark/pull/38503#discussion_r1014281945 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala: ## @@ -41,23 +42,49 @@ object UnsupportedOperationChecker extends

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014280302 ## sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala: ## @@ -128,6 +128,65 @@ private[sql] object ArrowConverters extends Logging {

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014279677 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(res

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014270879 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(res

[GitHub] [spark] hvanhovell commented on a diff in pull request #38468: [SPARK-41005][CONNECT][PYTHON] Arrow-based collect

2022-11-04 Thread GitBox

hvanhovell commented on code in PR #38468: URL: https://github.com/apache/spark/pull/38468#discussion_r1014269647 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala: ## @@ -117,7 +126,70 @@ class SparkConnectStreamHandler(res

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #37972: [SPARK-40654][SQL] Protobuf support for Spark - from_protobuf AND to_protobuf

2022-11-04 Thread GitBox

SandishKumarHN commented on code in PR #37972: URL: https://github.com/apache/spark/pull/37972#discussion_r1014260137 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache

[GitHub] [spark] swamirishi commented on pull request #38377: [SPARK-40901][CORE] Unable to store Spark Driver logs with Absolute Hadoop based URI FS Path

2022-11-04 Thread GitBox

swamirishi commented on PR #38377: URL: https://github.com/apache/spark/pull/38377#issuecomment-1303865771 > Two points: > > * spark.driver.log.dfsDir is typically expected to be a path to hdfs - so resolving it relative to current working directory does not make sense > * If `root

[GitHub] [spark] FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox

FouadApp closed pull request #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source URL: https://github.com/apache/spark/pull/38512

[GitHub] [spark] FouadApp opened a new pull request, #38512: WIP: [SPARK-38564] Support read hive table from subdirectory source

2022-11-04 Thread GitBox

FouadApp opened a new pull request, #38512: URL: https://github.com/apache/spark/pull/38512 ### What changes were proposed in this pull request? This change adds support for reading the source files of a partitioned Hive table that contains subdirectories. ### Why are the changes needed? While use spar

[GitHub] [spark] LuciferYang commented on pull request #38498: [SPARK-40769][CORE][SQL] Migrate type check failures of aggregate expressions onto error classes

2022-11-04 Thread GitBox

LuciferYang commented on PR #38498: URL: https://github.com/apache/spark/pull/38498#issuecomment-1303797442 GA passed

[GitHub] [spark] FouadApp commented on pull request #32679: [SPARK-28098][SQL]Support read hive table while LeafDir had multi-level paths

2022-11-04 Thread GitBox

FouadApp commented on PR #32679: URL: https://github.com/apache/spark/pull/32679#issuecomment-1303780636 I have the same problem: With the TEZ engine writing data in the presence of union all: part_date=/HIVE_UNION_SUBDIR_1/part_000 (parquet) part_date=/HIVE_UNION_SUB

[GitHub] [spark] FouadApp commented on pull request #32679: [SPARK-28098][SQL]Support read hive table while LeafDir had multi-level paths

2022-11-04 Thread GitBox

FouadApp commented on PR #32679: URL: https://github.com/apache/spark/pull/32679#issuecomment-1303772792 > Any chance of this getting picked up again? I saw it was merged in a fork: [lyft#40](https://github.com/lyft/spark/pull/40) but it would be great to have it upstream but, it's n

[GitHub] [spark] cloud-fan opened a new pull request, #38511: [WIP][SPARK-41017][SQL] Do not push Filter through reference-only Project

2022-11-04 Thread GitBox

cloud-fan opened a new pull request, #38511: URL: https://github.com/apache/spark/pull/38511 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

[GitHub] [spark] HyukjinKwon closed pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox

HyukjinKwon closed pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect URL: https://github.com/apache/spark/pull/38462

[GitHub] [spark] HyukjinKwon commented on pull request #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-04 Thread GitBox

HyukjinKwon commented on PR #38462: URL: https://github.com/apache/spark/pull/38462#issuecomment-1303686102 Merged to master. Let's address complete types in a followup.

[GitHub] [spark] HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client

2022-11-04 Thread GitBox

HyukjinKwon closed pull request #38485: [SPARK-41001] [CONNECT] [PYTHON] Implementing Connection String for Python Client URL: https://github.com/apache/spark/pull/38485

[GitHub] [spark] dwsmith1983 commented on pull request #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

dwsmith1983 commented on PR #38510: URL: https://github.com/apache/spark/pull/38510#issuecomment-1303640935 @itholic I was going over another topic and made some updates on sql performance tuning as well. I added a screenshot of the markdown. This is how you want it, correct?

[GitHub] [spark] dwsmith1983 opened a new pull request, #38510: [MINOR][DOC] revisions for spark sql performance tuning to improve readability and grammar

2022-11-04 Thread GitBox

dwsmith1983 opened a new pull request, #38510: URL: https://github.com/apache/spark/pull/38510 ### What changes were proposed in this pull request? I made some small grammar fixes related to dependent clauses followed by independent clauses, starting a sentence with an introdu

[GitHub] [spark] srielau commented on a diff in pull request #38490: [SPARK-41009][SQL] Rename the error class `_LEGACY_ERROR_TEMP_1070` to `LOCATION_ALREADY_EXISTS`

2022-11-04 Thread GitBox

srielau commented on code in PR #38490: URL: https://github.com/apache/spark/pull/38490#discussion_r1014078710 ## core/src/main/resources/error/error-classes.json: ## @@ -668,6 +668,24 @@ } } }, + "LOCATION_ALREADY_EXISTS" : { +"message" : [ + "Cannot cr

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013973661 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.a

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013972337 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.a

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013971238 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.a

[GitHub] [spark] gaoyajun02 commented on a diff in pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-04 Thread GitBox

gaoyajun02 commented on code in PR #38333: URL: https://github.com/apache/spark/pull/38333#discussion_r1013969510 ## core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala: ## @@ -794,7 +794,15 @@ final class ShuffleBlockFetcherIterator( // s

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #37734: [SPARK-40264][ML] add batch_infer_udf function to pyspark.ml.functions

2022-11-04 Thread GitBox

WeichenXu123 commented on code in PR #37734: URL: https://github.com/apache/spark/pull/37734#discussion_r1013960675 ## python/pyspark/ml/functions.py: ## @@ -106,6 +117,542 @@ def array_to_vector(col: Column) -> Column: return Column(sc._jvm.org.apache.spark.ml.functions.a
