liuzqt commented on PR #38064:
URL: https://github.com/apache/spark/pull/38064#issuecomment-1311348015
@mridulm I've tried `local-cluster[1,1,3072]` but it doesn't help, I guess. Is
there any way to turn up the JVM memory in the GitHub Actions job?
--
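For reference, `local-cluster[n, cores, mem]` spawns separate worker JVMs, so the `3072` above only raises worker memory, not the heap of the JVM running the test itself. A minimal sketch of what that master URL configures (assuming a plain SparkSession, not this PR's actual test code):

```scala
// local-cluster[numWorkers, coresPerWorker, memoryPerWorkerMB]:
// "3072" gives each worker JVM 3 GiB, but the driver/test JVM's own
// heap is still governed by its -Xmx setting.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local-cluster[1,1,3072]")
  .getOrCreate()
```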
cloud-fan closed pull request #38604: [SPARK-41102][CONNECT] Merge
SparkConnectPlanner and SparkConnectCommandPlanner
URL: https://github.com/apache/spark/pull/38604
--
cloud-fan commented on PR #38604:
URL: https://github.com/apache/spark/pull/38604#issuecomment-1311347785
thanks, merging to master!
--
cloud-fan commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019959789
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
LuciferYang commented on PR #38091:
URL: https://github.com/apache/spark/pull/38091#issuecomment-1311346776
@mridulm or call `TestUtils.configTestLog4j2("DEBUG")` before this test
--
LuciferYang commented on PR #38091:
URL: https://github.com/apache/spark/pull/38091#issuecomment-1311343738
Maybe we can modify `src/test/resources/log4j2.properties` to print all logs to
stdout?
--
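Taken together, the two suggestions above amount to forcing verbose logging for the flaky test. A hedged sketch, assuming `org.apache.spark.TestUtils.configTestLog4j2` reconfigures log4j2 to log to the console at the given level (suite and test names are illustrative):

```scala
import org.apache.spark.{SparkFunSuite, TestUtils}

class FlakyInvestigationSuite extends SparkFunSuite {  // illustrative suite
  test("flaky case under investigation") {
    TestUtils.configTestLog4j2("DEBUG")  // route DEBUG logs to the console first
    // ... original test body runs here with DEBUG logs visible ...
  }
}
```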
LuciferYang commented on PR #38620:
URL: https://github.com/apache/spark/pull/38620#issuecomment-1311333706
test first
--
LuciferYang opened a new pull request, #38620:
URL: https://github.com/apache/spark/pull/38620
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
### How was thi
ulysses-you commented on PR #38619:
URL: https://github.com/apache/spark/pull/38619#issuecomment-1311331866
cc @wangyum @cloud-fan @sigmod thank you
--
ulysses-you commented on code in PR #38619:
URL: https://github.com/apache/spark/pull/38619#discussion_r1019946997
##
sql/core/src/test/scala/org/apache/spark/sql/InjectRuntimeFilterSuite.scala:
##
@@ -257,6 +257,11 @@ class InjectRuntimeFilterSuite extends QueryTest with
SQLTe
ulysses-you opened a new pull request, #38619:
URL: https://github.com/apache/spark/pull/38619
### What changes were proposed in this pull request?
Apply ColumnPruning for in subquery filter.
Note that the bloom filter side has already been fixed by
https://github.com/apach
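A hedged illustration of the query shape this PR appears to target (table and column names are made up): an IN-subquery filter where the subquery's relation only contributes its key column, so `ColumnPruning` can drop the rest:

```scala
// Illustrative only: fact/dim and their columns are hypothetical.
spark.sql(
  """
    |SELECT f.*
    |FROM fact f
    |WHERE f.key IN (SELECT d.key FROM dim d WHERE d.flag = true)
  """.stripMargin)
```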
Ngone51 commented on PR #38064:
URL: https://github.com/apache/spark/pull/38064#issuecomment-1311315439
Should the PR title be changed to something like "Remove the limitation that a
single task result must fit in 2GB"?
--
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311311848
Can you merge this if the tests pass, @HyukjinKwon? I might not be online
tomorrow and it is getting late tonight for me :-)
--
HyukjinKwon commented on code in PR #38616:
URL: https://github.com/apache/spark/pull/38616#discussion_r1019909673
##
python/pyspark/sql/connect/dataframe.py:
##
@@ -143,6 +143,17 @@ def columns(self) -> List[str]:
return self.schema().names
+def sparkSession(se
HyukjinKwon commented on PR #38468:
URL: https://github.com/apache/spark/pull/38468#issuecomment-1311310134
Made another PR to refactor and deduplicate the Arrow code, PTAL:
https://github.com/apache/spark/pull/38618
--
HyukjinKwon opened a new pull request, #38618:
URL: https://github.com/apache/spark/pull/38618
### What changes were proposed in this pull request?
This PR is a followup of both https://github.com/apache/spark/pull/38468 and
https://github.com/apache/spark/pull/38612 that proposes to
beatbull commented on PR #33828:
URL: https://github.com/apache/spark/pull/33828#issuecomment-1311299286
Hi, sadly this PR got closed (automatically due to inactivity). We'd be
interested in this feature & config option since the ".spark-staging-*" folders
are causing trouble e.g. when usin
panbingkun commented on code in PR #38555:
URL: https://github.com/apache/spark/pull/38555#discussion_r1019880123
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala:
##
@@ -66,7 +66,13 @@ case class WindowSpecDefinition(
override
MaxGekk closed pull request #38582: [SPARK-41095][SQL] Convert unresolved
operators to internal errors
URL: https://github.com/apache/spark/pull/38582
--
MaxGekk commented on PR #38582:
URL: https://github.com/apache/spark/pull/38582#issuecomment-1311288176
Merging to master. Thank you, @cloud-fan and @LuciferYang for review.
--
amaliujia commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019878719
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
MaxGekk closed pull request #38572: [SPARK-41059][SQL] Rename
`_LEGACY_ERROR_TEMP_2420` to `NESTED_AGGREGATE_FUNCTION`
URL: https://github.com/apache/spark/pull/38572
--
panbingkun commented on code in PR #38555:
URL: https://github.com/apache/spark/pull/38555#discussion_r1019878522
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala:
##
@@ -57,16 +58,17 @@ case class WindowSpecDefinition(
fram
panbingkun commented on code in PR #38555:
URL: https://github.com/apache/spark/pull/38555#discussion_r1019878119
##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala:
##
@@ -66,7 +66,13 @@ case class WindowSpecDefinition(
override
MaxGekk commented on PR #38572:
URL: https://github.com/apache/spark/pull/38572#issuecomment-1311286385
+1, LGTM. Merging to master.
Thank you, @itholic.
--
panbingkun commented on code in PR #38555:
URL: https://github.com/apache/spark/pull/38555#discussion_r1019877660
##
core/src/main/resources/error/error-classes.json:
##
@@ -219,6 +219,11 @@
"Input to the function cannot contain elements of the
\"MAP\" type. In Spar
LuciferYang commented on code in PR #38609:
URL: https://github.com/apache/spark/pull/38609#discussion_r1019875427
##
project/SparkBuild.scala:
##
@@ -109,6 +109,16 @@ object SparkBuild extends PomBuild {
if (profiles.contains("jdwp-test-debug")) {
sys.props.put("tes
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019866602
##
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##
@@ -4559,8 +4564,8 @@ class DAGSchedulerSuite extends SparkFunSuite with
TempLocalSparkCo
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311274093
I am still not able to reproduce this locally - but logically, this looks
like the right fix.
--
mridulm commented on PR #38617:
URL: https://github.com/apache/spark/pull/38617#issuecomment-1311267937
+CC @HyukjinKwon, @LuciferYang, @wankunde
--
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019860399
##
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##
@@ -4559,8 +4563,8 @@ class DAGSchedulerSuite extends SparkFunSuite with
TempLocalSparkCo
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019860144
##
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##
@@ -4533,16 +4533,20 @@ class DAGSchedulerSuite extends SparkFunSuite with
TempLocalSpark
mridulm commented on code in PR #38617:
URL: https://github.com/apache/spark/pull/38617#discussion_r1019859895
##
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##
@@ -4533,16 +4533,20 @@ class DAGSchedulerSuite extends SparkFunSuite with
TempLocalSpark
mridulm opened a new pull request, #38617:
URL: https://github.com/apache/spark/pull/38617
### What changes were proposed in this pull request?
Fix flaky test failure
### Why are the changes needed?
Thread-safety (MT) issue in the test
### Does this PR introduce _any_ user-facing chan
rangadi commented on code in PR #38603:
URL: https://github.com/apache/spark/pull/38603#discussion_r1019851951
##
python/pyspark/sql/protobuf/functions.py:
##
@@ -32,7 +32,7 @@
def from_protobuf(
data: "ColumnOrName",
messageName: str,
-descFilePath: str,
+des
rangadi commented on code in PR #38603:
URL: https://github.com/apache/spark/pull/38603#discussion_r1019850935
##
python/pyspark/sql/protobuf/functions.py:
##
@@ -48,8 +48,11 @@ def from_protobuf(
--
data : :class:`~pyspark.sql.Column` or str
the binar
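For orientation, a hedged usage sketch of `from_protobuf`, written against the Scala API of the same module; the message name and descriptor path are illustrative, and the three-argument signature is assumed to mirror the Python one under review:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.protobuf.functions.from_protobuf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in binary column; "Person" and the .desc path are illustrative.
val binaryDf = Seq(Array.empty[Byte]).toDF("value")
val parsed = binaryDf.select(
  from_protobuf($"value", "Person", "/tmp/person.desc").as("person"))
```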
amaliujia opened a new pull request, #38616:
URL: https://github.com/apache/spark/pull/38616
### What changes were proposed in this pull request?
This PR implements `DataFrame.sparkSession` in Python client. The only
difference between this API and the one in PySpark is that t
zhengchenyu commented on PR #37949:
URL: https://github.com/apache/spark/pull/37949#issuecomment-1311254076
@xkrogen Thanks for your review. In our cluster, YARN_CONF_DIR is the same
as HADOOP_CONF_DIR.
SparkHadoopUtil.newConfiguration is different from
SparkHadoopUtil.get.newConfig
mridulm commented on PR #38091:
URL: https://github.com/apache/spark/pull/38091#issuecomment-1311254131
Unfortunately, I did not find the unit test log files in this - based on a
local build, they are at `core/target/unit-tests.log`
Is there a way to get to this, @HyukjinKwon? Thanks!
--
HyukjinKwon commented on PR #38612:
URL: https://github.com/apache/spark/pull/38612#issuecomment-1311251418
Merged to master.
--
HyukjinKwon closed pull request #38612: [SPARK-41108][CONNECT] Control the max
size of arrow batch
URL: https://github.com/apache/spark/pull/38612
--
HyukjinKwon commented on PR #38612:
URL: https://github.com/apache/spark/pull/38612#issuecomment-1311251340
Let me merge this and refactor it out; I am actually working on that.
--
panbingkun opened a new pull request, #38615:
URL: https://github.com/apache/spark/pull/38615
### What changes were proposed in this pull request?
In the PR, I propose to rename the legacy error class
`_LEGACY_ERROR_TEMP_1216` to `INVALID_LIKE_PATTERN`.
### Why are the changes needed?
zhengruifeng commented on code in PR #38612:
URL: https://github.com/apache/spark/pull/38612#discussion_r1019843494
##
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:
##
@@ -161,17 +166,23 @@ private[sql] object ArrowConverters extends Logging
amaliujia commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019842183
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
HyukjinKwon commented on PR #38091:
URL: https://github.com/apache/spark/pull/38091#issuecomment-1311241568
https://pipelines.actions.githubusercontent.com/serviceHosts/03398d36-4378-4d47-a936-fba0a5e8ccb9/_apis/pipelines/1/runs/194716/signedlogcontent/21?urlExpires=2022-11-11T05%3A16%3A59.8
cloud-fan commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019839623
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -3804,6 +3804,13 @@ class Dataset[T] private[sql](
} catch {
case _: ParseException =
HyukjinKwon commented on PR #38609:
URL: https://github.com/apache/spark/pull/38609#issuecomment-1311239503
cc @grundprinzip @amaliujia FYI
--
zhengruifeng commented on PR #38614:
URL: https://github.com/apache/spark/pull/38614#issuecomment-1311239224
Closing this PR in favor of https://github.com/apache/spark/pull/38613
--
zhengruifeng closed pull request #38614: [SPARK-41005][CONNECT][FOLLOWUP]
Collect should use `submitJob` instead of `runJob`
URL: https://github.com/apache/spark/pull/38614
--
mridulm commented on PR #38091:
URL: https://github.com/apache/spark/pull/38091#issuecomment-1311239134
Same here @LuciferYang, I am not able to reproduce it locally.
@HyukjinKwon, is there a way to get to the surefire-reports log files from
CI?
--
HyukjinKwon commented on PR #38614:
URL: https://github.com/apache/spark/pull/38614#issuecomment-1311238885
https://github.com/apache/spark/pull/38613 will handle this actually. Let's
leave this closed.
--
HyukjinKwon commented on code in PR #38612:
URL: https://github.com/apache/spark/pull/38612#discussion_r1019838326
##
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:
##
@@ -161,17 +166,23 @@ private[sql] object ArrowConverters extends Logging
cloud-fan commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r101983
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
amaliujia commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019836218
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -3804,6 +3804,13 @@ class Dataset[T] private[sql](
} catch {
case _: ParseException =
cloud-fan commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019835252
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -3804,6 +3804,13 @@ class Dataset[T] private[sql](
} catch {
case _: ParseException =
cloud-fan commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019835019
##
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala:
##
@@ -3804,6 +3804,13 @@ class Dataset[T] private[sql](
} catch {
case _: ParseException =
HyukjinKwon commented on PR #38613:
URL: https://github.com/apache/spark/pull/38613#issuecomment-1311232973
It collects all results first because the synchronous `runJob` waits for all
results to arrive.
--
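A hedged sketch of the contrast being discussed, assuming `SparkContext.submitJob`'s standard signature: `runJob` returns only after every partition finishes, while `submitJob` invokes a handler as each partition completes, so batches can be streamed out (`send` is a hypothetical sink):

```scala
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def streamResults(sc: SparkContext, rdd: RDD[Array[Byte]])(
    send: Array[Byte] => Unit): Unit = {
  val job = sc.submitJob(
    rdd,
    (it: Iterator[Array[Byte]]) => it.toArray,   // per-partition computation
    0 until rdd.getNumPartitions,                // run every partition
    (_: Int, batch: Array[Array[Byte]]) =>       // fires per finished partition
      batch.foreach(send),
    ())                                          // no aggregated result needed
  Await.result(job, Duration.Inf)                // block until the job is done
}
```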
amaliujia commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019834165
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##
@@ -1135,21 +1135,27 @@ class DatasetSuite extends QueryTest
}
test("createTempView") {
cloud-fan commented on PR #38613:
URL: https://github.com/apache/spark/pull/38613#issuecomment-1311232475
> Previously, it actually waits until all results are stored all first
Really? I think the best case is also sending partitions one by one.
Anyway, this PR looks good as it
zhengruifeng commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019833916
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -184,9 +158,30 @@ class SparkConnectStreamHandler(r
HyukjinKwon commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019833127
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -184,9 +158,30 @@ class SparkConnectStreamHandler(re
HyukjinKwon commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019831985
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -184,9 +158,30 @@ class SparkConnectStreamHandler(re
zhengruifeng commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019830407
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -184,9 +158,30 @@ class SparkConnectStreamHandler(r
zhengruifeng commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019829685
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -184,9 +158,30 @@ class SparkConnectStreamHandler(r
amaliujia commented on code in PR #38607:
URL: https://github.com/apache/spark/pull/38607#discussion_r1019829371
##
python/pyspark/sql/connect/plan.py:
##
@@ -712,6 +712,8 @@ def __init__(self, child: Optional["LogicalPlan"], alias:
str) -> None:
def plan(self, session:
HyukjinKwon commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019826331
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -56,7 +56,7 @@ class SparkConnectStreamHandler(respo
zhengruifeng commented on PR #38614:
URL: https://github.com/apache/spark/pull/38614#issuecomment-1311222793
thanks @HyukjinKwon for pointing it out.
also cc @hvanhovell
--
zhengruifeng opened a new pull request, #38614:
URL: https://github.com/apache/spark/pull/38614
### What changes were proposed in this pull request?
use `submitJob` instead of `runJob`
### Why are the changes needed?
`spark.sparkContext.runJob` blocks until it finishes all p
cloud-fan commented on code in PR #38613:
URL: https://github.com/apache/spark/pull/38613#discussion_r1019825048
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -56,7 +56,7 @@ class SparkConnectStreamHandler(respons
amaliujia commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019820897
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -39,14 +46,17 @@ final case class InvalidPlanInput(
pri
HyukjinKwon commented on code in PR #38468:
URL: https://github.com/apache/spark/pull/38468#discussion_r1019820867
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -114,10 +120,93 @@ class SparkConnectStreamHandler(r
amaliujia commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019820702
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
HyukjinKwon opened a new pull request, #38613:
URL: https://github.com/apache/spark/pull/38613
### What changes were proposed in this pull request?
This PR is a followup of https://github.com/apache/spark/pull/38468 that
proposes to remove the notify-wait approach and introduce a new way
amaliujia commented on code in PR #38595:
URL: https://github.com/apache/spark/pull/38595#discussion_r1019816022
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##
@@ -1135,21 +1135,27 @@ class DatasetSuite extends QueryTest
}
test("createTempView") {
amaliujia commented on PR #38606:
URL: https://github.com/apache/spark/pull/38606#issuecomment-1311202518
@cloud-fan
We need a bit more discussion on when to use `optional`. Right now the
most obvious usage is to replace those `message` wrappers.
One example is, if a field is r
zhengruifeng commented on code in PR #38612:
URL: https://github.com/apache/spark/pull/38612#discussion_r1019806473
##
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala:
##
@@ -161,17 +166,23 @@ private[sql] object ArrowConverters extends Logging
LuciferYang commented on PR #38609:
URL: https://github.com/apache/spark/pull/38609#issuecomment-1311196281
Let me finish the sbt part first
--
LuciferYang commented on PR #38609:
URL: https://github.com/apache/spark/pull/38609#issuecomment-1311194812
Users need to manually compile `protoc-xxx-linux-x86_64.exe` and
`protoc-gen-grpc-java-1.47.0-linux-x86_64.exe` so that they are executable on
CentOS 6 & CentOS 7.
Or pre-install the library t
zhengruifeng opened a new pull request, #38612:
URL: https://github.com/apache/spark/pull/38612
### What changes were proposed in this pull request?
Control the max size of arrow batch
### Why are the changes needed?
as per the suggestion
https://github.com/apache/sp
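A standalone, hedged sketch of the size-capping idea (not Spark's actual `ArrowConverters` code): accumulate rows until an estimated byte count crosses the limit, always emitting at least one row per batch:

```scala
// Generic batching by estimated size; sizeOf is a caller-supplied estimator.
def batchBySize[T](rows: Iterator[T], sizeOf: T => Long,
    maxBytes: Long): Iterator[Seq[T]] =
  new Iterator[Seq[T]] {
    def hasNext: Boolean = rows.hasNext
    def next(): Seq[T] = {
      val buf = scala.collection.mutable.ArrayBuffer.empty[T]
      var bytes = 0L
      while (rows.hasNext && (buf.isEmpty || bytes < maxBytes)) {
        val row = rows.next()
        buf += row
        bytes += sizeOf(row)
      }
      buf.toSeq
    }
  }
```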
zhengchenyu commented on PR #37949:
URL: https://github.com/apache/spark/pull/37949#issuecomment-1311193343
@xkrogen Thanks for your review. In our cluster, YARN_CONF_DIR is the same as
HADOOP_CONF_DIR.
I added some key information about the failed application.
```
# some key i
yabola commented on PR #38560:
URL: https://github.com/apache/spark/pull/38560#issuecomment-1311193090
My latest implementation no longer passes reduceIds from the driver; there are
still some code-style improvements to make, just a rough implementation for now.
--
pan3793 commented on PR #38596:
URL: https://github.com/apache/spark/pull/38596#issuecomment-1311188998
This patch is only suitable for master.
- branch-3.2 and earlier use the fat netty-all, so no issue;
- branch-3.3 depends on netty 4.1.74, which claims `netty-tcnative-classes`
as compi
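For builds that do need the split artifact explicitly, a hedged sbt sketch (coordinates as published on Maven Central; the version is illustrative):

```scala
// Hypothetical dependency declaration; pick the version matching your netty.
libraryDependencies += "io.netty" % "netty-tcnative-classes" % "2.0.54.Final" % Runtime
```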
xinrong-meng opened a new pull request, #38611:
URL: https://github.com/apache/spark/pull/38611
### What changes were proposed in this pull request?
Install [memory-profiler](https://pypi.org/project/memory-profiler/) in CI
in order to enable memory profiling tests.
### Why are the
HyukjinKwon commented on PR #38599:
URL: https://github.com/apache/spark/pull/38599#issuecomment-1311184802
Sorry, actually I am reverting this. Seems like it's related... surprisingly...
--
vinodkc commented on PR #38608:
URL: https://github.com/apache/spark/pull/38608#issuecomment-1311184404
CC @cloud-fan , @HyukjinKwon
--
zhengruifeng commented on PR #38546:
URL: https://github.com/apache/spark/pull/38546#issuecomment-1311182989
merged into master
--
zhengruifeng closed pull request #38546: [SPARK-41036][CONNECT][PYTHON]
`columns` API should use `schema` API to avoid data fetching
URL: https://github.com/apache/spark/pull/38546
--
SandishKumarHN commented on code in PR #38603:
URL: https://github.com/apache/spark/pull/38603#discussion_r1019796098
##
python/pyspark/sql/protobuf/functions.py:
##
@@ -49,7 +49,10 @@ def from_protobuf(
data : :class:`~pyspark.sql.Column` or str
the binary column.
HyukjinKwon commented on PR #38609:
URL: https://github.com/apache/spark/pull/38609#issuecomment-1311181590
How do we get the user-defined protobuf executables for
`CONNECT_PROTOC_EXEC_PATH` and `CONNECT_PLUGIN_EXEC_PATH` in CentOS 6 and 7? If
this is the only way, I am fine but we should p
cloud-fan commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019794236
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectStreamHandler.scala:
##
@@ -50,9 +49,9 @@ class SparkConnectStreamHandler(respons
HyukjinKwon commented on code in PR #38607:
URL: https://github.com/apache/spark/pull/38607#discussion_r1019793828
##
python/pyspark/sql/connect/plan.py:
##
@@ -712,6 +712,8 @@ def __init__(self, child: Optional["LogicalPlan"], alias:
str) -> None:
def plan(self, session
cloud-fan commented on code in PR #38604:
URL: https://github.com/apache/spark/pull/38604#discussion_r1019793647
##
connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##
@@ -39,14 +46,17 @@ final case class InvalidPlanInput(
pri
yabola commented on PR #38560:
URL: https://github.com/apache/spark/pull/38560#issuecomment-1311176907
@mridulm Yes... these two issues are similar. @wankunde Can I continue
editing my PR for this issue?
--
HyukjinKwon commented on code in PR #38603:
URL: https://github.com/apache/spark/pull/38603#discussion_r1019792541
##
python/pyspark/sql/protobuf/functions.py:
##
@@ -49,7 +49,10 @@ def from_protobuf(
data : :class:`~pyspark.sql.Column` or str
the binary column.
cloud-fan commented on PR #38606:
URL: https://github.com/apache/spark/pull/38606#issuecomment-1311175860
There are still some fields documented as optional that don't use the
`optional` keyword. Do we need to change them?
--