LuciferYang commented on code in PR #40352:
URL: https://github.com/apache/spark/pull/40352#discussion_r1145683804
##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##
@@ -584,6 +585,97 @@ final class DataFrameStatFunctions
yaooqinn opened a new pull request, #40531:
URL: https://github.com/apache/spark/pull/40531
### What changes were proposed in this pull request?
Add type mapping for spark char/varchar to jdbc types.
### Why are the changes needed?
The STANDARD JDBC 1.0 and
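The mapping the PR proposes can be roughly illustrated in plain Python (the function name and rules here are illustrative only, not Spark's actual `JdbcDialect` API, which lives on the JVM side):

```python
import re

def spark_type_to_jdbc(spark_type: str) -> str:
    """Map a Spark SQL char/varchar type string to a JDBC type name.

    Illustrative sketch only; Spark's real implementation is in its
    JDBC dialect code, not a string-based mapping like this.
    """
    m = re.fullmatch(r"char\((\d+)\)", spark_type, re.IGNORECASE)
    if m:
        return f"CHAR({m.group(1)})"
    m = re.fullmatch(r"varchar\((\d+)\)", spark_type, re.IGNORECASE)
    if m:
        return f"VARCHAR({m.group(1)})"
    raise ValueError(f"unsupported type: {spark_type}")

print(spark_type_to_jdbc("char(10)"))     # CHAR(10)
print(spark_type_to_jdbc("varchar(255)")) # VARCHAR(255)
```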
LuciferYang commented on code in PR #40438:
URL: https://github.com/apache/spark/pull/40438#discussion_r1145682342
##
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala:
##
@@ -129,6 +130,9 @@ object
grundprinzip commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145679097
##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##
@@ -53,19 +59,37 @@ class SparkConnectService(debug:
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480595916
> @shrprasa do you know how case 1 works?
Yes. It works because the resolved column has just one matching
attribute: Vector(id#17)
but for the second case, the match
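The single-match rule described here can be sketched in plain Python (hypothetical names; Spark's actual resolver lives in the Catalyst analyzer):

```python
def resolve(name, attributes, case_sensitive=False):
    """Resolve a column name against the available attributes.

    Sketch of the rule in the comment: resolution succeeds only when
    exactly one attribute matches; more than one match is ambiguous.
    """
    key = name if case_sensitive else name.lower()
    matches = [a for a in attributes
               if (a if case_sensitive else a.lower()) == key]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"column {name!r} not found")
    raise ValueError(f"ambiguous reference {name!r}: {matches}")

# Case 1: a single attribute matches, so resolution succeeds.
print(resolve("ID", ["id"]))  # id
# Case 2: two attributes differing only in case are ambiguous
# under case-insensitive resolution and raise ValueError.
```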
LuciferYang commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480591952
Thanks @HyukjinKwon @dongjoon-hyun @ueshin
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
HyukjinKwon opened a new pull request, #40530:
URL: https://github.com/apache/spark/pull/40530
### What changes were proposed in this pull request?
This PR proposes to remove None as a return value in the docstring.
### Why are the changes needed?
To be consistent with
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480589254
@shrprasa do you know how case 1 works?
yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480574647
@shrprasa
At the dataset definition phase, especially for intermediate datasets, Spark
is lenient/lazy about case sensitivity, because the checks happen during SQL
analysis,
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480574606
> df3.select("id").show()
@cloud-fan The example you have shared will behave the same even after this
fix: it will give an ambiguous-reference error.
The use case the fix is trying
zhengruifeng commented on PR #40355:
URL: https://github.com/apache/spark/pull/40355#issuecomment-1480570911
ping @hvanhovell @zhenlineo
HyukjinKwon closed pull request #40518: [SPARK-42901][CONNECT][PYTHON] Move
`StorageLevel` into a separate file to avoid potential `file recursively
imports`
URL: https://github.com/apache/spark/pull/40518
HyukjinKwon commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480558008
Merged to master and branch-3.4.
LuciferYang commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480556858
GA passed ~
HyukjinKwon commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480556026
It has a conflict w/ branch-3.4. Mind creating a backport PR, please?
HyukjinKwon closed pull request #40487: [SPARK-42891][CONNECT][PYTHON]
Implement CoGrouped Map API
URL: https://github.com/apache/spark/pull/40487
HyukjinKwon commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480555708
Merged to master and branch-3.4.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145632881
##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##
@@ -53,19 +59,37 @@ class SparkConnectService(debug:
HyukjinKwon commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1145632491
##
python/pyspark/sql/pandas/map_ops.py:
##
@@ -60,6 +62,7 @@ def mapInPandas(
schema : :class:`pyspark.sql.types.DataType` or str
the return
HyukjinKwon commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1145632294
##
python/pyspark/sql/pandas/map_ops.py:
##
@@ -32,7 +32,9 @@ class PandasMapOpsMixin:
"""
def mapInPandas(
-self, func:
yliou opened a new pull request, #40529:
URL: https://github.com/apache/spark/pull/40529
### What changes were proposed in this pull request?
On the SQL page in the Web UI, this PR aims to add a repeat identifier to
distinguish which InMemoryTableScan is being used at a certain
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480540953
I think column resolution should only look at one level, to make the
behavior simple and predictable. I tried it on pgsql and it fails as well:
```
create table t(i int);
```
chong0929 commented on PR #40521:
URL: https://github.com/apache/spark/pull/40521#issuecomment-1480537486
Thanks for your review. I think some examples cannot be linked to the
correct places, which can be confusing, and the original pointers provide some
clear references.
cloud-fan commented on PR #40526:
URL: https://github.com/apache/spark/pull/40526#issuecomment-1480536077
late LGTM
cloud-fan commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480533198
From a SQL engine's point of view, running all tasks at once or batch by
batch doesn't matter. It doesn't change the semantics of the SQL operator, and
the optimizer doesn't care about
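The point that batching does not change the semantics of a map operator can be sketched in plain Python (names are illustrative):

```python
def map_in_batches(rows, func, batch_size):
    """Apply a per-batch function; the result is independent of batch size."""
    out = []
    for i in range(0, len(rows), batch_size):
        out.extend(func(rows[i:i + batch_size]))
    return out

double = lambda batch: [x * 2 for x in batch]
rows = list(range(10))
# Same result whether we process everything at once or in small batches.
assert map_in_batches(rows, double, len(rows)) == map_in_batches(rows, double, 3)
```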
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480532873
> I second @srowen ‘s view. cc @cloud-fan
Thanks @yaooqinn for replying. Can you please explain why you think it's not
the right fix?
The fix only proposes to remove
cxzl25 commented on PR #40439:
URL: https://github.com/apache/spark/pull/40439#issuecomment-1480532626
@HeartSaVioR Please help review this PR. Thanks.
beliefer commented on code in PR #40528:
URL: https://github.com/apache/spark/pull/40528#discussion_r1145613769
##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Column.scala:
##
@@ -1213,11 +1213,8 @@ class Column private[sql] (private[sql] val expr:
zhengruifeng commented on PR #40521:
URL: https://github.com/apache/spark/pull/40521#issuecomment-1480513582
I think it is fine if we don't have an available ticket link.
It seems that those links point to issues from before Koalas was moved into
Apache Spark.
WeichenXu123 commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480510947
> I am saying that the real power of the Catalyst optimizer is to optimize/reorder
these logical plans, and I believe that's the reason barrier execution
wasn't introduced in SQL. The
beliefer commented on PR #40528:
URL: https://github.com/apache/spark/pull/40528#issuecomment-1480509383
This PR has not been implemented yet.
@hvanhovell Could you take a look? Does this one satisfy your expectations?
beliefer opened a new pull request, #40528:
URL: https://github.com/apache/spark/pull/40528
### What changes were proposed in this pull request?
Currently, connect display the structure of the proto in both the regular
and extended version of explain. We should display a more compact
shrprasa commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1480495550
@dongjoon-hyun Thanks for the clarification. But the unreliability of
shutdown hooks is common to all other shutdown tasks too. That doesn't mean we
haven't implemented them. So, why
zhengruifeng commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480490580
> Barrier mode is only used in specific ML case, i.e. in model training
routine, we will only use it in one pattern:
>
> dataset.mapInPandas(..., is_barrier=True).collect()
HyukjinKwon closed pull request #40526: [SPARK-42899][SQL] Fix
DataFrame.to(schema) to handle the case where there is a non-nullable nested
field in a nullable field
URL: https://github.com/apache/spark/pull/40526
HyukjinKwon commented on PR #40526:
URL: https://github.com/apache/spark/pull/40526#issuecomment-1480485696
Merged to master and branch-3.4.
zhengruifeng commented on PR #40527:
URL: https://github.com/apache/spark/pull/40527#issuecomment-1480472824
also cc @WeichenXu123
zhengruifeng commented on PR #40519:
URL: https://github.com/apache/spark/pull/40519#issuecomment-1480471132
thanks for the reviews
yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480469070
I second @srowen ‘s view. cc @cloud-fan
ulysses-you commented on PR #40522:
URL: https://github.com/apache/spark/pull/40522#issuecomment-1480466232
lgtm
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350
##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##
@@ -53,19 +59,37 @@ class SparkConnectService(debug:
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350
##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##
@@ -53,19 +59,37 @@ class SparkConnectService(debug:
HyukjinKwon commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480444902
I am saying that the real power of the Catalyst optimizer is to optimize/reorder
these logical plans, and I believe that's the reason barrier execution
wasn't introduced in SQL. But
HyukjinKwon commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480443966
Predicate pushdown is just an example. E.g., you might want to combine
adjacent `MapInPandas`s, but that would need special handling if an `is_barrier`
flag is added.
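A minimal sketch of that concern, assuming a hypothetical fuse-adjacent-maps rewrite rule (none of these names are Spark's):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MapNode:
    func: Callable
    is_barrier: bool = False

def fuse(a: MapNode, b: MapNode):
    """Combine two adjacent map nodes into one, as an optimizer might.

    Hypothetical rule: fusing is only safe when both nodes agree on the
    barrier flag; otherwise the fused node would change execution mode.
    """
    if a.is_barrier != b.is_barrier:
        return None  # not fusable; keep the nodes separate
    return MapNode(lambda batch: b.func(a.func(batch)), a.is_barrier)

inc = MapNode(lambda batch: [x + 1 for x in batch])
dbl = MapNode(lambda batch: [x * 2 for x in batch])
fused = fuse(inc, dbl)
assert fused is not None and fused.func([1, 2]) == [4, 6]
# Mismatched barrier flags block the rewrite.
assert fuse(inc, MapNode(lambda b: b, is_barrier=True)) is None
```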
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1145563490
##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##
@@ -458,7 +458,9 @@ class DataFrameReader private[sql] (sparkSession:
gerashegalov commented on code in PR #40515:
URL: https://github.com/apache/spark/pull/40515#discussion_r1145546706
##
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala:
##
@@ -0,0 +1,131 @@
+/*
+ *
LuciferYang commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480438169
> @LuciferYang . This looks worthy of having a new JIRA. Please create a new
JIRA for this PR and use it. This PR is a good contribution of yours.
@dongjoon-hyun Thanks for
LuciferYang commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480437248
> @LuciferYang nit: you need to update the PR description. There is an old
file name `storage_level.proto`.
Thanks ~ fixed
WeichenXu123 commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480433501
> hmmm why do we need to care about the optimizer? The optimizer is not
sensitive to the physical execution engine, e.g. Presto, Spark, and Flink have
many similar SQL optimizations.
cloud-fan commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480433026
hmmm why do we need to care about the optimizer? The optimizer is not
sensitive to the physical execution engine, e.g. Presto, Spark, and Flink have
many similar SQL optimizations.
WeichenXu123 commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480428307
To address @HyukjinKwon's concern about the optimizer,
can we add an `is_barrier` attribute to `UnaryExecNode`,
so that if the optimizer finds a node marking `is_barrier` as True, then
WeichenXu123 commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1145545786
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala:
##
@@ -28,7 +28,8 @@ import org.apache.spark.sql.execution.SparkPlan
case
github-actions[bot] commented on PR #38781:
URL: https://github.com/apache/spark/pull/38781#issuecomment-1480416870
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] commented on PR #39023:
URL: https://github.com/apache/spark/pull/39023#issuecomment-1480416829
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] commented on PR #38965:
URL: https://github.com/apache/spark/pull/38965#issuecomment-1480416849
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
github-actions[bot] commented on PR #38756:
URL: https://github.com/apache/spark/pull/38756#issuecomment-1480416904
We're closing this PR because it hasn't been updated in a while. This isn't
a judgement on the merit of the PR in any way. It's just a way of keeping the
PR queue manageable.
HyukjinKwon commented on PR #40521:
URL: https://github.com/apache/spark/pull/40521#issuecomment-1480411932
Those are actually not real JIRAs or TODOs; they are pointers to the
original fix or ticket (which contains examples or the code change). So I
guess it's fine as is.
hvanhovell closed pull request #40368: [SPARK-42748][CONNECT] Server-side
Artifact Management
URL: https://github.com/apache/spark/pull/40368
hvanhovell commented on PR #40368:
URL: https://github.com/apache/spark/pull/40368#issuecomment-1480408251
Merging this one.
ueshin commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480408180
@LuciferYang nit: you need to update the PR description. There is an old
file name `storage_level.proto`.
LuciferYang commented on PR #40516:
URL: https://github.com/apache/spark/pull/40516#issuecomment-1480402878
Thanks @dongjoon-hyun @HyukjinKwon
dongjoon-hyun commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1480402673
@shrprasa .
1. It seems that you assume the shutdown hook is magically reliable.
However, shutdown hooks have a well-known limitation: the JVM can be
destroyed
LuciferYang commented on PR #40518:
URL: https://github.com/apache/spark/pull/40518#issuecomment-1480402574
rebase due to https://github.com/apache/spark/pull/40516 merged
zhenlineo commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1145531822
##
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala:
##
@@ -458,7 +458,9 @@ class DataFrameReader private[sql] (sparkSession:
hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480397842
@dongjoon-hyun I sent the email.
dongjoon-hyun commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480398301
Thank you so much, @hvanhovell .
amaliujia commented on code in PR #40498:
URL: https://github.com/apache/spark/pull/40498#discussion_r1145530854
##
python/pyspark/sql/connect/plan.py:
##
@@ -302,13 +302,16 @@ def plan(self, session: "SparkConnectClient") ->
proto.Relation:
class Read(LogicalPlan):
-
xinrong-meng commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480372625
cc [LuciferYang](https://github.com/LuciferYang) thanks!
xinrong-meng commented on PR #40487:
URL: https://github.com/apache/spark/pull/40487#issuecomment-1480372425
May I get a review please @zhengruifeng @HyukjinKwon ?
ueshin commented on PR #40402:
URL: https://github.com/apache/spark/pull/40402#issuecomment-1480349942
@zhengruifeng I submitted two PRs: #40526 and #40527.
ueshin opened a new pull request, #40527:
URL: https://github.com/apache/spark/pull/40527
### What changes were proposed in this pull request?
Fixes `createDataFrame` to respect inference and column names.
### Why are the changes needed?
Currently when a column name list
ueshin opened a new pull request, #40526:
URL: https://github.com/apache/spark/pull/40526
### What changes were proposed in this pull request?
Fixes `DataFrame.to(schema)` to handle the case where there is a
non-nullable nested field in a nullable field.
### Why are the
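As a rough sketch of the nullability rule the PR title describes (a generic illustration, not the actual Spark check): a non-nullable source value can always be placed into a nullable target slot, so a non-nullable nested field inside a nullable field should be accepted.

```python
def nullability_ok(source_nullable: bool, target_nullable: bool) -> bool:
    """Writing a non-nullable value into a nullable slot is always safe;
    the reverse could lose nulls, so it is not."""
    return target_nullable or not source_nullable

# A non-nullable nested field inside a nullable field should be accepted.
assert nullability_ok(source_nullable=False, target_nullable=True)
assert nullability_ok(False, False)
assert not nullability_ok(True, False)
```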
hvanhovell closed pull request #40512: [SPARK-42892][SQL] Move sameType and
relevant methods out of DataType
URL: https://github.com/apache/spark/pull/40512
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480217186
Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please
review this PR or direct it to someone who can?
shrprasa commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1480215990
Hi @dongjoon-hyun
The change to clean up the upload directory is not specific to HDFS. The
reason we should do cleanup is that if the Spark job is creating new
directories/files,
itholic commented on PR #40525:
URL: https://github.com/apache/spark/pull/40525#issuecomment-1480119115
The remaining task at hand is to address numerous mypy annotation issues. If
you have any good ideas for resolving the linter issues, please feel free to
let me know at any time :-)
itholic opened a new pull request, #40525:
URL: https://github.com/apache/spark/pull/40525
### What changes were proposed in this pull request?
This PR proposes to support pandas API on Spark for Spark Connect. This PR
includes minimal changes to support basic functionality of the
cnauroth commented on PR #40511:
URL: https://github.com/apache/spark/pull/40511#issuecomment-1480084498
@dongjoon-hyun and @sunchao , thank you for the commit and the warm welcome!
gerashegalov commented on PR #40524:
URL: https://github.com/apache/spark/pull/40524#issuecomment-1480055444
LGTM, I would just add a unit test to CastSuite to prevent regressions
dongjoon-hyun commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480034164
Ya, I'm not against this nice improvement. Just shoot one email to the dev
mailing list to give a heads-up. That's what I think we need.
hvanhovell commented on PR #40515:
URL: https://github.com/apache/spark/pull/40515#issuecomment-1480012982
@dongjoon-hyun "officially" is a bit of a broad term. As far as I am concerned,
Ammonite is just a way to use the Connect JVM client; it is not meant as a
change for all of Spark (although
ueshin commented on code in PR #40518:
URL: https://github.com/apache/spark/pull/40518#discussion_r1145188951
##
connector/connect/common/src/main/protobuf/spark/connect/storage_level.proto:
##
@@ -0,0 +1,37 @@
+/*
Review Comment:
`common.proto` sounds good to me.
dongjoon-hyun commented on PR #40462:
URL: https://github.com/apache/spark/pull/40462#issuecomment-1479987729
Merged to master for Apache Spark 3.5.
dongjoon-hyun closed pull request #40462: [SPARK-42832][SQL] Remove repartition
if it is the child of LocalLimit
URL: https://github.com/apache/spark/pull/40462
dongjoon-hyun commented on PR #40519:
URL: https://github.com/apache/spark/pull/40519#issuecomment-1479940012
branch-3.4 is handled via https://github.com/apache/spark/pull/40500
yesterday.
dongjoon-hyun commented on PR #40519:
URL: https://github.com/apache/spark/pull/40519#issuecomment-1479933229
Merged to master.
dongjoon-hyun closed pull request #40519: [SPARK-42864][ML] Make
`IsotonicRegression.PointsAccumulator` private
URL: https://github.com/apache/spark/pull/40519
dongjoon-hyun closed pull request #40516: [SPARK-42894][CONNECT] Support
`cache`/`persist`/`unpersist`/`storageLevel` for Spark connect jvm client
URL: https://github.com/apache/spark/pull/40516
dongjoon-hyun commented on PR #40516:
URL: https://github.com/apache/spark/pull/40516#issuecomment-1479928027
Merged to master/3.4.
Thank you, @LuciferYang and @HyukjinKwon .
VindhyaG commented on PR #40462:
URL: https://github.com/apache/spark/pull/40462#issuecomment-1479847214
> > Can you please explain more about scenarios where RebalancePartitions
becomes a child of LocalLimit? I tried SELECT * FROM t WHERE id > 1 LIMIT 5;
with Spark 2.4.4 and
revans2 opened a new pull request, #40524:
URL: https://github.com/apache/spark/pull/40524
### What changes were proposed in this pull request?
This removes the need for a time zone id when casting from StringType ->
DateType and DateType -> StringType.
### Why are the changes
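A calendar date carries no time-of-day or zone information, unlike a timestamp, which is the intuition behind dropping the time zone id for these casts; a plain-Python sketch of that idea:

```python
from datetime import date

# Parsing and formatting a calendar date needs no time zone:
# a DATE value is just (year, month, day).
d = date.fromisoformat("2023-03-22")   # String -> Date
s = d.isoformat()                      # Date -> String
assert d == date(2023, 3, 22)
assert s == "2023-03-22"
```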
wankunde opened a new pull request, #40523:
URL: https://github.com/apache/spark/pull/40523
### What changes were proposed in this pull request?
For example:
```
val df1 = spark.range(5).select($"id".as("k1"))
val df2 = spark.range(10).select($"id".as("k2"))
```
cloud-fan closed pull request #40446: [SPARK-42815][SQL] Subexpression
elimination support shortcut expression
URL: https://github.com/apache/spark/pull/40446
cloud-fan commented on PR #40446:
URL: https://github.com/apache/spark/pull/40446#issuecomment-1479648779
thanks, merging to master!
panbingkun commented on PR #40506:
URL: https://github.com/apache/spark/pull/40506#issuecomment-1479624519
> hmm... I think we should refactor `JsonBenchmark` to make get_json_object
run w/ and w/o code gen in one
OK, let me do it.
cloud-fan commented on PR #40522:
URL: https://github.com/apache/spark/pull/40522#issuecomment-1479609724
cc @ulysses-you
WeichenXu123 commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1144852369
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala:
##
@@ -28,7 +28,8 @@ import org.apache.spark.sql.execution.SparkPlan
case
cloud-fan commented on code in PR #40522:
URL: https://github.com/apache/spark/pull/40522#discussion_r1144851788
##
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala:
##
@@ -561,34 +562,30 @@ case class AdaptiveSparkPlanExec(
}
cloud-fan opened a new pull request, #40522:
URL: https://github.com/apache/spark/pull/40522
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/39624 .
`QueryStageExec.isMaterialized` should only return true if
HyukjinKwon commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1144849964
##
sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala:
##
@@ -28,7 +28,8 @@ import org.apache.spark.sql.execution.SparkPlan
case