EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1867978567
Next release is a major release, so perfect opportunity to improve API.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub
cloud-fan commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1867814446
We cannot remove a released API (making it private is the same as removal for
end users). We can update the documentation and say Spark always ignores the
`name` parameter though.
EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1867801729
@cloud-fan what do you think about making `Dataset.observe(str, Column,
Column*)` private?
EnricoMi commented on code in PR #43519:
URL: https://github.com/apache/spark/pull/43519#discussion_r1403202213
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##
@@ -1024,6 +1024,27 @@ class DatasetSuite extends QueryTest
assert(namedObservation.get
EnricoMi commented on code in PR #43519:
URL: https://github.com/apache/spark/pull/43519#discussion_r1403196154
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##
@@ -1024,6 +1024,27 @@ class DatasetSuite extends QueryTest
assert(namedObservation.get
cloud-fan commented on code in PR #43519:
URL: https://github.com/apache/spark/pull/43519#discussion_r1403120274
##
sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala:
##
@@ -1024,6 +1024,27 @@ class DatasetSuite extends QueryTest
assert(namedObservation.get
EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1824032786
> Oh sorry I was a bit confused as well. I think it's because of self-join,
> we don't require the observation name to be unique.
>
> With the new df_id parameter, seems we can now?
EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1824030046
This whole problem goes away with unnamed `Observation` instances:
```
>>> observation1 = Observation()
>>> observation2 = Observation()
```
What is the purpose of
beliefer commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1822712201
I guess this PR fixes a bug caused by multiple datasets sharing the same
Spark session. The listener of `Observation` could receive the
`SparkListenerSQLExecutionEnd`
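The failure mode discussed in this thread can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not Spark's implementation: `ExecutionEnd`, `NameOnlyListener`, `NameAndDatasetListener`, and `replay` are hypothetical names invented for illustration, and the only detail taken from the thread is that a `df_id` is used alongside the observation name. It assumes a listener that stores the first matching metrics it sees from any finished query.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Metrics = Dict[str, int]


@dataclass(frozen=True)
class ExecutionEnd:
    """Hypothetical stand-in for an execution-end event.

    Metrics are keyed by (observation name, dataset id).
    """
    metrics: Dict[Tuple[str, int], Metrics]


class NameOnlyListener:
    """Matches events by observation name alone (the ambiguous behavior)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.result: Optional[Metrics] = None

    def on_event(self, event: ExecutionEnd) -> None:
        if self.result is not None:
            return  # keep the first match only
        for (name, _df_id), row in event.metrics.items():
            if name == self.name:
                self.result = row
                return


class NameAndDatasetListener:
    """Matches events by observation name and dataset id together."""

    def __init__(self, name: str, df_id: int) -> None:
        self.name = name
        self.df_id = df_id
        self.result: Optional[Metrics] = None

    def on_event(self, event: ExecutionEnd) -> None:
        if self.result is None:
            self.result = event.metrics.get((self.name, self.df_id))


def replay(listener, events: List[ExecutionEnd]) -> Optional[Metrics]:
    """Feed events to a listener in order and return what it captured."""
    for event in events:
        listener.on_event(event)
    return listener.result


# Two different datasets (ids 1 and 2) are observed under the same name.
# The query on dataset 1 finishes first.
events = [
    ExecutionEnd({("metrics", 1): {"rows": 10}}),
    ExecutionEnd({("metrics", 2): {"rows": 99}}),
]

# Both listeners are meant to capture the metrics of dataset 2.
by_name = replay(NameOnlyListener("metrics"), events)
by_name_and_id = replay(NameAndDatasetListener("metrics", df_id=2), events)
```

In this model the name-only listener captures the metrics of dataset 1 because that event arrives first, while the listener keyed on name and dataset id skips it and captures the metrics of dataset 2.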
cloud-fan commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1822188529
Oh sorry I was a bit confused as well. I think it's because of self-join, we
don't require the observation name to be unique.
EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1821529980
> name is unique but df can be self-joined and observation will be
> duplicated.
That sounds like an unrelated edge case. Example in the description is not a
self-join of an
cloud-fan commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1821335854
name is unique but df can be self-joined and observation will be duplicated.
EnricoMi commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1821283381
What is the point of the name of an observation, if that is not the unique
identifier? Create an observation without a name if you cannot come up with a
unique name and `Observation()`
HyukjinKwon closed pull request #43519: [SPARK-45656][SQL] Fix observation when
named observations with the same name on different datasets
URL: https://github.com/apache/spark/pull/43519
HyukjinKwon commented on PR #43519:
URL: https://github.com/apache/spark/pull/43519#issuecomment-1778720002
Merged to master.
ueshin opened a new pull request, #43519:
URL: https://github.com/apache/spark/pull/43519
### What changes were proposed in this pull request?
Fixes observation when named observations with the same name on different
datasets.
### Why are the changes needed?
Currently