cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1495954231
thanks, merging to master!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491886464
If you are really worried about regressions, we can add a legacy config to fall
back to the old code. I don't agree with making code changes that only fix the
problem in one particular code path,
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491283077
According to the [code in 2.3](https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L190),
I think
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1491280956
> FWIW Both the use cases were working fine in Spark 2.3
Sorry I missed this point. Do you know how it worked in 2.3? Did 2.3 also
call `distinct` before returning the result?
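
To make the question concrete, here is a toy Python model of why deduplicating candidates before the ambiguity check would change the outcome. It is not Spark's resolver: `Attribute`, `resolve`, `AmbiguousReference`, and the `dedupe` flag are hypothetical names used only for illustration, and the assumption is that two matches pointing at the same underlying attribute compare equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for a resolved column: a name plus a unique expression id."""
    name: str
    expr_id: int


class AmbiguousReference(Exception):
    """Raised when a name matches more than one distinct candidate."""


def resolve(name, candidates, dedupe):
    """Return the single attribute matching `name`, or None if there is none.

    With dedupe=True, equal attributes are collapsed before the ambiguity
    check (the hypothetical `distinct` step being asked about).
    """
    matches = [a for a in candidates if a.name == name]
    if dedupe:
        # dict.fromkeys drops equal attributes while keeping first-seen order.
        matches = list(dict.fromkeys(matches))
    if not matches:
        return None
    if len(matches) > 1:
        raise AmbiguousReference(f"Reference '{name}' is ambiguous: {matches}")
    return matches[0]


# The same attribute is visible twice, e.g. through two paths in the plan.
same_id = Attribute("id", 1)
duplicated = [same_id, same_id]

# Two genuinely different columns that happen to share a name.
different = [Attribute("id", 1), Attribute("id", 2)]
```

In this sketch, `resolve("id", duplicated, dedupe=True)` returns the single attribute, while the same call with `dedupe=False` raises `AmbiguousReference`. The `different` list is ambiguous either way, since deduplication only collapses matches that are actually equal.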
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482756439
I think case 1 works by accident. It's not an intentional design. I don't
think it's a bug that case 2 doesn't work.
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1482326317
> It works because the resolved column has just one match
But there are two id columns. Does Spark already do deduplication somewhere?
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480589254
@shrprasa do you know how case 1 works?
cloud-fan commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1480540953
I think column resolution should only look at one level, to make the
behavior simple and predictable. I tried it on pgsql and it fails as well:
```
create table t(i int);