[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-04-04 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1495954231 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-31 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1491886464 If you really worry about regression, we can add a legacy config to fall back to the old code. I don't agree to make code changes that only fix the problem in one particular code path,

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-30 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1491283077 according to the [code in 2.3](https://github.com/apache/spark/blob/branch-2.3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L190), I think

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-30 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1491280956 > FWIW Both the use cases were working fine in Spark 2.3 Sorry I missed this point. Do you know how it worked in 2.3? Did 2.3 also call `distinct` before returning the result?

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482756439 I think case 1 works by accident. It's not an intentional design. I don't think it's a bug that case 2 doesn't work.

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482326317 > It works because the resolved column has just one match But there are two id columns. Does Spark already do deduplication somewhere?

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-22 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1480589254 @shrprasa do you know how the case 1 works?

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-22 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1480540953 I think column resolution should only look at one level, to make the behavior simple and predictable. I tried it on pgsql and it fails as well: ``` create table t(i int);