[GitHub] [spark] cloud-fan commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-10 GitBox
cloud-fan commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1179974621 cc @wzhfy @c21 can you take a look first? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

2022-07-08 GitBox
cloud-fan commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178723347 OK I think the idea makes sense. With CBO off, the optimizer/planner only needs size in bytes, but row count is also an important statistic for estimating size in bytes, and should be pro
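To make the row-count point concrete, here is a minimal Python sketch (not Spark code) of estimating size in bytes from a row count. As we understand it, Spark's `EstimationUtils.getOutputSize` follows a similar shape, multiplying a row count by an estimated per-row size. The names `Column`, `size_per_row` and `output_size`, and the 8-byte row overhead, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed generic per-row overhead in bytes (illustrative value).
ROW_OVERHEAD = 8

@dataclass
class Column:
    name: str
    default_size: int              # width used when no column stats exist
    avg_len: Optional[int] = None  # average width from column stats, if collected

def size_per_row(columns):
    """Estimated bytes per row: fixed overhead plus each column's width."""
    return ROW_OVERHEAD + sum(
        c.avg_len if c.avg_len is not None else c.default_size for c in columns
    )

def output_size(columns, row_count):
    """Size in bytes derived from a row count; never 0, so products of sizes stay nonzero."""
    return row_count * size_per_row(columns) if row_count > 0 else 1

cols = [Column("id", 8), Column("name", 20, avg_len=12)]
print(output_size(cols, 1_000))  # 1000 * (8 + 8 + 12) = 28000
```

Without a known row count, an operator that changes the output schema can only scale its child's size in bytes; with one, the size can be recomputed from the new row width.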

[GitHub] [spark] cloud-fan commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-08 GitBox
cloud-fan commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178656294 Maybe we should name them `BasicStatsPlanVisitor` and `BasicAndColumnStatsPlanVisitor`. We also need to make sure the updated `SizeInBytesOnlyStatsPlanVisitor` can propagate row count
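One way to picture a size-only visitor that still propagates row count is the toy sketch below. The plan nodes (`Scan`, `Project`, `Filter`, `Limit`), the `visit` function and the per-node rules are hypothetical simplifications chosen for illustration; they are not the actual `SizeInBytesOnlyStatsPlanVisitor` implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

ROW_OVERHEAD = 8  # assumed generic per-row overhead in bytes

@dataclass
class Stats:
    size_in_bytes: int
    row_count: Optional[int] = None

@dataclass
class Scan:            # leaf whose stats come from a (hypothetical) v2 source
    widths: List[int]  # per-column widths in bytes
    stats: Stats

@dataclass
class Project:
    child: "Plan"
    widths: List[int]  # widths of the projected columns

@dataclass
class Filter:
    child: "Plan"

@dataclass
class Limit:
    child: "Plan"
    n: int

Plan = Union[Scan, Project, Filter, Limit]

def row_size(widths):
    return ROW_OVERHEAD + sum(widths)

def output_widths(plan):
    """Column widths of a node's output (Filter and Limit keep their child's schema)."""
    if isinstance(plan, (Scan, Project)):
        return plan.widths
    return output_widths(plan.child)

def visit(plan) -> Stats:
    """Size-only estimation that still carries row count whenever it is known."""
    if isinstance(plan, Scan):
        return plan.stats
    child = visit(plan.child)
    if isinstance(plan, Project):
        if child.row_count is not None:
            # A projection keeps the row count; recompute size from the new row width.
            return Stats(max(child.row_count * row_size(plan.widths), 1), child.row_count)
        # No row count: scale the child's size by the ratio of row widths.
        scaled = child.size_in_bytes * row_size(plan.widths) // row_size(output_widths(plan.child))
        return Stats(max(scaled, 1))
    if isinstance(plan, Filter):
        # No selectivity estimate without column stats: keep size, drop row count.
        return Stats(child.size_in_bytes)
    if isinstance(plan, Limit):
        rows = plan.n if child.row_count is None else min(plan.n, child.row_count)
        return Stats(max(rows * row_size(output_widths(plan)), 1), rows)
    raise TypeError(f"unsupported node: {plan!r}")

scan = Scan([8, 20], Stats(size_in_bytes=36_000, row_count=1_000))
print(visit(Project(scan, [8])))       # Stats(size_in_bytes=16000, row_count=1000)
print(visit(Limit(Filter(scan), 10)))  # Stats(size_in_bytes=360, row_count=10)
```

The design choice being illustrated is that a known row count lets downstream nodes such as `Limit` recompute size exactly, while nodes that cannot estimate rows (here `Filter`) fall back to size only.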

[GitHub] [spark] cloud-fan commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-07 GitBox
cloud-fan commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178611319 I'm a bit confused. After this PR, what's the difference between `SizeInBytesOnlyStatsPlanVisitor` and `BasicStatsPlanVisitor`?
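For context on that question: as we understand Spark's existing design, `BasicStatsPlanVisitor` (used when CBO is on) tries column-statistics estimators first and falls back to `SizeInBytesOnlyStatsPlanVisitor` when they cannot produce an estimate. The hypothetical Python sketch below shows that fallback pattern for an equality filter; the function names, the `distinct_counts` field and the uniform-distribution selectivity of 1/NDV are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Stats:
    size_in_bytes: int
    row_count: Optional[int] = None
    distinct_counts: Dict[str, int] = field(default_factory=dict)  # per-column NDV (CBO only)

def size_only_filter(child: Stats) -> Stats:
    """Size-only path: the estimate is not reduced and row count is dropped."""
    return Stats(child.size_in_bytes)

def cbo_filter_equals(child: Stats, column: str) -> Optional[Stats]:
    """Column-stats path for `column = literal`; None when the needed stats are missing."""
    ndv = child.distinct_counts.get(column)
    if not child.row_count or not ndv:
        return None
    rows = max(child.row_count // ndv, 1)  # assume values are uniformly distributed
    size = max(child.size_in_bytes * rows // child.row_count, 1)
    return Stats(size, rows, {**child.distinct_counts, column: 1})

def basic_filter_equals(child: Stats, column: str) -> Stats:
    """Try column-stats estimation first, falling back to the size-only estimate."""
    est = cbo_filter_equals(child, column)
    return est if est is not None else size_only_filter(child)

with_stats = Stats(40_000, 1_000, {"country": 50})
print(basic_filter_equals(with_stats, "country"))
# Stats(size_in_bytes=800, row_count=20, distinct_counts={'country': 1})
print(basic_filter_equals(Stats(40_000, 1_000), "country"))
# Stats(size_in_bytes=40000, row_count=None, distinct_counts={})
```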