[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-08 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178695254 > BTW, with CBO off, where do we use row count? We use it in places like: https://github.com/apache/spark/blob/161c596cafea9c235b5c918d8999c085401d73a9/sql/catalyst/src/main/sca

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-08 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1178646200 > After this PR, what's the difference between SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor? BasicStatsPlanVisitor additionally has columnStats such as (NDV /

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1175798196 Thanks @wangyum! > So enabling spark.sql.cbo.enabled is what you want? I believe setting `spark.sql.cbo.enabled` to true by default could then help (what I wanted was to

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1174945271 Rebased and regenerated the golden files via: * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly *PlanStability*Suite" * SPARK_GENERATE_GOLDEN_FILES=1 SPARK_ANSI_SQL_

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1174817038 > Could you enable spark.sql.cbo.enabled to estimate row count? Thanks @wangyum, I am aware of the alternate visitor we use with CBO. I raised this PR considering: 1.

[GitHub] [spark] singhpk234 commented on pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-05 Thread GitBox
singhpk234 commented on PR #37083: URL: https://github.com/apache/spark/pull/37083#issuecomment-1174769959 cc @huaxingao @cloud-fan @wangyum