Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-29 Thread Sean Owen
It looks good, except that I'm getting errors running the Spark Connect tests at the end (Java 17, Scala 2.13). It looks like I missed something necessary to build; is anyone else getting this? [ERROR] [Error]

Re: [DISCUSS] Incremental statistics collection

2023-08-29 Thread Chetan
Thanks for the detailed explanation. Regards, Chetan On Tue, Aug 29, 2023, 4:50 PM Mich Talebzadeh wrote: > OK, let us take a deeper look here > > ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1, c2), c3 > > In the above, we are explicitly grouping columns c1 and c2 together for >

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-29 Thread Martin Grund
+1 (non-binding). Tested Spark Connect fully isolated and with the PySpark build. Also tested some of the new PySpark ML Connect features. On Tue 29. Aug 2023 at 18:25 Yuanjian Li wrote: > Please vote on releasing the following candidate (RC3) as Apache Spark > version 3.5.0. > > The vote is open

[VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-29 Thread Yuanjian Li
Please vote on releasing the following candidate (RC3) as Apache Spark version 3.5.0. The vote is open until 11:59pm Pacific time Aug 31st and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.5.0 [ ] -1 Do not release this

Re: [DISCUSS] Incremental statistics collection

2023-08-29 Thread Mich Talebzadeh
OK, let us take a deeper look here. ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1, c2), c3 In the above, we are explicitly grouping columns c1 and c2 together for which we want to compute statistics. Additionally, we are also computing statistics for column c3 independently. This

Re: Spark Connect: API mismatch in SparkSession#execute

2023-08-29 Thread Stefan Hagedorn
Thank you, Martin! I got it working now using the same shading rules in my project as in Spark. From: Martin Grund Date: Monday, 28. August 2023 at 17:58 To: Stefan Hagedorn Cc: dev@spark.apache.org Subject: Re: Spark Connect: API mismatch in SparkSession#execute Hi Stefan, There are some

Re: [DISCUSS] Incremental statistics collection

2023-08-29 Thread Chetan
Hi, If we are taking this up, could we also support multi-column stats such as: ANALYZE TABLE mytable COMPUTE STATISTICS FOR COLUMNS (c1,c2), c3 This should help produce better estimates for conditions involving both c1 and c2. Thanks. On Tue, 29 Aug 2023 at 09:05, Mich Talebzadeh wrote: >

Re: [DISCUSS] Incremental statistics collection

2023-08-29 Thread Mich Talebzadeh
Short answer off the top of my head: my point was with regard to the Cost Based Optimizer (CBO) in traditional databases. The concept of a rowkey in HBase is somewhat similar to that of a primary key in an RDBMS. Now in databases with automatic deduplication features (i.e. ignore duplication of rowkey),