Re: Apache Spark 3.3.4 EOL Release?

2023-12-11 Thread Jungtaek Lim
Sorry for the late reply, I've been busy these days and haven't had time to respond. I didn't realize you were doing the release preparation and discussion in parallel. I totally agree that you should go ahead, since you've already taken the first step. Also, thanks for the suggestion! Unfortunately I got to be busy after

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Malcolm Decuire
+1

On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote:
> +1
>
> On 2023/12/11 03:03:39 "L. C. Hsieh" wrote:
> > +1
> >
> > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Kent Yao
> > >
> > > On Mon, Dec 11, 2023 at 09:33 Yuming Wang wrote:
> > > >
> > > > +1
> > > >

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
Where exactly are you getting this information from? As far as I can tell, spark.sql.cbo.enabled has defaulted to false since it was introduced 7 years ago

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Yang Jie
+1

On 2023/12/11 03:03:39 "L. C. Hsieh" wrote:
> +1
>
> On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote:
> >
> > +1 (non-binding)
> >
> > Kent Yao
> >
> > On Mon, Dec 11, 2023 at 09:33 Yuming Wang wrote:
> > >
> > > +1
> > >
> > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote:
> > > >
> > > > +1
> > > >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Dongjoon Hyun
Hi, Mridul.

> I am currently on Python 3.11.6, java 8.

For the above, I added `Python 3.11 support` in Apache Spark 3.4.0. That's exactly one of the reasons why I wanted to do the EOL release of Apache Spark 3.3.4.

https://issues.apache.org/jira/browse/SPARK-41454 (Support Python 3.11)

Thanks,

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
You are right. By default, CBO is not enabled. Whilst the CBO was the default optimizer in earlier versions of Spark, it has been replaced by AQE in recent releases.

spark.sql.cbo.strategy

As I understand, the spark.sql.cbo.strategy configuration property specifies the optimizer strategy used

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Mridul Muralidharan
I am seeing a bunch of Python-related failures (43) in the sql module (for example [1]) ... I am currently on Python 3.11.6, Java 8. Not sure if Ubuntu modified anything from under me, thoughts? I am currently testing this against an older branch to make sure it is not an issue with my desktop.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh wrote:
>
> spark.sql.cbo.strategy: Set to AUTO to use the CBO as the default optimizer, or NONE to disable it completely.

Hmm, I've also never heard of this setting before and can't seem to find it in the Spark docs or source code.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh wrote:
>
> By default, the CBO is enabled in Spark.

Note that this is not correct. AQE is enabled
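AQE (enabled by default since Spark 3.2.0) differs from CBO in where its statistics come from. AQE re-optimizes using sizes measured at runtime from completed shuffle stages, not from the metastore. The toy Python model below is a sketch, not Spark's implementation. It shows one such re-optimization: a join planned as sort-merge is switched to a broadcast join once the measured size of one side is small enough. The 10 MB threshold is an assumed value standing in for `spark.sql.adaptive.autoBroadcastJoinThreshold`, which defaults to the value of `spark.sql.autoBroadcastJoinThreshold`. The function names are hypothetical.

```python
# Assumed threshold (10 MB), standing in for the broadcast threshold AQE consults.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def static_plan(estimated_bytes: int) -> str:
    """Planning-time choice, based only on the (possibly inaccurate) estimate."""
    if estimated_bytes <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast hash join"
    return "sort-merge join"


def adaptive_replan(planned: str, runtime_bytes: int) -> str:
    """After the shuffle stage finishes, re-check the choice using the measured size."""
    if planned == "sort-merge join" and runtime_bytes <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast hash join"
    # In this model a join is never switched in the other direction.
    return planned


# The planner's estimate says 2 GiB, but the shuffle output measured
# at runtime is only 4 MiB, so AQE switches to a broadcast join.
planned = static_plan(estimated_bytes=2 * 1024**3)
final = adaptive_replan(planned, runtime_bytes=4 * 1024**2)
print(planned, "->", final)  # sort-merge join -> broadcast hash join
```

The model only illustrates the principle that runtime measurements can override a poor static estimate. It does not cover AQE's other features, such as coalescing shuffle partitions or splitting skewed partitions.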

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
Some of these, like CBO and RBO, have been around outside of Spark for years, but I concur that they have a place in Spark's docs. Simply put, statistics provide insights into the characteristics of data, such as distribution, skewness, and cardinalities, which help the optimizer make informed
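The toy Python model below makes the role of cardinality statistics concrete. It is a sketch, not Spark's code, and shows how a cost-based estimate can change a join decision. It assumes a single equality filter whose selectivity is estimated as 1/NDV (number of distinct values), a common textbook estimate and a simplification of real filter estimation. It also assumes a 10 MB broadcast threshold, matching the default of `spark.sql.autoBroadcastJoinThreshold`. With CBO off, the model assumes the filter does not shrink the size estimate. `TableStats`, `estimate_filtered_size`, and `plan_join` are hypothetical names. In Spark, the column statistics themselves would come from something like `ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS`.

```python
from dataclasses import dataclass

# Assumed threshold: 10 MB, the default of spark.sql.autoBroadcastJoinThreshold.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class TableStats:
    """Hypothetical stand-in for the catalog statistics of one table."""
    size_in_bytes: int  # table-level size
    row_count: int      # number of rows
    ndv: int            # distinct values in the filtered column


def estimate_filtered_size(stats: TableStats, cbo_enabled: bool) -> float:
    """Estimate the size remaining after an equality filter `col = <literal>`."""
    if not cbo_enabled:
        # Size-only estimation: the filter is assumed not to shrink the estimate.
        return float(stats.size_in_bytes)
    # Cost-based estimation: selectivity of an equality predicate ~ 1 / NDV.
    avg_row_bytes = stats.size_in_bytes / stats.row_count
    estimated_rows = stats.row_count / stats.ndv
    return estimated_rows * avg_row_bytes


def plan_join(stats: TableStats, cbo_enabled: bool) -> str:
    """Pick a join strategy for the filtered side of a join."""
    size = estimate_filtered_size(stats, cbo_enabled)
    if size <= BROADCAST_THRESHOLD_BYTES:
        return "broadcast hash join"
    return "sort-merge join"


# A 1 GiB table with 10 million rows and 1,000 distinct values in the filter column.
dim = TableStats(size_in_bytes=1024**3, row_count=10_000_000, ndv=1_000)
print(plan_join(dim, cbo_enabled=False))  # sort-merge join
print(plan_join(dim, cbo_enabled=True))   # broadcast hash join
```

With the cardinality statistic available, the estimate drops from about 1 GiB to about 1 MB, which falls under the threshold. Without it, the planner has to assume the worst.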