Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for proposing and voting for the feature, Mich. Adding some references to the thread: - Jira ticket - SPARK-24815 - Design Doc

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
+1 for me (non binding) *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Mridul Muralidharan
Hi, We are internally exploring adding support for dynamically changing the resource profile of a stage based on runtime characteristics. This includes failures due to OOM and the like, slowness due to excessive GC, resource wastage due to excessive overprovisioning, etc. Essentially handles

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Tom Graves
It is interesting. I think there are definitely some discussion points around this. Reliability vs. performance is always a trade-off, and it's great that it doesn't fail, but if it doesn't meet someone's SLA now, that could be just as bad if it's hard to figure out why. I think if something like this

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for the +1, I will propose voting in a new thread now. - Pavan On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh wrote: > I think we have discussed this enough and I consider it as a useful > feature.. I propose a vote on it. > > + 1 for me > > Mich Talebzadeh, > Dad | Technologist |

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
I think we have discussed this enough and I consider it as a useful feature.. I propose a vote on it. + 1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh, interesting solution. A co-worker was suggesting something similar using resource profiles to increase memory, but your approach avoids a lot of complexity; I like it (and we could extend it out to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue,

[Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread kalyan
Hello All, At Uber, we recently did some work on improving the reliability of Spark applications in scenarios where fatter executors run out of memory, leading to application failure. Fatter executors are those that have more than one task running on them concurrently. This

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-16 Thread Adam Hobbs
Hi, This is my first time using the dev mailing list so I hope this is the correct way to do it. I would like to lend my support to this proposal and offer my experiences as a consumer of Spark, and specifically Spark Structured Streaming (SSS). I am more of a cloud infrastructure devops

[VOTE][RESULT] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
The vote passes with 12 +1s (3 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Anish Shrigondekar - Mich Talebzadeh - Raghu Angadi - 刘唯 - Shixiong Zhu (*) - Bartosz Konieczny - Praveen Gattu - Burak Yavuz - Bhuwan Sahni - L. C. Hsieh (*) -

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
Thanks all for participating! The vote passed. I'll send out the result to a separate thread. On Thu, Jan 11, 2024 at 10:37 PM Wenchen Fan wrote: > +1 > > On Thu, Jan 11, 2024 at 9:32 AM L. C. Hsieh wrote: > >> +1 >> >> On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni >> wrote: >> >>> +1. This is

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Wenchen Fan
+1 On Thu, Jan 11, 2024 at 9:32 AM L. C. Hsieh wrote: > +1 > > On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni > wrote: > >> +1. This is a good addition. >> >> >> *Bhuwan Sahni* >> Staff Software Engineer >> >> bhuwan.sa...@databricks.com >> 500 108th Ave. NE >>

Spark Kafka Rack Aware Consumer

2024-01-10 Thread Schwager, Randall
Hello Spark Devs! Has there been discussion around adding the ability to dynamically set the ‘client.rack’ Kafka parameter at the executor? The Kafka SQL connector code on master doesn’t seem to support this feature. One can easily set the ‘client.rack’ parameter at the driver, but that just

Install Ruby 3 to build the docs

2024-01-10 Thread Nicholas Chammas
Just a quick heads up that, while Ruby 2.7 will continue to work, you should plan to install Ruby 3 in the near future in order to build the docs. (I recommend using rbenv to manage multiple Ruby versions.) Ruby 2 reached EOL in March 2023

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread L. C. Hsieh
+1 On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni wrote: > +1. This is a good addition. > > > *Bhuwan Sahni* > Staff Software Engineer > > bhuwan.sa...@databricks.com > 500 108th Ave. NE > Bellevue, WA 98004 > USA > > > On Wed, Jan 10, 2024 at 9:00 AM Burak Yavuz

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bhuwan Sahni
+1. This is a good addition. *Bhuwan Sahni* Staff Software Engineer bhuwan.sa...@databricks.com 500 108th Ave. NE Bellevue, WA 98004 USA On Wed, Jan 10, 2024 at 9:00 AM Burak Yavuz wrote: > +1. Excited to see more stateful workloads with Structured Streaming! > >

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Burak Yavuz
+1. Excited to see more stateful workloads with Structured Streaming! Best, Burak On Wed, Jan 10, 2024 at 8:21 AM Praveen Gattu wrote: > +1. This brings Structured Streaming a good solution for customers wanting > to build stateful stream processing applications. > > On Wed, Jan 10, 2024 at

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Praveen Gattu
+1. This brings Structured Streaming a good solution for customers wanting to build stateful stream processing applications. On Wed, Jan 10, 2024 at 7:30 AM Bartosz Konieczny wrote: > +1 :) > > On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > >> +1 (binding) >> >> Best Regards, >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bartosz Konieczny
+1 :) On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > +1 (binding) > > Best Regards, > Shixiong Zhu > > > On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > >> This is a good addition! +1 >> >> On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: >> >>> +1. This is a major improvement to the state API. >>> >>>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Shixiong Zhu
+1 (binding) Best Regards, Shixiong Zhu On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > This is a good addition! +1 > > On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > >> +1. This is a major improvement to the state API. >> >> Raghu. >> >> On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh >> wrote: >> >>> +1

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Jungtaek Lim
Friendly reminder, the VOTE thread is now live! https://lists.apache.org/thread/16ryx828bwoth31hobknxnjfxjxj07mf Votes made here are not counted, so please ensure you vote in the VOTE thread. Thanks! On Tue, Jan 9, 2024 at 9:33 AM Jungtaek Lim wrote: > Thanks everyone for the feedback! > >

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
Hi Ashok, Thanks for pointing out the databricks article Scalable Spark Structured Streaming for REST API Destinations | Databricks Blog I browsed it and it is basically similar to many of us involved

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
This is a good addition! +1 On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > +1. This is a major improvement to the state API. > > Raghu. > > On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh > wrote: > >> +1 for me as well >> >> >> Mich Talebzadeh, >> Dad | Technologist | Solutions Architect | Engineer >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Raghu Angadi
+1. This is a major improvement to the state API. Raghu. On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh wrote: > +1 for me as well > > > Mich Talebzadeh, > Dad | Technologist | Solutions Architect | Engineer > London > United Kingdom > > >view my Linkedin profile >

RE: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
+1 This is a good addition! On 2024/01/09 03:23:35 Anish Shrigondekar wrote: > Thanks Jungtaek for creating the Vote thread. > > +1 (non-binding) from my side too. > > Thanks, > Anish > > On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim > wrote: > > > Starting with my +1 (non-binding). Thanks! > > >

Re: AutoReply: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
Hi, Please stop this acknowledgement email. It is spamming the forum unnecessarily! Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
+1 for me as well Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Anish Shrigondekar
Thanks Jungtaek for creating the Vote thread. +1 (non-binding) from my side too. Thanks, Anish On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim wrote: > Starting with my +1 (non-binding). Thanks! > > On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Streaming - Arbitrary > State API v2. > > References: > >- JIRA ticket >- SPIP

[VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Hi all, I'd like to start the vote for SPIP: Structured Streaming - Arbitrary State API v2. References: - JIRA ticket - SPIP doc -

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Thanks everyone for the feedback! Given that we get positive feedback without major concerns, I will initiate the vote thread soon. Please make a vote in that thread as well. Thanks again! On Tue, Jan 9, 2024 at 7:44 AM Bhuwan Sahni wrote: > +1 on the newer APIs. I believe these APIs provide

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Please also note that Flask, by default, is a single-threaded web framework. While it is suitable for development and small-scale applications, it may not handle concurrent requests efficiently in a production environment. In production, one can utilise Gunicorn (Green Unicorn) which is a WSGI (

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Bhuwan Sahni
+1 on the newer APIs. I believe these APIs provide a much more powerful mechanism for the user to perform arbitrary state management in Structured Streaming queries. Thanks Bhuwan Sahni On Mon, Jan 8, 2024 at 10:07 AM L. C. Hsieh wrote: > +1 > > I left some comments in the SPIP doc and got replies

Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Thought it might be useful to share my idea with fellow forum members. During the breaks, I worked on the *seamless integration of Spark Structured Streaming with Flask REST API for real-time data ingestion and analytics*. The use case revolves around a scenario where data is generated through

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread L. C. Hsieh
+1 I left some comments in the SPIP doc and got replies quickly. The new API looks good and more comprehensive. I think it will help Spark Structured Streaming to be more useful in more complicated streaming use cases. On Fri, Jan 5, 2024 at 8:15 PM Burak Yavuz wrote: > > I'm also a +1 on the

Re: Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-08 Thread Sean Owen
Agreed, that looks wrong. From the code, it seems that "timezone" is only used for testing, though apparently no test caught this. I'll submit a PR to patch it in any event: https://github.com/apache/spark/pull/44619 On Mon, Jan 8, 2024 at 1:33 AM Janda Martin wrote: > I think that >

Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-07 Thread Janda Martin
I think that [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter introduced a regression in UIUtils::formatBatchTime when timezone is defined. DateTimeFormatter is thread-safe and immutable according to the JavaDoc, so method DateTimeFormatter::withZone
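The pitfall this message points at can be sketched with the JDK alone. This is a minimal illustration, not the Spark code: the class name, the pattern string, and both helper methods are hypothetical. It only shows that `DateTimeFormatter.withZone` returns a new formatter and leaves the receiver unchanged, so discarding its result silently keeps the old zone.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class BatchTimeFormatDemo {
    // Hypothetical shared formatter pinned to UTC; the pattern is an assumption.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss").withZone(ZoneOffset.UTC);

    // Buggy variant: withZone() builds a new formatter, but the result is dropped,
    // so the shared UTC formatter is used unchanged.
    static String formatIgnoringZone(long epochMillis, ZoneId zone) {
        FORMAT.withZone(zone); // no effect on FORMAT: formatters are immutable
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Fixed variant: format with the formatter returned by withZone().
    static String formatWithZone(long epochMillis, ZoneId zone) {
        return FORMAT.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        ZoneId plusNine = ZoneOffset.ofHours(9);
        System.out.println(formatIgnoringZone(0L, plusNine)); // 1970/01/01 00:00:00
        System.out.println(formatWithZone(0L, plusNine));     // 1970/01/01 09:00:00
    }
}
```

With the old `SimpleDateFormat`, `setTimeZone` mutated the instance in place, which is why a mechanical migration can lose the zone if the return value of `withZone` is not used.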

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Burak Yavuz
I'm also a +1 on the newer APIs. We had a lot of learnings from using flatMapGroupsWithState and I believe that we can make the APIs a lot easier to use. On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Mich Talebzadeh
Hi Pavan, Thanks for your answers. Given these responses, it seems like you have already taken a comprehensive approach to address the challenges associated with dynamic scaling in Spark Structured Streaming. IMO, it would also be beneficial to engage with other members as well, or gather

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Shixiong Zhu
+1. Looking forward to seeing how the new API brings in new streaming use cases! Best Regards, Shixiong Zhu On Wed, Nov 29, 2023 at 6:42 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the thread > once again to see if other folks have

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Pavan Kotikalapudi
Hi Mich, As always, thanks for looking keenly at the design; I really appreciate your input on this ticket. I would love to improve this further and cover more edge cases, if any. I can answer the concerns you have below. I believe I have covered some of them in the proposal. If at all I missed out

Re: unsubscribe

2024-01-04 Thread yxj1141

unsubscribe

2024-01-03 Thread Chenyang Tang
unsubscribe

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-02 Thread Mich Talebzadeh
Hi Pavan, Thanks for putting this request forward. I am generally supportive of it. In a nutshell, I believe this proposal holds significant promise for optimizing resource utilization and enhancing performance in Spark Structured Streaming. Having said that, there are potential

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-01 Thread Pavan Kotikalapudi
Hi PMC members, Bumping this idea for one last time to see if there are any approvals to take it forward. Here is an initial Implementation draft PR https://github.com/apache/spark/pull/42352 and design doc:

Re: When and how does Spark use metastore statistics?

2023-12-26 Thread Bjørn Jørgensen
Tell me more about spark.sql.cbo.strategy. On Tue, Dec 12, 2023 at 00:25, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Where exactly are you getting this information from? > > As far as I can tell, spark.sql.cbo.enabled has defaulted to false since > it was introduced 7 years ago >

Re: Contribute to Spark Open source

2023-12-25 Thread Colin Williams
Hello, Did you see https://spark.apache.org/contributing.html ? On Mon, Dec 25, 2023 at 5:13 AM Sudharshan V wrote: > > Hi All, > > I am new to Open source and have been using spark scala in my organisation > for the past couple of years. > I would like to contribute to spark open source. > I

Contribute to Spark Open source

2023-12-25 Thread Sudharshan V
Hi All, I am new to Open source and have been using spark scala in my organisation for the past couple of years. I would like to contribute to spark open source. I am not exactly sure of how and where to start. Any help would be greatly appreciated. Is there any documentation per se on how to

the life cycle shuffle Dependency

2023-12-24 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans up the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement > We are not validating against table or column existence. is not correct. When you call spark.sql(…), Spark will look up the table references and

Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark provides a method for syntax validation without executing the query. Something like below __ / __/__ ___ _/ /__ _\ \/ _

Validate spark sql

2023-12-23 Thread ram manickam
Hello, Is there a way to validate PySpark SQL for syntax errors only? I cannot connect to the actual data set to perform this validation. Any help would be appreciated. Thanks Ram

Meet our keynote speakers and register to Community Over Code EU!

2023-12-22 Thread Ryan Skraba
[Note: You're receiving this email because you are subscribed to one or more project dev@ mailing lists at the Apache Software Foundation.] * Merge with the ASF EUniverse! The registration for

Unsubscribe

2023-12-21 Thread yxj1141
Unsubscribe

Re: ShuffleManager and Speculative Execution

2023-12-21 Thread Mich Talebzadeh
Interesting point. As I understand, the key point is the ShuffleManager ensures that only one map output file is processed by the reduce task, even when multiple attempts succeed. So it is not a random selection process. At the reduce stage, only one copy of the map output needs to be read by the

ShuffleManager and Speculative Execution

2023-12-21 Thread Enrico Minack
Hi Spark devs, I have a question around ShuffleManager: With speculative execution, one map output file is being created multiple times (by multiple task attempts). If both attempts succeed, which is to be read by the reduce task in the next stage? Is any map output as good as any other?

the life cycle shuffle Dependency

2023-12-17 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans up the shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Guidance for filling out "Affects Version" on Jira

2023-12-17 Thread Nicholas Chammas
The Contributing guide only mentions what to fill in for “Affects Version” for bugs. How about for improvements? This question once caused some problems when I set “Affects Version” to the last released version, and that was interpreted as a request

[ANNOUNCE] Apache Spark 3.3.4 released

2023-12-16 Thread Dongjoon Hyun
We are happy to announce the availability of Apache Spark 3.3.4! Spark 3.3.4 is the last maintenance release based on the branch-3.3 maintenance branch of Spark. It contains many fixes including security and correctness domains. We strongly recommend all 3.3 users to upgrade to this or higher

[VOTE][RESULT] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
The vote passes with 6 +1s (3 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Yuming Wang * - Kent Yao - Liang-Chi Hsieh * - Yang Jie - Malcolm Decuire +0: None -1: None

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
Thank you all. This vote passed. Let me conclude. Dongjoon On 2023/12/11 23:58:28 Malcolm Decuire wrote: > +1 > > On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote: > > > +1 > > > > On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > > > +1 > > > > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote:

Re: Spark 3.5.0 and issue SPARK-45593 (SPARK-45201)

2023-12-14 Thread Steven B Jones
A follow-up to my note yesterday. Issue SPARK-45201 has similar externals to SPARK-45593 and is written to cover target release 3.5.0. Remarkably, the issue only affects self-created distributions, and not the one(s) provided by Spark development itself. I'll let you read

Spark 3.5.0 and issue SPARK-45593

2023-12-13 Thread Steven B Jones
Hello, I maintain a version of Apache Spark that runs on z/OS. I'm porting Spark 3.5.0 to our platform, and having the problem described by https://issues.apache.org/jira/projects/SPARK/issues/SPARK-45593 in

Re: Apache Spark 3.3.4 EOL Release?

2023-12-11 Thread Jungtaek Lim
Sorry for the late reply, I've been busy these days and haven't had time to respond. I didn't realize you were doing release preparation and discussion in parallel. I totally agree you should go if you take a step already. Also, thanks for the suggestion! Unfortunately I got to be busy after

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Malcolm Decuire
+1 On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote: > +1 > > On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > > +1 > > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > > > > > +1 (non-binding) > > > > > > Kent Yao > > > > > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > > > > > +1 > > > > > >

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
Where exactly are you getting this information from? As far as I can tell, spark.sql.cbo.enabled has defaulted to false since it was introduced 7 years ago

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Yang Jie
+1 On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > +1 > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > > > +1 (non-binding) > > > > Kent Yao > > > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > > > +1 > > > > > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > > >> > > >> +1 > > >> >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Dongjoon Hyun
Hi, Mridul. > I am currently on Python 3.11.6, java 8. For the above, I added `Python 3.11 support` at Apache Spark 3.4.0. That's exactly one of my reasons why I wanted to do the EOL release of Apache Spark 3.3.4. https://issues.apache.org/jira/browse/SPARK-41454 (Support Python 3.11) Thanks,

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
You are right. By default CBO is not enabled. Whilst the CBO was the default optimizer in earlier versions of Spark, it has been replaced by the AQE in recent releases. spark.sql.cbo.strategy As I understand, The spark.sql.cbo.strategy configuration property specifies the optimizer strategy used

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-11 Thread Mridul Muralidharan
I am seeing a bunch of Python-related (43) failures in the sql module (for example [1]) ... I am currently on Python 3.11.6, Java 8. Not sure if Ubuntu modified anything from under me, thoughts? I am currently testing this against an older branch to make sure it is not an issue with my desktop.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > spark.sql.cbo.strategy: Set to AUTO to use the CBO as the default optimizer, > or NONE to disable it completely. > Hmm, I’ve also never heard of this setting before and can’t seem to find it in the Spark docs or source code.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Nicholas Chammas
> On Dec 11, 2023, at 6:40 AM, Mich Talebzadeh > wrote: > > By default, the CBO is enabled in Spark. Note that this is not correct. AQE is enabled

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
Some of these, like CBO and RBO, have been around outside of Spark for years, but I concur that they have a place in Spark's docs. Simply put, statistics provide insights into the characteristics of data, such as distribution, skewness, and cardinalities, which help the optimizer make informed

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread L. C. Hsieh
+1 On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote: > > +1 (non-binding) > > Kent Yao > > On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > > > +1 > > > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > >> > >> +1 > >> > >> Dongjoon > >> > >> On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > >> > Please

Re: Algolia search on website is broken

2023-12-10 Thread Gengliang Wang
Hi Nick, Thank you for reporting the issue with our web crawler. I've found that the issue was due to a change (specifically, pull request #40269) in the website's HTML structure, where the JavaScript selector ".container-wrapper" is now ".container".

Re: When and how does Spark use metastore statistics?

2023-12-10 Thread Nicholas Chammas
I’ve done some reading and have a slightly better understanding of statistics now. Every implementation of LeafNode.computeStats

Disabling distributing local conf file during spark-submit

2023-12-10 Thread Eugene Miretsky
Hello, It looks like local conf archives always get copied to the target (HDFS) every time a job is submitted 1. Other

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Kent Yao
+1 (non-binding) Kent Yao On Mon, Dec 11, 2023 at 09:33, Yuming Wang wrote: > > +1 > > On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: >> >> +1 >> >> Dongjoon >> >> On 2023/12/08 21:41:00 Dongjoon Hyun wrote: >> > Please vote on releasing the following candidate as Apache Spark version >> > 3.3.4. >> >

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Yuming Wang
+1 On Mon, Dec 11, 2023 at 5:55 AM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > > 3.3.4. > > > > The vote is open until December 15th 1AM (PST) and passes if a majority > +1

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-10 Thread Dongjoon Hyun
+1 Dongjoon On 2023/12/08 21:41:00 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.3.4. > > The vote is open until December 15th 1AM (PST) and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. > > [ ] +1 Release this

Re: Spark on Yarn with Java 17

2023-12-10 Thread Jason Xu
Dongjoon and Luca, it's great to learn that there is a way to run different JVM versions for Spark and Hadoop binaries. I had concerns about Java compatibility issues without this solution. Thank you! Luca, thank you for providing a how-to guide for this. It's really helpful! On Sat, Dec 9, 2023

Re: Algolia search on website is broken

2023-12-10 Thread Nicholas Chammas
Pinging Gengliang and Xiao about this, per these docs. It looks like to fix this problem you need access to the Algolia Crawler Admin Console.

unsubscribe

2023-12-10 Thread bruce COTTMAN
- To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

unsubscribe

2023-12-10 Thread Stevens, Clay
Clay

unsubscribe

2023-12-10 Thread Rajanikant V

unsubscribe

2023-12-09 Thread Ravi Chinoy
-- Regards Ravi Chinoy Phone: (415) 230 9971

RE: Spark on Yarn with Java 17

2023-12-09 Thread Luca Canali
Jason, In case you need a pointer on how to run Spark with a version of Java different than the version used by the Hadoop processes, as indicated by Dongjoon, this is an example of what we do on our Hadoop clusters:

Re: Spark on Yarn with Java 17

2023-12-09 Thread Dongjoon Hyun
Please simply try Apache Spark 3.3+ (SPARK-33772) with Java 17 on your cluster, Jason. I believe you can set up your Spark 3.3+ jobs to run with Java 17 while your cluster (DataNode/NameNode/ResourceManager/NodeManager) is still sitting on Java 8. Dongjoon. On Fri, Dec 8, 2023 at 11:12 PM

Re: Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
Dongjoon, thank you for the fast response! Apache Spark 4.0.0 depends on only Apache Hadoop client library. To better understand your answer, does that mean a Spark application built with Java 17 can successfully run on a Hadoop cluster on version 3.3 and Java 8 runtime? On Fri, Dec 8, 2023 at

Re: Spark on Yarn with Java 17

2023-12-08 Thread Dongjoon Hyun
Hi, Jason. Apache Spark 4.0.0 depends on only Apache Hadoop client library. You can track all `Apache Spark 4` activities including Hadoop dependency here. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) According to the release history, the original suggested

Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
Hi Spark devs, According to the Spark 3.5 release notes, Spark 4 will no longer support Java 8 and 11 (link ). My company is using Spark on Yarn with Java 8 now. When considering a future upgrade to Spark 4, one issue

[VOTE] Release Spark 3.3.4 (RC1)

2023-12-08 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.3.4. The vote is open until December 15th 1AM (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.3.4 [ ] -1 Do not release this package
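The passing rule quoted in this call for votes can be sketched as a small tally. This is only an illustration under a stated assumption: "a majority +1 PMC votes ... with a minimum of 3 +1 votes" is read here as at least three binding +1s and more binding +1s than binding -1s. The class and method names are hypothetical, and the sample ballots mirror the 6 +1s (3 binding) reported in the result thread.

```java
import java.util.Arrays;
import java.util.List;

final class ReleaseVoteTally {
    /** One ballot: +1, 0, or -1, plus whether the vote is binding (cast by a PMC member). */
    static final class Ballot {
        final int value;
        final boolean binding;

        Ballot(int value, boolean binding) {
            this.value = value;
            this.binding = binding;
        }
    }

    // Count binding ballots with the given value; non-binding ballots are ignored.
    static long countBinding(List<Ballot> ballots, int value) {
        return ballots.stream().filter(b -> b.binding && b.value == value).count();
    }

    // Assumed reading of the rule: at least 3 binding +1s and more binding +1s than binding -1s.
    static boolean passes(List<Ballot> ballots) {
        long plus = countBinding(ballots, 1);
        long minus = countBinding(ballots, -1);
        return plus >= 3 && plus > minus;
    }

    public static void main(String[] args) {
        // Three binding and three non-binding +1s, no 0 or -1 votes.
        List<Ballot> ballots = Arrays.asList(
            new Ballot(1, true), new Ballot(1, true), new Ballot(1, true),
            new Ballot(1, false), new Ballot(1, false), new Ballot(1, false));
        System.out.println(passes(ballots)); // true
    }
}
```

Non-binding votes are still listed in the result mail, but under this reading they do not change the outcome.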

Re: Apache Spark 3.3.4 EOL Release?

2023-12-08 Thread Dongjoon Hyun
Thank you, Mridul, and Kent, too. Additionally, thank you for volunteering as a release manager, Jungtaek, For the 3.3.4 EOL release, I've already been testing and preparing for one week since my first email. So, why don't you proceed with the Apache Spark 3.5.1 release? It has 142 patches

Re: Apache Spark 3.3.4 EOL Release?

2023-12-07 Thread Jungtaek Lim
+1 to release 3.3.4 and consider 3.3 as EOL. Btw, it would probably be ideal if we could encourage people who haven't had a chance to go through the release process to take the opportunity (when there are people who are happy to take it). If you don't mind and we are not very strict on

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
This is not a question for the dev list. Moving dev to bcc. One thing I would try is to connect to this database using JDBC + SSH tunnel, but without Spark. That way you can focus on getting the JDBC connection to work without Spark complicating the picture for you. > On Dec 5, 2023, at 8:12 

SSH Tunneling issue with Apache Spark

2023-12-05 Thread Venkatesan Muniappan
Hi Team, I am facing an issue with SSH tunneling in Apache Spark. The behavior is the same as the one in this Stack Overflow question, but there are no answers there. This is what I am trying:

When and how does Spark use metastore statistics?

2023-12-05 Thread Nicholas Chammas
I’m interested in improving some of the documentation relating to the table and column statistics that get stored in the metastore, and how Spark uses them. But I’m not clear on a few things, so I’m writing to you with some questions. 1. The documentation for 

Algolia search on website is broken

2023-12-05 Thread Nicholas Chammas
Should I report this instead on Jira? Apologies if the dev list is not the right place. Search on the website appears to be broken. For example, here is a search for “analyze”:  And here is the same search using DDG

unsubscribe

2023-12-05 Thread Kalpana Jalawadi

Re: Apache Spark 3.3.4 EOL Release?

2023-12-04 Thread Kent Yao
+1 Thank you for driving this EOL release, Dongjoon! Kent Yao On 2023/12/04 19:40:10 Mridul Muralidharan wrote: > +1 > > Regards, > Mridul > > On Mon, Dec 4, 2023 at 11:40 AM L. C. Hsieh wrote: > > > +1 > > > > Thanks Dongjoon! > > > > On Mon, Dec 4, 2023 at 9:26 AM Yang Jie wrote: > > > >
