Similarly to Jacek, I feel it fails to document an actual community need
for such a feature.
Currently, any data source implementation has the potential to benefit
Spark users across all supported and third-party clients. For generally
available sources, this is advantageous for the whole Spar
Hi, Herman.
This is one of a series of discussions, as I re-summarized here.
You can find some context in the previous timeline thread.
2023-05-30 Apache Spark 4.0 Timeframe?
https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
Could you reply there to collect your timeline suggestions? We
This API looks like it starts from scratch and has no relationship with the existing
Java/Scala DataSourceV2 API. In particular, how can they support SQL?
We have been back and forth on the DataSource V2 design since 2.3; I believe
there are some things to learn when introducing the Python DataSource API
Actually I support this idea, in that Python developers wouldn't have to
learn Scala to write their own source (or deal with separate packaging).
This is especially important when you want to write a simple data source
that interacts with the Python ecosystem.
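For what it's worth, here is a purely illustrative sketch (not the actual
proposal; every class and method name below is my own assumption) of the kind
of thing a Python-only developer could then write, without touching Scala or
shipping a separate JVM package:

    # Hypothetical, Python-only reader contract: all names are illustrative.
    from abc import ABC, abstractmethod
    from typing import Iterator, Tuple


    class PythonDataSourceReader(ABC):
        """Sketch of an interface a pure-Python source could implement."""

        @abstractmethod
        def schema(self) -> str:
            """Return the schema as a DDL string, e.g. 'id INT, name STRING'."""

        @abstractmethod
        def read(self) -> Iterator[Tuple]:
            """Yield rows; Spark would turn these into a DataFrame."""


    class RestApiReader(PythonDataSourceReader):
        """Toy source backed by the Python ecosystem (e.g. an HTTP client)."""

        def schema(self) -> str:
            return "id INT, name STRING"

        def read(self) -> Iterator[Tuple]:
            # A real source would call e.g. requests here and paginate.
            yield (1, "alice")
            yield (2, "bob")

The point is only that the whole implementation, including whatever
Python-ecosystem library it wraps, stays in Python.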
On Tue, 20 Jun 2023 at 03:08, Denny Lee
+1
On Tue, 20 Jun 2023 at 10:41, Dongjoon Hyun wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 3.4.1.
>
> The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.1
Please vote on releasing the following candidate as Apache Spark version
3.4.1.
The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.4.1
[ ] -1 Do not release this package because ...
Dongjoon, I am not sure if I follow the line of thought here.
Multiple people have asked for clarification on what Spark 4.0 would mean
(Holden, Mridul, Jia & Xiao). You can - for the record - also add me to
this list. However, you choose to single out Xiao because he asks this question
Thank you. I reviewed the threads, vote and result once more.
I found that I missed the binding vote mark on Holden in the vote result email.
The following should be "-0: Holden Karau *". Sorry for this mistake, Holden
and all.
> -0: Holden Karau
To Hyukjin, I disagree with you on the following
Slightly biased, but per my conversations - this would be awesome to have!
On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari
wrote:
> I would definitely use it - if it's available :)
>
> On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote:
>
>> Hi Allison and devs,
>>
>> Although I was against this i
I would definitely use it - if it's available :)
On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote:
> Hi Allison and devs,
>
> Although I was against this idea at first sight (probably because I'm a
> Scala dev), I think it could work as long as there are people who'd be
> interested in such an
Hi Allison and devs,
Although I was against this idea at first sight (probably because I'm a
Scala dev), I think it could work as long as there are people who'd be
interested in such an API. Were there any? I'm just curious. I've seen no
emails requesting it.
I also doubt that Python devs would l
Sorry for using "simple" in my last email.
It's not going to be simple by any means.
Thanks for sharing the git repo, Phillip.
Will definitely go through it.
Thanks
Deepak
On Mon, 19 Jun 2023 at 3:47 PM, Phillip Henry
wrote:
> I think it might be a bit more complicated than this (but happy to be
> p
I think it might be a bit more complicated than this (but happy to be
proved wrong).
I have a minimum working example at:
https://github.com/PhillHenry/SparkConstraints.git
that runs out-of-the-box (mvn test) and demonstrates what I am trying to
achieve.
A test persists a DataFrame that conform
It can be as simple as adding a function to the Spark session builder,
specifically on the read path, which takes the YAML file (the definition of
the data contracts would be in YAML) and applies it to the DataFrame.
It can ignore the rows not matching the data contracts defined in the YAML.
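A rough sketch of what I mean, assuming PyYAML and PySpark and a made-up
per-column YAML layout (the real contract format and the builder hook would
still need to be designed):

    import yaml
    import pyspark.sql.functions as F
    from pyspark.sql import DataFrame, SparkSession

    # Hypothetical contract: per-column rules, layout invented for this sketch.
    CONTRACT_YAML = """
    columns:
      id:
        not_null: true
      amount:
        min: 0
    """

    def apply_contract(df: DataFrame, contract_yaml: str) -> DataFrame:
        """Drop rows violating the per-column rules in the YAML contract."""
        contract = yaml.safe_load(contract_yaml)
        cond = F.lit(True)
        for col, rules in contract.get("columns", {}).items():
            if rules.get("not_null"):
                cond = cond & F.col(col).isNotNull()
            if "min" in rules:
                cond = cond & (F.col(col) >= F.lit(rules["min"]))
        return df.filter(cond)

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 10.0), (2, -5.0), (None, 3.0)], "id INT, amount DOUBLE"
    )
    apply_contract(df, CONTRACT_YAML).show()  # only the first row survives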
Thanks
Deepak
On M
For my part, I'm not too concerned about the mechanism used to implement
the validation as long as it's rich enough to express the constraints.
I took a look at JSON Schema (for which there are a number of JVM
implementations) but I don't think it can handle more complex data types
like dates. Ma
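To make the date point concrete, here is a small sketch using the Python
jsonschema package (rather than one of the JVM implementations, and assuming
a recent version): "format": "date" is only an annotation unless you opt in
to a format checker, and even then there is no keyword for a constraint like
"created must be after 2020-01-01".

    import jsonschema

    schema = {
        "type": "object",
        "properties": {"created": {"type": "string", "format": "date"}},
        "required": ["created"],
    }

    # Passes, even though "created" is not a date: "format" is not asserted
    # by default.
    jsonschema.validate({"created": "not-a-date"}, schema)

    # Only an explicit FormatChecker rejects the malformed value, and a
    # date-range rule still has to live outside the schema.
    try:
        jsonschema.validate(
            {"created": "not-a-date"},
            schema,
            format_checker=jsonschema.FormatChecker(),
        )
    except jsonschema.ValidationError as e:
        print("rejected:", e.message)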