Similarly to Jacek, I feel it fails to document an actual community need
for such a feature.
Currently, any data source implementation has the potential to benefit
Spark users across all supported and third-party clients. For generally
available sources, this is advantageous for the whole Spar
Hi, Herman.
This is one of a series of discussions, as I re-summarized here.
You can find some context in the previous timeline thread.
2023-05-30 Apache Spark 4.0 Timeframe?
https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
Could you reply there to collect your timeline suggestions? We
This API looks like it starts from scratch and has no relationship with the existing
Java/Scala DataSourceV2 API. In particular, how can they support SQL?
We have been back and forth on the DataSource V2 design since 2.3; I believe
there are some things to learn when introducing the Python DataSource API
Actually I support this idea, in that Python developers wouldn't have to
learn Scala to write their own source (or deal with separate packaging).
This is especially important when you want to write a simple data source
that interacts with the Python ecosystem.
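For what it's worth, here is a purely illustrative sketch (not the actual
proposal; every class and method name below is my own assumption) of the kind
of thing a Python-only developer could then write, without touching Scala or
shipping a separate JVM package:

    # Hypothetical, Python-only reader contract: all names are illustrative.
    from abc import ABC, abstractmethod
    from typing import Iterator, Tuple


    class PythonDataSourceReader(ABC):
        """Sketch of an interface a pure-Python source could implement."""

        @abstractmethod
        def schema(self) -> str:
            """Return the schema as a DDL string, e.g. 'id INT, name STRING'."""

        @abstractmethod
        def read(self) -> Iterator[Tuple]:
            """Yield rows; Spark would turn these into a DataFrame."""


    class RestApiReader(PythonDataSourceReader):
        """Toy source backed by the Python ecosystem (e.g. an HTTP client)."""

        def schema(self) -> str:
            return "id INT, name STRING"

        def read(self) -> Iterator[Tuple]:
            # A real source would call e.g. requests here and paginate.
            yield (1, "alice")
            yield (2, "bob")

The point is only that the whole implementation, including whatever
Python-ecosystem library it wraps, stays in Python.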
On Tue, 20 Jun 2023 at 03:08, Denny Lee
+1
On Tue, 20 Jun 2023 at 10:41, Dongjoon Hyun wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 3.4.1.
>
> The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.4.1
Please vote on releasing the following candidate as Apache Spark version
3.4.1.
The vote is open until June 23rd 1AM (PST) and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 3.4.1
[ ] -1 Do not release this package because ...
Dongjoon, I am not sure if I follow the line of thought here.
Multiple people have asked for clarification on what Spark 4.0 would mean
(Holden, Mridul, Jia & Xiao). You can - for the record - also add me to
this list. However, you choose to single out Xiao because he asks this question
Thank you. I reviewed the threads, vote and result once more.
I found that I missed the binding vote mark on Holden in the vote result email.
The following should be "-0: Holden Karau *". Sorry for this mistake, Holden
and all.
> -0: Holden Karau
To Hyukjin, I disagree with you on the following
Slightly biased, but per my conversations - this would be awesome to have!
On Mon, Jun 19, 2023 at 09:43 Abdeali Kothari
wrote:
> I would definitely use it - if it's available :)
>
> On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote:
>
>> Hi Allison and devs,
>>
>> Although I was against this i
I would definitely use it - if it's available :)
On Mon, 19 Jun 2023, 21:56 Jacek Laskowski, wrote:
> Hi Allison and devs,
>
> Although I was against this idea at first sight (probably because I'm a
> Scala dev), I think it could work as long as there are people who'd be
> interested in such an
Hi Allison and devs,
Although I was against this idea at first sight (probably because I'm a
Scala dev), I think it could work as long as there are people who'd be
interested in such an API. Were there any? I'm just curious. I've seen no
emails requesting it.
I also doubt that Python devs would l
Sorry for using "simple" in my last email.
It's not going to be simple by any means.
Thanks for sharing the git repo, Phillip.
Will definitely go through it.
Thanks
Deepak
On Mon, 19 Jun 2023 at 3:47 PM, Phillip Henry
wrote:
> I think it might be a bit more complicated than this (but happy to be
> p
I think it might be a bit more complicated than this (but happy to be
proved wrong).
I have a minimum working example at:
https://github.com/PhillHenry/SparkConstraints.git
that runs out-of-the-box (mvn test) and demonstrates what I am trying to
achieve.
A test persists a DataFrame that conform
It can be as simple as adding a function to the Spark session builder,
specifically on the read path, which takes the YAML file (the definition of
the data contracts would be in YAML) and applies it to the DataFrame.
It can ignore the rows not matching the data contracts defined in the YAML.
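A rough sketch of what I mean, assuming PyYAML and PySpark and a made-up
per-column YAML layout (the real contract format and the builder hook would
still need to be designed):

    import yaml
    import pyspark.sql.functions as F
    from pyspark.sql import DataFrame, SparkSession

    # Hypothetical contract: per-column rules, layout invented for this sketch.
    CONTRACT_YAML = """
    columns:
      id:
        not_null: true
      amount:
        min: 0
    """

    def apply_contract(df: DataFrame, contract_yaml: str) -> DataFrame:
        """Drop rows violating the per-column rules in the YAML contract."""
        contract = yaml.safe_load(contract_yaml)
        cond = F.lit(True)
        for col, rules in contract.get("columns", {}).items():
            if rules.get("not_null"):
                cond = cond & F.col(col).isNotNull()
            if "min" in rules:
                cond = cond & (F.col(col) >= F.lit(rules["min"]))
        return df.filter(cond)

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 10.0), (2, -5.0), (None, 3.0)], "id INT, amount DOUBLE"
    )
    apply_contract(df, CONTRACT_YAML).show()  # only the first row survives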
Thanks
Deepak
On M
For my part, I'm not too concerned about the mechanism used to implement
the validation as long as it's rich enough to express the constraints.
I took a look at JSON Schema (for which there are a number of JVM
implementations) but I don't think it can handle more complex data types
like dates. Ma
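To make the date point concrete, here is a small sketch using the Python
jsonschema package (rather than one of the JVM implementations, and assuming
a recent version): "format": "date" is only an annotation unless you opt in
to a format checker, and even then there is no keyword for a constraint like
"created must be after 2020-01-01".

    import jsonschema

    schema = {
        "type": "object",
        "properties": {"created": {"type": "string", "format": "date"}},
        "required": ["created"],
    }

    # Passes, even though "created" is not a date: "format" is not asserted
    # by default.
    jsonschema.validate({"created": "not-a-date"}, schema)

    # Only an explicit FormatChecker rejects the malformed value, and a
    # date-range rule still has to live outside the schema.
    try:
        jsonschema.validate(
            {"created": "not-a-date"},
            schema,
            format_checker=jsonschema.FormatChecker(),
        )
    except jsonschema.ValidationError as e:
        print("rejected:", e.message)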