Re: Spark Merge On Read Support

2021-11-18 Thread Yufei Gu
>> >>>>>> What’s been done so far is pretty significant: >>>>>> >>>>>>- Add new writers that can handle deletes across multiple >>>>>>partition specs >>>>>>- Add Spark 3.2 module and refactor Spark

Re: Spark Merge On Read Support

2021-11-18 Thread Puneet Zaroo
>>>>- Add metadata columns to Spark 3.2 >>>>>- Add support for required distribution and ordering in Spark 3.2 >>>>>- Support Spark 3.2 dynamic filtering >>>>> >>>>> Many of those are the building blocks for the

Re: Spark Merge On Read Support

2021-11-18 Thread Ryan Blue
And >>>> it’s really amazing to finally have support for some major improvements: >>>> dynamic filtering on all queries, metadata columns, and required >>>> distribution and ordering! >>>> >>>> Ryan >>>> >>>> On Thu

Re: Spark Merge On Read Support

2021-11-17 Thread Puneet Zaroo
e major improvements: >>> dynamic filtering on all queries, metadata columns, and required >>> distribution and ordering! >>> >>> Ryan >>> >>> On Thu, Nov 11, 2021 at 11:46 PM Sreeram Garlapati < >>> gsreeramku...@gmail.com> wro

Re: Spark Merge On Read Support

2021-11-17 Thread Ryan Blue
> >> On Thu, Nov 11, 2021 at 11:46 PM Sreeram Garlapati < >> gsreeramku...@gmail.com> wrote: >> >>> Hello Iceberg devs! >>> >>> After going through the mail threads (especially "Spark version support >>> strategy") and relev

Re: Spark Merge On Read Support

2021-11-16 Thread Sreeram Garlapati
Nov 11, 2021 at 11:46 PM Sreeram Garlapati < > gsreeramku...@gmail.com> wrote: > >> Hello Iceberg devs! >> >> After going through the mail threads (especially "Spark version support >> strategy") and relevant PRs - it looks like - *Merge on Read* Support >

Re: Spark Merge On Read Support

2021-11-15 Thread Ryan Blue
PM Sreeram Garlapati wrote: > Hello Iceberg devs! > > After going through the mail threads (especially "Spark version support > strategy") and relevant PRs - it looks like - *Merge on Read* Support > (i.e., Spark writers writing equality deletes) will be available with

Spark Merge On Read Support

2021-11-11 Thread Sreeram Garlapati
Hello Iceberg devs! After going through the mail threads (especially "Spark version support strategy") and relevant PRs - it looks like - *Merge on Read* Support (i.e., Spark writers writing equality deletes) will be available with *Iceberg + Spark 3.2*. Is this understandi

Re: Spark Merge-on-Read Feature

2021-09-24 Thread Jack Ye
Jack Ye > > On Fri, Sep 24, 2021 at 10:23 AM Aman Rawat > wrote: > >> Hello devs, >> >> We are trying to implement Spark support for the merge-on-read feature in >> Iceberg. Can you please share the elaborate plan here, ongoing work and >> tentative timelines

Re: Spark Merge-on-Read Feature

2021-09-24 Thread Jack Ye
project https://github.com/apache/iceberg/projects/10. Best, Jack Ye On Fri, Sep 24, 2021 at 10:23 AM Aman Rawat wrote: > Hello devs, > > We are trying to implement Spark support for the merge-on-read feature in > Iceberg. Can you please share the elaborate plan here,

Spark Merge-on-Read Feature

2021-09-24 Thread Aman Rawat
Hello devs, We are trying to implement Spark support for the merge-on-read feature in Iceberg. Can you please share the elaborate plan here, ongoing work and tentative timelines for the same (both from spark and iceberg repos side). We are following the priority board that has been set up

Re: What have I learned from doing Merge-On-Read PoC

2020-03-23 Thread OpenInx
merge: It uses filter API and also needs >merge sort optimization. > > FYI, there is also an issue > <https://github.com/apache/incubator-iceberg/issues/825> about the > additional meta column, it seems like spark will handle the additional > columns for iceberg so I d

What have I learned from doing Merge-On-Read PoC

2020-03-21 Thread OpenInx
Dear Iceberg Dev: As I said in the document[1] before, we think the iceberg update/delete features (mainly merge-on-read) are the high-priority features (we've also discussed some flink+iceberg scenarios, and anybody interested in that part can read the document). Recently, I wrote some demo

Re: merge-on-read?

2018-12-07 Thread Erik Wright
ng a replace operator where file2’s > version of a column replaces file1’s version. > > .. Owen > > > On Nov 28, 2018, at 9:44 AM, Ryan Blue > wrote: > > > > What do you mean by merge on read? > > > > A few people I've talked to are interested in

Re: merge-on-read?

2018-11-30 Thread Ryan Blue
question. Implemented properly, do you see any > reason that a series of PRs to implement merge-on-read support wouldn't be > welcomed? > > Thanks, > > Erik > > On Wed., Nov. 28, 2018, 5:25 p.m. Erik Wright wrote: > > > > > > > On Wed, Nov 28, 2018 at 4:32

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
> > > > It would look like: > > > > file1.orc: struct file2.orc: > > struct > > > > It would let them leave the stable information and only re-write the > > second column family when the information in the mutable column family > > changes. It would a

Re: merge-on-read?

2018-11-28 Thread Owen O'Malley
after the data has been ingested. From there it is easy to imagine having a replace operator where file2’s version of a column replaces file1’s version. .. Owen > On Nov 28, 2018, at 9:44 AM, Ryan Blue wrote: > > What do you mean by merge on read? > > A few peo

Re: merge-on-read?

2018-11-28 Thread Ryan Blue
What do you mean by merge on read? A few people I've talked to are interested in building delete and upsert features. Those would create files that track the changes, which would be merged at read time to apply them. Is that what you mean? rb On Tue, Nov 27, 2018 at 12:26 PM Erik Wright wrote