Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Saisai Shao
From a developer's point of view, it is less of a burden to always support the latest version of Spark (for example). But from a user's point of view, especially for those of us who maintain Spark internally, it is not easy to upgrade the Spark version right away (since we have many internal customizations), and we're still p

Re: Iceberg At Adobe

2020-12-03 Thread Saisai Shao
Great, thanks for sharing! Best Saisai On Fri, Dec 4, 2020 at 10:35 AM libis wrote: > Nice, thanks for sharing > > On Fri, Dec 4, 2020 at 3:58 AM Jacques Nadeau wrote: > >> Yeah, thanks for sharing >> >> On Thu, Dec 3, 2020 at 11:57 AM John Zhuge wrote: >> >>> Very nice! >>> >>> On Thu, Dec 3, 2020 at 10:36 AM Miao Wang

Re: Suggested S3 FileIO/Getting Started

2020-11-15 Thread Saisai Shao
opFileIO and a > FileSystem works great and is probably the easiest way to maintain an > implementation for the object store. > > On Thu, Nov 12, 2020 at 7:31 PM Saisai Shao > wrote: > >> Hi all, >> >> Sorry to chime in, I also have a same concern about using Iceberg

Re: Suggested S3 FileIO/Getting Started

2020-11-12 Thread Saisai Shao
Hi all, Sorry to chime in, I also have the same concern about using Iceberg with object storage. One of my concerns with S3FileIO is getting tied too much to a single > cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful > so that S3FileIO and (a future) GCSFileIO could share log

Re: Question about Iceberg release cadence

2020-08-27 Thread Saisai Shao
Would like to get the structured streaming reader into the next release :). Will spend time on addressing the new feedback. Thanks Saisai On Thu, Aug 27, 2020 at 10:36 PM Mass Dosage wrote: > I'm all for a release. The only thing still required for basic Hive read > support (other than documentation of course!) is

Re: [DISCUSS] Rename iceberg-hive module?

2020-08-20 Thread Saisai Shao
+1 for the changes. On Thu, Aug 20, 2020 at 5:46 PM Mass Dosage wrote: > +1 for `iceberg-hive-metastore` as I found this confusing when I first > started working with the code. > > On Thu, 20 Aug 2020 at 03:27, Jungtaek Lim > wrote: > >> +1 for `iceberg-hive-metastore` and also +1 for RD's proposal. >> >> Tha

Re: New committer: Shardul Mahadik

2020-07-22 Thread Saisai Shao
Congrats! Thanks Saisai On Thu, Jul 23, 2020 at 10:06 AM OpenInx wrote: > Congratulations ! > > On Thu, Jul 23, 2020 at 9:31 AM Jingsong Li > wrote: > >> Congratulations Shardul! Well deserved! >> >> Best, >> Jingsong >> >> On Thu, Jul 23, 2020 at 7:27 AM Anton Okolnychyi >> wrote: >> >>> Congrats and welco

Re: Iceberg sync notes - 17 & 19 June

2020-06-22 Thread Saisai Shao
Hi team, Any plan to get this Structured Streaming Read support (https://github.com/apache/iceberg/pull/796) into the 0.9.0 release? It would be appreciated if anyone could take a look and review it. Thanks! Best regards, Saisai On Tue, Jun 23, 2020 at 6:42 AM Ryan Blue wrote: > Hi everyone, > > I just posted my notes from the comm

Re: [VOTE] Graduate to a top-level project

2020-05-12 Thread Saisai Shao
+1 for graduation. On Wed, May 13, 2020 at 9:33 AM Junjie Chen wrote: > +1 > > On Wed, May 13, 2020 at 8:07 AM RD wrote: > >> +1 for graduation! >> >> On Tue, May 12, 2020 at 3:50 PM John Zhuge wrote: >> >>> +1 >>> >>> On Tue, May 12, 2020 at 3:33 PM parth brahmbhatt < >>> brahmbhatt.pa...@gmail.com> wrote:

Re: [Discuss] Merge spark-3 branch into master

2020-04-21 Thread Saisai Shao
> > On 2020/03/27 01:53:09, Saisai Shao wrote: > > Thanks Ryan, let me take a try.> > > > > Best regards,> > > Saisai> > > > > On Fri, Mar 27, 2020 at 12:15 AM Ryan Blue wrote:> > > > > > Here’s how it was done before:> > > > > h

Re: [Discuss] Merge spark-3 branch into master

2020-03-26 Thread Saisai Shao
>> the iceberg-hive and iceberg-mr modules so that they aren't locked to the >> same versions as the rest of the projects. >> >> On Thu, 26 Mar 2020 at 01:53, Saisai Shao wrote: >> >>> Hi Ryan, >>> >>> As mentioned in the meeting, would

Re: [Discuss] Merge spark-3 branch into master

2020-03-25 Thread Saisai Shao
take a lot of time. I know in our > company we have no near term plans to move to Spark 3. > > -Best, > R. > > On Thu, Mar 5, 2020 at 6:33 PM Saisai Shao wrote: > >> I was thinking that if it is possible to limit version lock plugin to >> only iceberg core related

Re: Shall we start a regular community sync up?

2020-03-18 Thread Saisai Shao
5pm PST on any day works for me. Looking forward to it. Thanks Saisai

Shall we start a regular community sync up?

2020-03-18 Thread Saisai Shao
Hi team, With more companies and developers joining the community, I was wondering if we could have a regular sync-up to discuss anything about Iceberg, such as milestones, feature design, etc. I think this would be quite helpful for growing the community and moving the project forward. Would like to hear

Re: [Discuss] Merge spark-3 branch into master

2020-03-05 Thread Saisai Shao
ity? Any suggestions on this? Best regards, Saisai On Thu, Mar 5, 2020 at 3:12 PM Saisai Shao wrote: > I think the requirement of supporting different version should be quite > common. As Iceberg is a table format which should be adapted to different > engines like Hive, Flink, Spark. To support diff

Re: [Discuss] Merge spark-3 branch into master

2020-03-04 Thread Saisai Shao
ditional checks that > baseline provides in general since this is a short-term problem. It would > just be nice if we could have versions that are confined to a single > module. The Nebula plugin that baseline uses claims to support that, but I > couldn't get it to work. >

Re: [Discuss] Merge spark-3 branch into master

2020-03-04 Thread Saisai Shao
an be differentiated by names when generating jars, also they will not be relied on by other modules in Iceberg. So this dependency issue should not be the case here. And in Maven it could be achieved easily. Please correct me if wrong. Best regards, Saisai On Wed, Mar 4, 2020 at 10:01 AM Saisai Shao wrote: > Thanks Matt

Re: [Discuss] Merge spark-3 branch into master

2020-03-03 Thread Saisai Shao
> > I would think that branching would be the best way to build and publish > against multiple versions of a dependency. > > > > -Matt Cheah > > > > *From: *Saisai Shao > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Tuesday, March 3, 2020 at

Re: [Discuss] Merge spark-3 branch into master

2020-03-03 Thread Saisai Shao
you can get it working, I think it's a >> great idea to get this into master. >> >> Otherwise, I was thinking about proposing an 0.8.0 release in the next >> month or so based on Spark 2.4. Then we could merge the branch into master >> and do another release

[Discuss] Merge spark-3 branch into master

2020-03-03 Thread Saisai Shao
Hi team, I was thinking of merging the spark-3 branch into master. Per the earlier discussion, we could also make spark-2 and spark-3 coexist as two separate sub-modules. With this, one build could generate both spark-2 and spark-3 runtime jars, and users could pick whichever they prefer. One concern is t

Re: Iceberg in Spark 3.0.0

2019-11-24 Thread Saisai Shao
wrote: > >> +1 for Iceberg branch >> >> Thanks for the contribution from you and your team! >> >> On Fri, Nov 22, 2019 at 8:29 AM Anton Okolnychyi >> wrote: >> >>> +1 on having a branch in Iceberg as we have for vectorized reads. >>

Re: Query about the semantics of "overwrite" in Iceberg

2019-11-24 Thread Saisai Shao
is fixed in 3.0 because Spark will choose its behavior and correctly > configure the source with a dynamic overwrite or an overwrite using an > expression. > > On Thu, Nov 21, 2019 at 11:33 PM Saisai Shao > wrote: > >> Hi Team, >> >> I found that Iceberg's "

Query about the semantics of "overwrite" in Iceberg

2019-11-21 Thread Saisai Shao
Hi Team, I found that Iceberg's "overwrite" differs from that of Spark's built-in sources like Parquet. The "overwrite" semantics in Iceberg seem more like "upsert": partitions that the new data doesn't touch are not deleted. I would like to know the purpose of this design choice. Also

Re: Iceberg in Spark 3.0.0

2019-11-21 Thread Saisai Shao
y, so that the community could review and contribute to it. I would like to hear your suggestions. Best regards, Saisai On Wed, Nov 20, 2019 at 1:27 AM Ryan Blue wrote: > Sounds great, thanks Saisai! > > On Mon, Nov 18, 2019 at 3:29 AM Saisai Shao > wrote: > >> Thanks Anton, I

Re: Iceberg in Spark 3.0.0

2019-11-18 Thread Saisai Shao
> > - Anton > > > On 18 Nov 2019, at 02:08, Saisai Shao wrote: > > Hi Anton, > > Thanks to bring this out. We already have a branch building against Spark > 3.0 (Master branch actually) internally, and we're actively working on it. > I think it is a good idea to create

Re: Iceberg in Spark 3.0.0

2019-11-17 Thread Saisai Shao
Hi Anton, Thanks for bringing this up. We already have a branch building against Spark 3.0 (Master branch actually) internally, and we're actively working on it. I think it is a good idea to create an upstream Spark 3.0 branch; we could share ours if the community would like. Best regards, Sa

Question about schema evolution and partition spec evolution

2019-09-05 Thread Saisai Shao
Hi team, I have some newbie questions about schema evolution and partition evolution. According to the design spec, Iceberg supports schema evolution and partition spec evolution. My questions are: 1. If a new column is added, do we have to rewrite all of the data? If not, how is it supported? 2. Do we

Re: New committer and PPMC member, Anton Okolnychyi

2019-09-02 Thread Saisai Shao
Congrats Anton! Best regards, Saisai On Tue, Sep 3, 2019 at 7:48 AM Daniel Weeks wrote: > Congrats Anton! > > On Fri, Aug 30, 2019 at 1:54 PM Edgar Rodriguez > wrote: > >> Nice! Congratulations, Anton! >> >> Cheers, >> >> On Fri, Aug 30, 2019 at 1:42 PM Dongjoon Hyun >> wrote: >> >>> Congratulations, Anton!

Re: Are we going to use Apache JIRA instead of Github issues

2019-08-18 Thread Saisai Shao
rch (in my opinion). I'm just one vote, though, so if most people >> prefer to move to JIRA I'm open to it. >> >> What do you think is missing compared to JIRA? >> >> On Fri, Aug 16, 2019 at 3:09 AM Saisai Shao >> wrote: >> >>> Hi Team, >

Are we going to use Apache JIRA instead of Github issues

2019-08-16 Thread Saisai Shao
Hi Team, It seems the Iceberg project uses GitHub issues instead of JIRA. IMHO JIRA is more powerful and easier to manage, and most Apache projects use JIRA to track everything. Is there any plan to move to JIRA, or will we stick with GitHub issues? Thanks Saisai

Re: Any plan to support update, delete and others

2019-08-08 Thread Saisai Shao
e, Spark supports the latest features available in DataSourceV2, > and will continue to. In fact, we're adding features to DSv2 based on what > we've built internally at Netflix to support Iceberg. > > On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao wrote: > >> Thanks a lo

Re: Two newbie question about Iceberg

2019-08-08 Thread Saisai Shao
> missing automatically? > > On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao wrote: > >> Thanks guys for your reply. >> >> I didn't do anything special, I don't even have a configured Hive. I just >> simply put the iceberg (assembly) jar into Spark and start a local

Re: Iceberg in Spark 3.0.0

2019-08-07 Thread Saisai Shao
IMHO I agree that we should have a branch to track the changes for Spark 3.0.0. Spark 3.0.0 has several changes regarding DataSource V2, so it would be better to evaluate those changes and take them into account in the design. My two cents :) Best regards, Saisai Edgar Rodriguez 于2019年8月

Re: Two newbie question about Iceberg

2019-08-07 Thread Saisai Shao
der if a newer version of Hive would avoid this problem? What version > are you linking with? > > > > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao > wrote: > > Hi team, > > > > I just met some issues when trying Iceberg with quick start guide. Not > sure if it is pr

Re: Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
y to call this yet. Same for > MERGE > INTO, open source Spark doesn’t support the operation yet. We’re also > working on building support into Spark as we go. > > I hope that helps! > > On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao wrote: > >> Hi team, >> >>

Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Hi team, The Delta Lake project recently announced version 0.3.0, which added several new API-level features, such as update, delete, merge, and vacuum. Is there any plan to add such features to Iceberg? Thanks Saisai

Two newbie question about Iceberg

2019-08-06 Thread Saisai Shao
Hi team, I just ran into some issues when trying Iceberg with the quick start guide. Not sure if it is proper to send this to the @dev mailing list (there seems to be no user mailing list). One issue is that Iceberg currently does not seem to run with an embedded metastore; it throws an exception. Is this an on-purpose beha