introductory Iceberg blog post

2021-01-28 Thread Mass Dosage
Hello all, As you may be aware, Expedia Group helped contribute Hive read support to Iceberg last year. We finally got around to publishing a blog post about this, which also includes an overview of Iceberg and why we think it's so useful. If you're interested, you can read it here: https://medium.c

How to get table/partition creation time/update time in iceberg

2021-01-28 Thread chong luo
Hi Iceberg Devs, I’m currently working on deleting expired tables and partitions in Iceberg. However, I cannot find the table/partition creation time; it seems Iceberg only stores snapshot creation time. In Hive, transient_lastDdlTime, createTime and lastAccessTime are stored in the metastore. With time m

Newbie iceberg questions

2021-01-28 Thread kkishore iiith
Hello Community, I am solving the problem of handling late-arriving data in one of our systems. Currently, we wait 8 hours for late data to arrive before starting to process the current hour's data. We have three stages in our pipeline, A -> B -> C, where B waits for 8 hours for A's hourly dat
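The fixed-wait policy described in the question can be sketched as a simple readiness check. This is a minimal illustration, assuming the 8-hour wait is counted from the end of the hour being processed (the post does not say); the names `LATE_DATA_WAIT`, `ready_to_process` and `earliest_start` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 8-hour lateness window starts when the hour ends.
LATE_DATA_WAIT = timedelta(hours=8)


def earliest_start(hour_start: datetime, wait: timedelta = LATE_DATA_WAIT) -> datetime:
    """Earliest time stage B may process A's output for the hour beginning at hour_start."""
    hour_end = hour_start + timedelta(hours=1)
    return hour_end + wait


def ready_to_process(hour_start: datetime, now: datetime, wait: timedelta = LATE_DATA_WAIT) -> bool:
    """True once the lateness window for that hour has fully elapsed."""
    return now >= earliest_start(hour_start, wait)


if __name__ == "__main__":
    hour = datetime(2021, 1, 28, 0, 0, tzinfo=timezone.utc)
    print(earliest_start(hour))  # hour 00:00 becomes eligible at 09:00 the same day
```

Under this rule, results for any hour are available no sooner than 9 hours after that hour begins, which is the latency the question is trying to reduce.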

Re: Newbie iceberg questions

2021-01-28 Thread Ryan Blue
Replies inline. On Thu, Jan 28, 2021 at 9:42 AM kkishore iiith wrote:

Re: How to get table/partition creation time/update time in iceberg

2021-01-28 Thread Ryan Blue
Chong, once snapshots expire, I don't think there is a way to recover the time that a given partition was created. Can you explain more about what you're trying to do? When we age off data, we use the age of the records themselves, not the age from metadata. In other words, we use the logica
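A minimal sketch of aging off data by the records' own time rather than by metadata timestamps. It assumes a hypothetical table partitioned by day on an event-time column, so each partition value is the event date itself; the function name `expired_days` and the retention parameter are illustrative, not part of Iceberg.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_days(partition_days: Iterable[date], today: date, retention_days: int) -> List[date]:
    """Return the day-partition values that fall outside the retention window.

    Age is measured from the event date stored in the data (the partition
    value), not from any table or partition creation time in metadata, so
    the result does not depend on which snapshots have been expired.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_days if d < cutoff)


if __name__ == "__main__":
    days = [date(2021, 1, 1), date(2021, 1, 20), date(2021, 1, 27)]
    print(expired_days(days, today=date(2021, 1, 28), retention_days=7))
```

In Iceberg itself, the same cutoff would more naturally be expressed as a filter on the event-time column; the Java `DeleteFiles` operation (from `table.newDelete()`) accepts such a filter through `deleteFromRowFilter`.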

Re: introductory Iceberg blog post

2021-01-28 Thread Ryan Blue
Thanks for sharing this, Adrian! On Thu, Jan 28, 2021 at 1:54 AM Mass Dosage wrote:

Re: introductory Iceberg blog post

2021-01-28 Thread Jack Ye
I have added it to the blog page PR: https://github.com/apache/iceberg/pull/2177 -Jack On Thu, Jan 28, 2021 at 10:46 AM Ryan Blue wrote:

Re: introductory Iceberg blog post

2021-01-28 Thread Mass Dosage
Ah, great! I wasn't aware there was such a thing. Thank you, Jack! On Thu, 28 Jan 2021 at 20:19, Jack Ye wrote:

Sync to discuss secondary index proposal

2021-01-28 Thread Ryan Blue
Hi everyone, The proposal that Miao wrote about secondary indexes has come up a lot lately. I think it would be a good time to have a discussion about the proposal and set some initial goals for what we want to do next. Since there hasn't been much discussion on the dev list, I'll schedule a sync

Re: Sync to discuss secondary index proposal

2021-01-28 Thread Russell Spitzer
CST, please :) But I don’t mind waking up early or staying up late as required. On Jan 28, 2021, at 4:14 PM, Ryan Blue wrote:

Re: Bucketed Joins on Iceberg

2021-01-28 Thread Ryan Blue
Hi Romin, Spark has poor support for bucketed joins and we have a design doc to hopefully improve that. We talked about this yesterday at the community sync. One of the parts that we also need to get into Spark
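A bucketed join depends on both sides assigning rows to buckets identically. As a minimal Python sketch for illustration only: the Iceberg table spec's `bucket[N]` transform hashes a value with 32-bit Murmur3 (x86 variant, seed 0) and takes `(hash & Integer.MAX_VALUE) % N`, hashing int and long values as 8-byte little-endian longs and strings as UTF-8 bytes. The helper names below are hypothetical and are not Iceberg's API.

```python
import struct

MASK32 = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit Murmur3 (x86 variant), returned as a signed int like Java's int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def rotl(x: int, r: int) -> int:
        return ((x << r) | (x >> (32 - r))) & MASK32

    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = rotl((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (rotl(h, 13) * 5 + 0xE6546B64) & MASK32

    tail = data[4 * nblocks:]  # remaining 0-3 bytes
    if tail:
        k = rotl((int.from_bytes(tail, "little") * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32

    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def hash_long(value: int) -> int:
    # int and long values are hashed as 8-byte little-endian longs.
    return murmur3_32(struct.pack("<q", value))


def hash_string(value: str) -> int:
    return murmur3_32(value.encode("utf-8"))


def bucket(hash_value: int, num_buckets: int) -> int:
    # Clear the sign bit, then take the remainder: (hash & Integer.MAX_VALUE) % N.
    return (hash_value & 0x7FFFFFFF) % num_buckets
```

For example, if two tables are both bucketed with 16 buckets on a hypothetical `customer_id` column, a given key maps to the same bucket id in both, so bucket i on one side only has to be joined with bucket i on the other, with no shuffle of either side.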

Re: Sync to discuss secondary index proposal

2021-01-28 Thread Jack Ye
+1, looking forward to the discussion. Please include me and Yan (yyany...@gmail.com); we're also in PST. -Jack On Thu, Jan 28, 2021 at 2:16 PM Russell Spitzer wrote:

Re: Sync to discuss secondary index proposal

2021-01-28 Thread Xinli shang
I had some earlier discussion with Miao on this. I am still interested in it. My time zone is PST. On Thu, Jan 28, 2021 at 2:50 PM Jack Ye wrote:

Re: Sync to discuss secondary index proposal

2021-01-28 Thread OpenInx
+1, my time zone is CST. On Fri, Jan 29, 2021 at 6:57 AM Xinli shang wrote:

Re: Sync to discuss secondary index proposal

2021-01-28 Thread 李响
+1, my colleagues and I are in UTC+8. On Fri, Jan 29, 2021 at 9:50 AM OpenInx wrote:

Re: Sync to discuss secondary index proposal

2021-01-28 Thread OpenInx
Hi @Miao Wang, would you mind sharing your current PoC code or PR for this document [1], if possible? I'd like to understand more details before I get involved in this discussion. Thanks. [1] https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit?ts=601316b0

Re: Sync to discuss secondary index proposal

2021-01-28 Thread OpenInx
Sorry, I sent the wrong link. The secondary index document is: https://docs.google.com/document/d/1E1ofBQoKRnX04bWT3utgyHQGaHZoelgXosk_UNsTUuQ/edit On Fri, Jan 29, 2021 at 10:31 AM OpenInx wrote:

Re: Sync to discuss secondary index proposal

2021-01-28 Thread Miao Wang
Hi @OpenInx, The code change is based on our internal fork. We need to do some refactoring before sending out an open-source PR. In addition, since there is no spec defined in Iceberg, the implementation is closely coupled to our code base. That is one of the major reason

Re: Bucketed Joins on Iceberg

2021-01-28 Thread Romin Parekh
Thanks, Ryan. I was unable to attend the community sync, but a few of my colleagues did. We are discussing next steps internally and are also open to contributing. Thanks, Romin On Thu, Jan 28, 2021 at 2:20 PM Ryan Blue wrote: