Re: [ANNOUNCE] Apache Iceberg release 1.6.1

2024-09-25 Thread Wing Yew Poon
I do not see release notes for 1.6.1. Shouldn't https://iceberg.apache.org/releases/ have a section for 1.6.1 and highlights of the changes? (And for 1.6.1 to show up in the Table of contents on the right?) On Wed, Aug 28, 2024 at 8:34 AM Carl Steinbach wrote: > I'm pleased to announce the rele

Re: clarification on changelog behavior for equality deletes

2024-08-22 Thread Wing Yew Poon
Just a note that the functionality to compute net changes was added by Yufei only in Iceberg 1.4.0, in #7326 <https://github.com/apache/iceberg/pull/7326>. On Thu, Aug 22, 2024 at 12:48 PM Wing Yew Poon wrote: > Peter, > > The Spark procedure is implemented by CreateChangelogVie

Re: clarification on changelog behavior for equality deletes

2024-08-22 Thread Wing Yew Poon
nd compute update >> cannot be used together. >> >> Thanks, >> Steve Zhang >> >> >> >> On Aug 22, 2024, at 8:50 AM, Steven Wu wrote: >> >> > It should emit changes for each snapshot in the requested range. >> >> Wing Yew ha

Re: clarification on changelog behavior for equality deletes

2024-08-22 Thread Wing Yew Poon
>> >>>> I agree that option (a) is what user expects for row level changes. >>>> >>>> I feel the added deletes in given snapshots provides a PK of DELETED >>>> entry, existing deletes are used to read together with data files to find >>>

clarification on changelog behavior for equality deletes

2024-08-20 Thread Wing Yew Poon
Hi, I have a PR open to add changelog support for the case where delete files are present (https://github.com/apache/iceberg/pull/10935). I have a question about what the changelog should emit in the following scenario: The table has a schema with a primary key/identifier column PK and additional

Re: Dropping JDK 8 support

2024-07-23 Thread Wing Yew Poon
I just wish to point out that when people started voting, the proposal was "dropping JDK 8 support in Iceberg 2.0 release". It's fine for people to propose dropping JDK8 support sooner than that (and I'm not against that), but the proposal being voted on should not be switched mid-vote. - Wing Yew

Re: [DISCUSS][BYLAWS] Moving forward on the bylaws

2024-07-19 Thread Wing Yew Poon
Hi Owen, Thanks for doing this. Once you have the questions and choices, who gets to vote on them? - Wing Yew On Fri, Jul 19, 2024 at 10:07 AM Owen O'Malley wrote: > All, >Sorry for the long pause on bylaws discussion. It was a result of > wanting to avoid the long US holiday week (July 4th

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2024-07-09 Thread Wing Yew Poon
. > > There's also progress in supporting DELETE/UPDATE/MERGE from Dataframes as > well, it should also be coming soon in Spark. > > Thanks, > Szehon > > > > On Wed, Jul 26, 2023 at 12:46 PM Wing Yew Poon > wrote: > >> We are talking about DELETE/UPDATE/MER

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-09 Thread Wing Yew Poon
I am not familiar with the GitHub discussion feature and do not have an opinion about using it. I do think though that it would be useful to have a user list as well as a dev list for Apache Iceberg. Many Apache projects have both. Discussions about project work should continue to happen on the dev

Re: Iceberg Materialized View Meeting

2024-06-04 Thread Wing Yew Poon
Can you please record the meeting and make the recording available afterwards? Thanks, Wing Yew On Mon, Jun 3, 2024 at 11:32 PM Benny Chow wrote: > Thanks for organizing Jan. I’ll be there! > > Benny > > On Jun 3, 2024, at 11:15 PM, Jan Kaul wrote: > > > > Hi all, > > we will have a video

Re: spec question on equality deletes

2024-04-15 Thread Wing Yew Poon
, 2024 at 5:25 PM Yufei Gu wrote: > Hi Wing Yew Poon, > > Here is my understanding, but not necessarily how an engine implements it. > It should only consider the columns in equality_ids when we apply eq > deletes. Also the engine should ignore the unrelated columns. > It will st

Re: spec question on equality deletes

2024-04-15 Thread Wing Yew Poon
s any row where the delete columns are equal. Multiple > columns can be thought of as an AND of equality predicates." That could > be interpreted to mean (c). > > > > Whether it’s incorrect depends on how the compute engine works. If the > compute engine doesn’t try to p
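
The spec sentence quoted in this message ("Multiple columns can be thought of as an AND of equality predicates") can be illustrated with a small sketch in plain Python. This is only a toy model under stated assumptions: rows and delete records are dicts keyed by column name, `equality_columns` is a hypothetical stand-in for the columns named by `equality_ids`, the sample rows are made up, and sequence numbers and null handling are ignored. It is not how any engine implements equality deletes.

```python
def apply_equality_deletes(rows, delete_rows, equality_columns):
    """Return the rows that survive a set of equality delete records."""
    # Only the equality columns take part in matching.
    delete_keys = {
        tuple(delete_row[col] for col in equality_columns)
        for delete_row in delete_rows
    }
    # A row is dropped only if it matches a delete record on every column.
    return [
        row for row in rows
        if tuple(row[col] for col in equality_columns) not in delete_keys
    ]


# Made-up rows for illustration only.
rows = [
    {"id": 1, "category": "a", "name": "first"},
    {"id": 2, "category": "b", "name": "second"},
    {"id": 3, "category": "b", "name": "third"},
]

# One delete column: removes every row with id = 3.
by_id = apply_equality_deletes(rows, [{"id": 3}], ["id"])

# Two delete columns: id = 2 AND category = 'a' matches no row here.
by_id_and_category = apply_equality_deletes(
    rows, [{"id": 2, "category": "a"}], ["id", "category"]
)
```

In this model a delete record on `id` alone removes the row regardless of its other columns, while a two-column record removes a row only when both values match.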

spec question on equality deletes

2024-04-12 Thread Wing Yew Poon
Hi, I have some questions on the current Iceberg spec regarding equality deletes: https://iceberg.apache.org/spec/#equality-delete-files The spec says that for "a table with the following data: 1: id | 2: category | 3: name

Re: Community Meeting Minutes ?

2023-12-08 Thread Wing Yew Poon
ere hit by a series of family > and medical issues so apologies. I will put some better backups into place > in the unlikely event we are both out of commission. > > Thanks for the push and stand by for the meeting minutes. > > On Wed, Dec 6, 2023 at 3:06 PM Wing Yew Poon > wr

Re: Community Meeting Minutes ?

2023-12-06 Thread Wing Yew Poon
The meeting minutes and a link to the recording used to be sent out to this list regularly soon after the community sync. I have not been able to attend the sync recently and I haven't seen the minutes for the last two syncs. Can we please maintain the practice of sending the minutes and recording

Re: Is there a way to distcp iceberg table from hadoop?

2023-12-02 Thread Wing Yew Poon
Aren't we forgetting about position delete files? If the table has position delete files, then those contain absolute file paths as well. We cannot add them to the table as-is. We need to rewrite them. This, I think, is the most painful part of replicating an Iceberg table. - Wing Yew On Sat, Dec
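
The rewrite step mentioned in this message can be sketched as a path-prefix mapping over position delete records. The sketch below assumes each record is a `(file_path, pos)` pair already read into memory; the function name, the prefixes, and the sample paths are hypothetical. Real position delete files are stored in a file format such as Parquet, so an actual tool would also have to read and write those files and update the table metadata that references them.

```python
def rewrite_position_deletes(records, source_prefix, target_prefix):
    """Map the data file path in each (file_path, pos) record to a new location."""
    rewritten = []
    for file_path, pos in records:
        if not file_path.startswith(source_prefix):
            # Refuse to guess a mapping for paths outside the copied table location.
            raise ValueError(f"unexpected path outside source location: {file_path}")
        relative = file_path[len(source_prefix):]
        rewritten.append((target_prefix + relative, pos))
    return rewritten


# Hypothetical locations, for illustration only.
records = [
    ("hdfs://old-cluster/warehouse/db/tbl/data/file-a.parquet", 0),
    ("hdfs://old-cluster/warehouse/db/tbl/data/file-a.parquet", 7),
]
moved = rewrite_position_deletes(
    records,
    "hdfs://old-cluster/warehouse/db/tbl",
    "hdfs://new-cluster/warehouse/db/tbl",
)
```

Row positions are unchanged by the copy, so only the path component of each record needs to be mapped.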

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-26 Thread Wing Yew Poon
need is to override > table configuration, then write options are the right way to do it. > > On Wed, Jul 26, 2023 at 10:10 AM Wing Yew Poon > wrote: > >> I was on vacation. >> Currently, write modes (copy-on-write/merge-on-read) can only be set as >> table properti

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-26 Thread Wing Yew Poon
hing beyond that, I think we need to discuss what you're trying to do. > If it's to override a table-level setting with a SQL global, then we should > understand the use case better. > > On Fri, Jul 14, 2023 at 6:09 PM Wing Yew Poon > wrote: > >> Also, in the case o

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-14 Thread Wing Yew Poon
Also, in the case of write mode (I mean write.delete.mode, write.update.mode, write.merge.mode), these cannot be set as options currently; they are only settable as table properties. On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon wrote: > I think that different use cases benefit from or e
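
Since the message says these write modes are settable only as table properties, a minimal sketch of setting one that way may help. The helper below only builds an `ALTER TABLE ... SET TBLPROPERTIES` statement; the helper name and the table name `db.sample` are hypothetical, and running the statement assumes a Spark session configured with an Iceberg catalog.

```python
WRITE_MODE_PROPERTIES = ("write.delete.mode", "write.update.mode", "write.merge.mode")
WRITE_MODES = ("copy-on-write", "merge-on-read")


def set_write_mode_sql(table, prop, mode):
    """Build the SQL that sets one write-mode table property."""
    if prop not in WRITE_MODE_PROPERTIES:
        raise ValueError(f"not a write-mode property: {prop}")
    if mode not in WRITE_MODES:
        raise ValueError(f"not a write mode: {mode}")
    return f"ALTER TABLE {table} SET TBLPROPERTIES ('{prop}' = '{mode}')"


# Hypothetical table name; pass the result to spark.sql(...) in a real session.
statement = set_write_mode_sql("db.sample", "write.delete.mode", "merge-on-read")
```

Because the property lives on the table, it applies to every writer of that table, which is the limitation the thread is discussing.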

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-07-14 Thread Wing Yew Poon
ditional learning efforts to Spark users and > how can Spark administrators set them at cluster level? > > > > Thanks, > > Cheng Pan > > > > > > > > > >> On Jun 17, 2023, at 04:01, Wing Yew Poon > wrote: > >> > >> Hi, > >> I rec

allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-16 Thread Wing Yew Poon
Hi, I recently put up a PR, https://github.com/apache/iceberg/pull/7790, to allow the write mode (copy-on-write/merge-on-read) to be specified in SQLConf. The use case is explained in the PR. Cheng Pan has an open PR, https://github.com/apache/iceberg/pull/7733, to allow locality to be specified in

Re: rewrite action for collate how can we pass date range?

2023-05-24 Thread Wing Yew Poon
Gaurav, Is your data partitioned by date? If so, you can compact subsets of partitions at a time. To do this using the Spark procedure, you pass a where clause: spark.sql("CALL catalog_name.system.rewrite_data_files(table => '...', where => '...')") If you use the RewriteDataFilesSparkAction, yo
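
As a rough illustration of the `where` argument described in this message, the sketch below builds the `CALL` statement for a half-open date range. The catalog, table, and column names are hypothetical, and the helper only produces the SQL string; running it assumes a Spark session with the Iceberg procedures available. The date literals use double quotes inside the single-quoted `where` string, which relies on Spark's default treatment of double-quoted text as a string literal.

```python
from datetime import date


def rewrite_data_files_call(catalog, table, date_column, start, end):
    """Build a rewrite_data_files CALL restricted to start <= date_column < end."""
    where = (
        f'{date_column} >= "{start.isoformat()}" '
        f'and {date_column} < "{end.isoformat()}"'
    )
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', where => '{where}')"
    )


# Hypothetical names; pass the result to spark.sql(...) in a real session.
call = rewrite_data_files_call(
    "my_catalog", "db.events", "event_date", date(2023, 5, 1), date(2023, 5, 8)
)
```

Compacting one date range per call keeps each rewrite small, which is the point of compacting subsets of partitions at a time.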

Re: Welcome new committers and PMC!

2023-05-03 Thread Wing Yew Poon
Congratulations, Amogh, Eduard and Szehon! Well deserved! On Wed, May 3, 2023 at 12:07 PM Ryan Blue wrote: > Hi everyone, > > I want to congratulate Amogh and Eduard, who were just added as Iceberg > committers and Szehon, who was just added to the PMC. Thanks for all your > contributions! > >

Re: Proposal - Priority based commit ordering on partitions

2022-10-03 Thread Wing Yew Poon
Prashant, just saw Jack's post mentioning that you're in India Time. Obviously day time Pacific is not convenient for you. I'm fine with 9 pm Pacific. On Mon, Oct 3, 2022 at 12:09 PM Wing Yew Poon wrote: > Hi Prashant, > I am very interested in this proposal and woul

Re: Proposal - Priority based commit ordering on partitions

2022-10-03 Thread Wing Yew Poon
Hi Prashant, I am very interested in this proposal and would like to attend this meeting. Friday October 7 is fine with me; I can do 9 pm Pacific Time if that is what works for you (I don't know what time zone you're in), although any time between 2 and 6 pm would be more convenient. Thanks, Wing Y

Re: Welcome Yufei Gu as a committer

2022-08-25 Thread Wing Yew Poon
Congratulations, Yufei! On Thu, Aug 25, 2022 at 4:23 PM Sam Redai wrote: > Congrats Yufei! 🎉 > > On Thu, Aug 25, 2022 at 7:20 PM Anton Okolnychyi > wrote: > >> I’d like to welcome Yufei Gu as a committer to the project. >> >> Thanks for all your hard work, Yufei! >> >> - Anton > > -- > > Sam R

Re: Problem with partitioned table creation in scala

2022-05-27 Thread Wing Yew Poon
The partitionedBy typo in the doc is already fixed in the master branch of the Iceberg repo. I filed a PR to add `using("iceberg")` to the `writeTo` examples for creating a table (if you want to create an *Iceberg* table). On Fri, May 27, 2022 at 12:58 PM Wing Yew Poon wrote: > O

Re: Problem with partitioned table creation in scala

2022-05-27 Thread Wing Yew Poon
"ts")) .createOrReplace() - Wing Yew On Fri, May 27, 2022 at 11:29 AM Wing Yew Poon wrote: > That is a typo in the sample code. The doc itself ( > https://iceberg.apache.org/docs/latest/spark-writes/#creating-tables) > says: > "Create and replace operations support tab

Re: Problem with partitioned table creation in scala

2022-05-27 Thread Wing Yew Poon
That is a typo in the sample code. The doc itself ( https://iceberg.apache.org/docs/latest/spark-writes/#creating-tables) says: "Create and replace operations support table configuration methods, like partitionedBy and tableProperty" You could also have looked up the API in Spark documentation: htt

Re: Hive 4.0.0-alpha-1 release is available with Iceberg integration

2022-04-07 Thread Wing Yew Poon
Congratulations on the release, making available this functionality for Hive users! On Thu, Apr 7, 2022 at 9:11 AM Peter Vary wrote: > Hi Team, > > I would like to let you know that the Hive team released Hive > 4.0.0-alpha-1. > > Using this release it is possible to create, read, write Iceberg

Re: Welcome Szehon Ho as a committer!

2022-03-11 Thread Wing Yew Poon
Congratulations Szehon! On Fri, Mar 11, 2022 at 3:42 PM Sam Redai wrote: > Congrats Szehon! > > On Fri, Mar 11, 2022 at 6:41 PM Yufei Gu wrote: > >> Congratulations Szehon! >> Best, >> >> Yufei >> >> `This is not a contribution` >> >> >> On Fri, Mar 11, 2022 at 3:36 PM Ryan Blue wrote: >> >>>

Re: Welcome new PMC members!

2021-11-17 Thread Wing Yew Poon
Congratulations Jack and Russell! Well done, and well deserved. - Wing Yew On Wed, Nov 17, 2021 at 4:13 PM Kyle Bendickson wrote: > Congratulations to both Jack and Russell! > > Very well deserved indeed :) > > On Wed, Nov 17, 2021 at 4:12 PM Ryan Blue wrote: > >> Hi everyone, I want to welcome

publish snapshot to maven workflow

2021-11-08 Thread Wing Yew Poon
Hi, I know that there is a github workflow to publish snapshot to maven. This workflow fails in my fork of the Iceberg repo (I imagine because I don't have permissions). How are folks dealing with this? I just don't need to receive daily emails that the workflow failed. Thanks, Wing Yew

Re: Identifying the schema of an Iceberg snapshot

2021-11-08 Thread Wing Yew Poon
The fallback logic I mentioned will be in core Iceberg. On Mon, Nov 8, 2021 at 9:35 AM Wing Yew Poon wrote: > There is logic needed in both core Iceberg (in BaseTableScan and > DataTableScan) and in each engine. > > > On Mon, Nov 8, 2021 at 9:17 AM Vivekanand Vellanki >

Re: Identifying the schema of an Iceberg snapshot

2021-11-08 Thread Wing Yew Poon
be part of Iceberg APIs? > Basically, the Snapshot object has an API that returns the schema of the > snapshot. > > On Mon, Nov 8, 2021 at 10:24 PM Wing Yew Poon > wrote: > >> I am surprised that schema-id is optional for a v2 snapshot. >> I believe that the implemen

Re: Identifying the schema of an Iceberg snapshot

2021-11-08 Thread Wing Yew Poon
I am surprised that schema-id is optional for a v2 snapshot. I believe that the implementation now already writes a schema-id for both v1 and v2 snapshots. Of course, snapshots written before schema-id was added do not have it. I am working on implementing using the appropriate schema when reading

Re: Standard practices around PRs against multiple Spark versions

2021-11-03 Thread Wing Yew Poon
I wasn't aware that we were standardizing on such a practice. I don't have a strong opinion on making changes one Spark version at a time or all at once. I think committers who do reviews regularly should decide. My only concern with making changes one version at a time is follow-through on the par

Re: Meeting Minutes from 10/20 Iceberg Sync

2021-10-26 Thread Wing Yew Poon
> >>> If I remember correctly, we landed on option 1, creating a v3.1 without >>> the extra reflection logic and then just deprecating 3.0 when the time >>> comes. If everyone agrees with that I can amend the notes to describe that >>> more explicitly. >

Re: Meeting Minutes from 10/20 Iceberg Sync

2021-10-26 Thread Wing Yew Poon
n just deprecating 3.0 when the time > comes. If everyone agrees with that I can amend the notes to describe that > more explicitly. > > -Sam > > On Mon, Oct 25, 2021 at 11:30 AM Wing Yew Poon > wrote: > >> Adding v3.2 to Spark Build Refactoring >>> >>>

Re: Meeting Minutes from 10/20 Iceberg Sync

2021-10-25 Thread Wing Yew Poon
> > Adding v3.2 to Spark Build Refactoring > >- > >Russell and Anton will coordinate on dropping in a Spark 3.2 module >- > >We currently have 3.1 in the `spark3` module. We’ll move that out to >its own module and mirror what we do with the 3.2 module. (This will enable >cle

Re: Help improve Iceberg community meeting experience

2021-10-22 Thread Wing Yew Poon
I have no concerns with Tabular hosting and recording the meetings. I'm in favor of having the meetings recorded and the recordings available. - Wing Yew On Fri, Oct 22, 2021 at 1:59 PM John Zhuge wrote: > +1 > > It will be great to catch up on the meetings missed. > > On Fri, Oct 22, 2021 at 1

Re: [DISCUSS] Spark version support strategy

2021-09-28 Thread Wing Yew Poon
hub.com/apache/iceberg/issues/3183 > [3] > https://lists.apache.org/x/thread.html/ra438e89eeec2d4623a32822e21739c8f2229505522d73d1034e34198@%3Cdev.flink.apache.org%3E > > > On Wed, Sep 29, 2021 at 5:27 AM Wing Yew Poon > wrote: > >> In the last community sync, we spent a li

Re: [DISCUSS] Spark version support strategy

2021-09-28 Thread Wing Yew Poon
then develop a mechanism to vote to stop support of certain >>>> versions, and archive the corresponding directory to avoid accumulating too >>>> many versions in the long term. >>>> >>>> -Jack Ye >>>> >>>> >>>> On

Re: [DISCUSS] Spark version support strategy

2021-09-15 Thread Wing Yew Poon
to push for user upgrade, as it will make the life >>>> of both parties easier in the end. New feature is definitely one of the >>>> best incentives to promote an upgrade on user side. >>>> >>>> I think the biggest issue of option 3 is about its s

Re: [DISCUSS] Spark version support strategy

2021-09-14 Thread Wing Yew Poon
I understand and sympathize with the desire to use new DSv2 features in Spark 3.2. I agree that Option 1 is the easiest for developers, but I don't think it considers the interests of users. I do not think that most users will upgrade to Spark 3.2 as soon as it is released. It is a "minor version"

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Wing Yew Poon
https://github.com/apache/iceberg/pull/2954 should be ready to merge. The CI passed. On Mon, Aug 9, 2021 at 9:08 AM Wing Yew Poon wrote: > Ryan, > Thanks for the review. Let me look into implementing your refactoring > suggestion. > - Wing Yew > > > On Mon, Aug 9, 2021

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Wing Yew Poon
21 at 2:52 PM Carl Steinbach > wrote: > >> Hi Wing Yew, >> >> I will create a new RC once this patch is committed. >> >> Thanks. >> >> - Carl >> >> On Sat, Aug 7, 2021 at 4:29 PM Wing Yew Poon >> wrote: >> >>> Sorry to bri

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-07 Thread Wing Yew Poon
Sorry to bring this up so late, but this just came up: there is a Spark 3.1 (runtime) compatibility issue (not found by existing tests), which I have a fix for in https://github.com/apache/iceberg/pull/2954. I think it would be really helpful if it can go into 0.12.0. - Wing Yew On Fri, Aug 6, 20

Re: Welcoming Jack Ye as a new committer!

2021-07-05 Thread Wing Yew Poon
Congratulations Jack! On Mon, Jul 5, 2021 at 11:35 AM Ryan Blue wrote: > Hi everyone, > > I'd like to welcome Jack Ye as a new Iceberg committer. > > Thanks for all your contributions, Jack! > > Ryan > > -- > Ryan Blue >

Re: Compaction Sync - Monday

2021-04-19 Thread Wing Yew Poon
Russell, Can you please add me too? Thanks, Wing Yew On Mon, Apr 19, 2021 at 9:01 AM Russell Spitzer wrote: > I officially moved the meeting to tonight 6PM (Pacific) or tomorrow > morning 9AM (China ST) or 8PM (Central) - > We all knew timezones were going to be the hard part of computer scienc

Re: Welcoming Yan Yan as a new committer!

2021-03-24 Thread Wing Yew Poon
Congratulations Yan! On Wed, Mar 24, 2021 at 1:36 PM Ryan Murray wrote: > Congratulations!! > > On Wed, 24 Mar 2021, 11:39 Szehon Ho, wrote: > >> Nice, congratulations! >> >> On 24 Mar 2021, at 11:37, Marton Bod wrote: >> >> Congratulations, well done! >> >> On Wed, 24 Mar 2021 at 11:32, Pete

Re: Welcoming Peter Vary as a new committer!

2021-01-25 Thread Wing Yew Poon
Congratulations Peter! On Mon, Jan 25, 2021 at 10:35 AM Russell Spitzer wrote: > Congratulations! > > On Jan 25, 2021, at 12:34 PM, Jacques Nadeau > wrote: > > Congrats Peter! Thanks for all your great work > > On Mon, Jan 25, 2021 at 10:24 AM Ryan Blue wrote: > >> Hi everyone, >> >> I'd like

Re: About schema evolution with time travel.

2020-12-14 Thread Wing Yew Poon
Hi Tianyi, The behavior you found is indeed the current behavior in Iceberg. I too found it unexpected. I have a PR to address this: https://github.com/apache/iceberg/pull/1508. Due to other work, I had not followed up on this for a while, but I am returning to it now. - Wing Yew On Mon, Dec 14,

Re: Shall we start a regular community sync up?

2020-12-01 Thread Wing Yew Poon
I'd like to attend the community syncs as well. Can you please send me an invite? Thanks, Wing Yew Poon On Thu, Nov 19, 2020 at 9:25 PM Chitresh Kakwani wrote: > Hi Ryan, > > Could you please add me to the invitation list as well ? New entrant. > Interested in Iceberg'