Duplicates are getting inserted into Iceberg tables even after de-duplication

2024-03-13 Thread Shwetha Dharmarajan
Hello, We are using Apache Iceberg with AWS Glue. We are seeing an issue where duplicates are getting inserted into the table, even after making sure there are no duplicates in the data being upserted into the table. We use MERGE SQL to upsert data into the table. We also see an issue where dup

Re: [PROPOSAL] Improvement on our PR flows

2024-03-13 Thread Renjie Liu
Hi, JB: Your proposal looks great to me. We should definitely have a vote for a proposal impacting the spec, and the model is great. On Tue, Mar 12, 2024 at 10:55 PM Jean-Baptiste Onofré wrote: > Hi > > I think a vote would be necessary only if we don't have consensus on a > proposal. If anyone

Re: [DISCUSS] What do we plan for Iceberg 2.0.0 ?

2024-03-13 Thread Jean-Baptiste Onofré
Hi Fokko, Thanks for your reply! I agree with your points. 1. About "housing" of Iceberg components, I fully agree. It's always better to have it in the involved projects (from a maintenance and governance standpoint). Sometimes it's not easy (look at Iceberg engines, or Apache Camel and all compo

Re: [DISCUSS] What do we plan for Iceberg 2.0.0 ?

2024-03-13 Thread Fokko Driesprong
Hey JB, Thanks for raising this. Sorry for the late reply, but I was OOO last week. I think in general the progress is being kept on the spec itself. Also, some features are already available (default values in Python, and nanosecond timestamps

Re: [DISCUSS] Iceberg board report - March 2024

2024-03-13 Thread Jean-Baptiste Onofré
Hi, I agree with the naming comments from Fokko. About releases, we should only include releases since the last report (quarter). However, if needed, we can include older releases with a note like "forgotten in last report". The community health is great, thanks for that. Thanks Ryan for the report