Re: [DISCUSS] Default format version for new tables?

2023-05-24 Thread Szehon Ho
Hi, I'm +1 to making v2 the default, say after this release. It seems most of the features brought up as concerns on Spark side in the thread Gabor linked have been implemented (like position delete lifecycle). But Anton's point is also good. Even if some delete file features are missing, V2 is

Re: rewrite action for collate how can we pass date range?

2023-05-24 Thread Gaurav Agarwal
Thank you, Yew.
On Wed, May 24, 2023, 11:19 PM Wing Yew Poon wrote:
> Gaurav,
>
> Is your data partitioned by date? If so, you can compact subsets of
> partitions at a time. To do this using the Spark procedure, you pass a
> where clause:
>
> spark.sql("CALL catalog_name.system.rewrite_data_files(

Re: [VOTE] Release Apache Iceberg 1.3.0 RC0

2023-05-24 Thread Szehon Ho
+1 (binding)
1. Verified signatures
2. Verified checksum
3. Verified license documentation
4. Built and ran tests
5. Ran simple tests on Spark 3.4:
   - Created a simple table and checked the metadata tables
   - Ran a 'delete from' statement to generate position deletes, then ran rewrite_position_delete
Thanks,
Szehon
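Step 2 of the checklist above (verifying the checksum) can be sketched in Python with `hashlib`. This is a minimal illustration, not the Iceberg release tooling: it assumes each artifact ships with a `.sha512` file whose first whitespace-separated token is the hex digest, and the file names in the usage comment are hypothetical. Signature verification (step 1) still needs a separate GPG check.

```python
import hashlib
from pathlib import Path


def sha512_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact: Path, checksum_file: Path) -> bool:
    """Check an artifact against its .sha512 file.

    Assumption: the checksum file starts with the hex digest as its first
    token, e.g. "<digest>  <filename>" as written by `shasum -a 512`.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_hex(artifact) == expected


# Hypothetical file names; substitute the release-candidate artifacts you downloaded.
# verify_checksum(Path("release.tar.gz"), Path("release.tar.gz.sha512"))
```

Streaming the file in fixed-size chunks keeps memory flat even for large source tarballs.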

Re: rewrite action for collate how can we pass date range?

2023-05-24 Thread Wing Yew Poon
Gaurav, Is your data partitioned by date? If so, you can compact subsets of partitions at a time. To do this using the Spark procedure, you pass a where clause: spark.sql("CALL catalog_name.system.rewrite_data_files(table => '...', where => '...')") If you use the RewriteDataFilesSparkAction, yo
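Wing Yew's suggestion of compacting a few date partitions at a time with a `where` clause can be sketched in Python by generating one `rewrite_data_files` CALL per date window. The table name `db.events`, the date-typed column `event_date`, and the 7-day window size are hypothetical; the sketch also assumes the predicate string is accepted as written, so check it against your Iceberg version before relying on it. Each generated statement would be passed to `spark.sql(...)`.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Split [start, end) into consecutive half-open windows of at most `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    current = start
    while current < end:
        upper = min(current + timedelta(days=days), end)
        yield current, upper
        current = upper


def rewrite_call(table: str, column: str, lower: date, upper: date) -> str:
    """Build a rewrite_data_files CALL limited to lower <= column < upper.

    The where predicate is itself a single-quoted SQL string, so the date
    literals inside it use double quotes.
    """
    predicate = f'{column} >= "{lower.isoformat()}" and {column} < "{upper.isoformat()}"'
    return (
        "CALL catalog_name.system.rewrite_data_files("
        f"table => '{table}', where => '{predicate}')"
    )


# Hypothetical table and column names; run each statement with spark.sql(...).
for lo, hi in date_windows(date(2023, 5, 1), date(2023, 5, 15), days=7):
    print(rewrite_call("db.events", "event_date", lo, hi))
```

Half-open windows (`>=` lower, `<` upper) keep adjacent runs from compacting the same day twice.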

Re: Copyonwrite scan

2023-05-24 Thread Russell Spitzer
Could you include the exception you are seeing?
On May 23, 2023, at 9:13 PM, Gaurav Agarwal wrote:
> Hi
>
> We are getting
> " runtime file filtering exception the table has been concurrently modified
> row level operation scan snapshot id "
>
> This exception we

Re: Orphan files

2023-05-24 Thread Fokko Driesprong
Hey Gaurav, Orphan files do not affect Iceberg's performance, since Iceberg performs no list operations. They only increase your storage bill, because files that are no longer relevant stay around. Iceberg tables do need periodic maintenance; for example, it is good to rewrite small files
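The periodic maintenance Fokko mentions can be sketched as a list of Spark SQL calls to Iceberg's `rewrite_data_files`, `expire_snapshots`, and `remove_orphan_files` procedures. The catalog and table names and both retention periods (5 days for snapshots, 3 days of grace for orphan files) are illustrative assumptions, not recommendations from the thread; each statement would be run with `spark.sql(...)`.

```python
from datetime import datetime, timedelta
from typing import List


def maintenance_calls(
    catalog: str,
    table: str,
    now: datetime,
    snapshot_retention: timedelta = timedelta(days=5),
    orphan_grace: timedelta = timedelta(days=3),
) -> List[str]:
    """Build Spark SQL CALL statements for routine Iceberg table maintenance."""

    def ts(moment: datetime) -> str:
        # Spark SQL timestamp literal, e.g. TIMESTAMP '2023-05-21 12:00:00'
        return "TIMESTAMP '" + moment.strftime("%Y-%m-%d %H:%M:%S") + "'"

    proc = f"CALL {catalog}.system"
    return [
        # Compact small data files into larger ones.
        f"{proc}.rewrite_data_files(table => '{table}')",
        # Expire snapshots older than the retention window; files only they referenced are deleted.
        f"{proc}.expire_snapshots(table => '{table}', older_than => {ts(now - snapshot_retention)})",
        # Delete files that no table metadata references; the grace period protects in-flight writes.
        f"{proc}.remove_orphan_files(table => '{table}', older_than => {ts(now - orphan_grace)})",
    ]
```

Since orphan files only cost storage, `remove_orphan_files` can run far less often than compaction.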

Re: [DISCUSS] Default format version for new tables?

2023-05-24 Thread Gabor Kaszab
Hey Anton, Just adding a note that back around January the same topic was brought up on this mailing list. There, the conclusion was to use the 'table-default.' catalog-level property to create V2 tables by default. https://lists.apache.org/thread/9ct0p817qxqqdnv7nb35kghsfygjkqdf I'm not saying that
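The 'table-default.' approach Gabor describes can be sketched as Spark configuration for an Iceberg catalog. The catalog name `my_catalog` and the Hive Metastore catalog type are hypothetical; the sketch assumes catalog properties under the `table-default.` prefix are applied as default table properties when the catalog creates a table.

```python
from typing import Dict


def iceberg_catalog_conf(catalog: str, format_version: int = 2) -> Dict[str, str]:
    """Spark config entries for an Iceberg catalog whose new tables default to a format version."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Assumption: a Hive Metastore-backed catalog; other catalog types need other settings.
        f"{prefix}.type": "hive",
        # Default table property applied to tables this catalog creates.
        f"{prefix}.table-default.format-version": str(format_version),
    }
```

Each entry would be passed to `SparkSession.builder.config(key, value)`. Because these are only defaults, a table that sets `'format-version'` explicitly in its own `TBLPROPERTIES` should keep that value.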