[I] Delete Files in Table Scans [iceberg-rust]

2024-09-12 Thread via GitHub
sdd opened a new issue, #630: URL: https://github.com/apache/iceberg-rust/issues/630 I'm looking to start work on proper handling of delete files in table scans and so I'd like to open an issue to discuss some of the design decisions. A core tenet of our approach so far has been to en

Re: [PR] OpenAPI: Standardize credentials in loadTable/loadView responses [iceberg]

2024-09-12 Thread via GitHub
flyrain commented on code in PR #10722: URL: https://github.com/apache/iceberg/pull/10722#discussion_r1758243809 ## open-api/rest-catalog-open-api.yaml: ## @@ -3103,6 +3103,81 @@ components: uuid: type: string +ADLSCredentials: + type: object +

Re: [PR] API, Core: Add manifestLocation API to ContentFile [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar commented on PR #11044: URL: https://github.com/apache/iceberg/pull/11044#issuecomment-2348024768 I'm going to go ahead and merge. Thanks for reviewing @aokolnychyi @rdblue! -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Core: Allow servers to express supported endpoints via endpoint field in ConfigResponse [iceberg]

2024-09-12 Thread via GitHub
nastra commented on code in PR #10929: URL: https://github.com/apache/iceberg/pull/10929#discussion_r1758210179 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -394,6 +445,10 @@ private LoadTableResponse loadInternal( @Override public Table l

[PR] [CI] Run on different platforms (ubuntu/windows/mac/mac m1) [iceberg-python]

2024-09-12 Thread via GitHub
kevinjqliu opened a new pull request, #1173: URL: https://github.com/apache/iceberg-python/pull/1173 (no comment)

Re: [PR] Core: Allow servers to express supported endpoints via endpoint field in ConfigResponse [iceberg]

2024-09-12 Thread via GitHub
nastra commented on code in PR #10929: URL: https://github.com/apache/iceberg/pull/10929#discussion_r1758204027 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -393,6 +438,15 @@ private LoadTableResponse loadInternal( @Override public Table l

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
pvary commented on PR #10935: URL: https://github.com/apache/iceberg/pull/10935#issuecomment-2348057437 > @pvary I added some tests in `TestBaseIncrementalChangelogScan`. However, at that level, we can only check what scan tasks (`AddedRowsScanTask`, `DeletedRowsScanTask`, `DeletedDataFileS

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
pvary commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1758196928 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -63,33 +60,43 @@ protected CloseableIterable doPlanFiles( return CloseableItera

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on PR #10935: URL: https://github.com/apache/iceberg/pull/10935#issuecomment-2348055779 @pvary I added some tests in `TestBaseIncrementalChangelogScan`. However, at that level, we can only check what scan tasks (`AddedRowsScanTask`, `DeletedRowsScanTask`, `DeletedDataFileSc

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
pvary commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1758193054 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -133,51 +131,149 @@ private static Map computeSnapshotOrdinals(Deque snapsh retu

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1758191734 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -63,33 +60,43 @@ protected CloseableIterable doPlanFiles( return CloseableIter

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
pvary commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1758186873 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -133,51 +131,149 @@ private static Map computeSnapshotOrdinals(Deque snapsh retu

Re: [PR] Flink: Increase the number of checkpoints from 4 to 6 to fix flakiness. [iceberg]

2024-09-12 Thread via GitHub
stevenzwu commented on PR #11121: URL: https://github.com/apache/iceberg/pull/11121#issuecomment-2348032734 > @stevenzwu Can you rerun CI multiple times to verify (by closing and reopening PR)? sure. I can re-run the CI checks without close/reopen PR. will run it a few times

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
pvary commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1758181104 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -63,33 +60,43 @@ protected CloseableIterable doPlanFiles( return CloseableItera

Re: [PR] API, Core: Add manifestLocation API to ContentFile [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar merged PR #11044: URL: https://github.com/apache/iceberg/pull/11044

Re: [I] [core] GSS initiate failed [iceberg]

2024-09-12 Thread via GitHub
gabrywu closed issue #8342: [core] GSS initiate failed URL: https://github.com/apache/iceberg/issues/8342

Re: [PR] Flink: Increase the number of checkpoints from 4 to 6 to fix flakiness. [iceberg]

2024-09-12 Thread via GitHub
manuzhang commented on PR #11121: URL: https://github.com/apache/iceberg/pull/11121#issuecomment-2347989903 @stevenzwu Can you rerun CI multiple times to verify (by closing and reopening PR)?

Re: [PR] fix: reorder record batch [iceberg-rust]

2024-09-12 Thread via GitHub
chenzl25 commented on PR #629: URL: https://github.com/apache/iceberg-rust/pull/629#issuecomment-2347977464 > I've already addressed this re-ordering in my open PR, #602 Great news! Looking forward to your PR being merged.

Re: [PR] AWS: Introduce opt-in S3LocationProvider which is optimized for S3 performance [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #2: URL: https://github.com/apache/iceberg/pull/2#discussion_r1757746279 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3LocationProvider.java: ## @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

Re: [I] Inconsistent row count across versions [iceberg-python]

2024-09-12 Thread via GitHub
sungwy commented on issue #1132: URL: https://github.com/apache/iceberg-python/issues/1132#issuecomment-2347946846 Thanks for confirming @daturkel . After a lot of experiments, I've finally been able to create a minimum reproducible test on this PR: https://github.com/apache/iceberg-python

Re: [I] support project pushdown for datafusion iceberg [iceberg-rust]

2024-09-12 Thread via GitHub
liurenjie1024 closed issue #592: support project pushdown for datafusion iceberg URL: https://github.com/apache/iceberg-rust/issues/592

Re: [PR] feat: support projection pushdown for datafusion iceberg [iceberg-rust]

2024-09-12 Thread via GitHub
liurenjie1024 merged PR #594: URL: https://github.com/apache/iceberg-rust/pull/594

Re: [PR] Spark 3.5: Don't change table distribution when only altering local order [iceberg]

2024-09-12 Thread via GitHub
manuzhang commented on code in PR #10774: URL: https://github.com/apache/iceberg/pull/10774#discussion_r1758064616 ## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala: ## @@ -226,11 +226,13 @@ class I

Re: [I] [BUG] `Catalog.list_tables()` inconsistency between docstring and signature [iceberg-python]

2024-09-12 Thread via GitHub
sungwy closed issue #1163: [BUG] `Catalog.list_tables()` inconsistency between docstring and signature URL: https://github.com/apache/iceberg-python/issues/1163

Re: [PR] follow up for more cleanup [iceberg-python]

2024-09-12 Thread via GitHub
sungwy merged PR #1168: URL: https://github.com/apache/iceberg-python/pull/1168

Re: [PR] follow up for more cleanup [iceberg-python]

2024-09-12 Thread via GitHub
sungwy commented on PR #1168: URL: https://github.com/apache/iceberg-python/pull/1168#issuecomment-2347892992 Thanks for taking care of this (again) @dataders 🙂 And thank you @kevinjqliu for the thorough review!

Re: [PR] Spec: Support geo type [iceberg]

2024-09-12 Thread via GitHub
jiayuasu commented on code in PR #10981: URL: https://github.com/apache/iceberg/pull/10981#discussion_r1758047171 ## format/spec.md: ## @@ -198,6 +199,9 @@ Notes: - Timestamp values _with time zone_ represent a point in time: values are stored as UTC and do not retain a so

Re: [PR] API, Core: Add manifestLocation API to ContentFile which will return the path to a manifest from which the content file resides in [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar commented on PR #11044: URL: https://github.com/apache/iceberg/pull/11044#issuecomment-2347671747 This time `TestDataFrameWrites > testFaultToleranceOnWrite()` failed which is known to be flaky: https://github.com/apache/iceberg/pull/10811

Re: [I] [core] GSS initiate failed [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8342: URL: https://github.com/apache/iceberg/issues/8342#issuecomment-2347447495 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Iceberg data file Not Found but have an entry in table.files catalog [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8338: URL: https://github.com/apache/iceberg/issues/8338#issuecomment-2347446983 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Add the document for Spark properties [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8314: URL: https://github.com/apache/iceberg/issues/8314#issuecomment-2347445682 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Core: Support changing compression codec for ManifestWriter [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8284: URL: https://github.com/apache/iceberg/pull/8284#issuecomment-2347444624 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Creating an existing database with spark sql command "Create database if exists" throws exception [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8298: URL: https://github.com/apache/iceberg/issues/8298#issuecomment-2347445053 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Supporting `double` type for `truncate` partitioning [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8275: URL: https://github.com/apache/iceberg/issues/8275#issuecomment-2347444181 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Provide `jsonschema` for the Metadata [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8266: URL: https://github.com/apache/iceberg/issues/8266#issuecomment-2347444001 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Docs: document the detail impact of the configuration 'compatibility.snapshot-id-inheritance.enabled' [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8375: URL: https://github.com/apache/iceberg/issues/8375#issuecomment-2347448015 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] API: Build accessor from struct directly [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8367: URL: https://github.com/apache/iceberg/pull/8367#issuecomment-2347447869 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] If you don't restore from checkpoint, how does flink consume from the last snapshot increment? Like kafka. [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8348: URL: https://github.com/apache/iceberg/issues/8348#issuecomment-2347447710 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Export to Long Term Storage and Re Loading [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8339: URL: https://github.com/apache/iceberg/issues/8339#issuecomment-2347447262 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Spark 3.4, Docs: Add RemoveOrphanFiles time-interval specification and testing option to the exception message [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8324: URL: https://github.com/apache/iceberg/pull/8324#issuecomment-2347446743 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] spark write orc error: Java heap space [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8318: URL: https://github.com/apache/iceberg/issues/8318#issuecomment-2347446295 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Core, Hive, Nessie: Use ResolvingFileIO as default instead of HadoopFileIO [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8272: URL: https://github.com/apache/iceberg/pull/8272#issuecomment-2347444050 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] Support rebase one branch onto other branch [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8268: URL: https://github.com/apache/iceberg/issues/8268#issuecomment-2347444017 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] DataTableScan may not include Unpartitioned data in the results [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8269: URL: https://github.com/apache/iceberg/issues/8269#issuecomment-2347444034 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Default table properties not respected when using Spark DataFrame API [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8265: URL: https://github.com/apache/iceberg/issues/8265#issuecomment-2347443986 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] relativePath [wip] [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8260: URL: https://github.com/apache/iceberg/pull/8260#issuecomment-2347443962 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] multi-arg transform support [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8258: URL: https://github.com/apache/iceberg/issues/8258#issuecomment-2347443909 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] MERGE INTO number of affected rows [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8229: URL: https://github.com/apache/iceberg/issues/8229#issuecomment-2347443855 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] RollingFileWriter Throws Exceptions if it Does Not Have Delete Permissions [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8253: URL: https://github.com/apache/iceberg/issues/8253#issuecomment-2347443879 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] AWS: Add retry logic for S3InputStream and S3OutputStream [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8221: URL: https://github.com/apache/iceberg/pull/8221#issuecomment-2347443821 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Gradle: configure tasks on demand [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #7956: URL: https://github.com/apache/iceberg/pull/7956#issuecomment-2347443545 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] core: initial support of multi-arg bucket [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #8259: URL: https://github.com/apache/iceberg/pull/8259#issuecomment-2347443939 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Spark 3.2 and 3.3: Use Reblance instead of Repartition for distribution in SparkWrite [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on PR #7932: URL: https://github.com/apache/iceberg/pull/7932#issuecomment-2347443476 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [I] upgrade iceberg from 0.14.1 to 1.2.1, flink task error with InvalidClassException: org.apache.iceberg.BaseFileScanTask; local class incompatible: stream classdesc serialVersionUID = -410451952

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8220: URL: https://github.com/apache/iceberg/issues/8220#issuecomment-2347443799 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Huge amount of Aws s3 Exception "Unable to execute HTTP request: The target server failed to respond" during Iceberg v2 table merge with some DeleteFiles + DataFiles in a partition [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] commented on issue #8218: URL: https://github.com/apache/iceberg/issues/8218#issuecomment-2347443778 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [PR] Gradle: configure tasks on demand [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] closed pull request #7956: Gradle: configure tasks on demand URL: https://github.com/apache/iceberg/pull/7956

Re: [PR] Spark 3.2 and 3.3: Use Reblance instead of Repartition for distribution in SparkWrite [iceberg]

2024-09-12 Thread via GitHub
github-actions[bot] closed pull request #7932: Spark 3.2 and 3.3: Use Reblance instead of Repartition for distribution in SparkWrite URL: https://github.com/apache/iceberg/pull/7932

Re: [PR] Preserve Backward compatibility in 0.8.0 for #1144 [iceberg-python]

2024-09-12 Thread via GitHub
sungwy merged PR #1151: URL: https://github.com/apache/iceberg-python/pull/1151

Re: [I] [feat] add missing metadata tables [iceberg-python]

2024-09-12 Thread via GitHub
kevinjqliu commented on issue #1053: URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2347434164 What is the difference between your implementation's output vs sparks? From the [spark docs](https://iceberg.apache.org/docs/latest/spark-queries/#files), "To show

Re: [PR] API, Core: Add manifestLocation API to ContentFile which will return the path to a manifest from which the content file resides in [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar commented on code in PR #11044: URL: https://github.com/apache/iceberg/pull/11044#discussion_r1757729519 ## core/src/test/java/org/apache/iceberg/DataTableScanTestBase.java: ## @@ -180,12 +184,25 @@ public void testSettingInvalidRefFails() { private void va

Re: [PR] Add metadata tables for `data_files` and `delete_files` [iceberg-python]

2024-09-12 Thread via GitHub
kevinjqliu commented on PR #1066: URL: https://github.com/apache/iceberg-python/pull/1066#issuecomment-2347426984 thanks again @soumya-ghosh. I'll wait for another approval before merging.

Re: [I] PyIceberg - MetaException(message='java.lang.IllegalArgumentException: bucket is null/empty') [iceberg-python]

2024-09-12 Thread via GitHub
kevinjqliu commented on issue #1165: URL: https://github.com/apache/iceberg-python/issues/1165#issuecomment-2347426186 > My intention is creating Iceberg Tables in OCI Object Storage. Is there any documentation I can check to achieve this? I don't know any OCI related documentation.

[I] Improve Position Deletes in V3 [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi opened a new issue, #11122: URL: https://github.com/apache/iceberg/issues/11122 ### Proposed Change This proposal aims to enhance the handling of position deletes in Iceberg. It builds on lessons learned from deploying the current approach at scale and addresses all unres

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on PR #11086: URL: https://github.com/apache/iceberg/pull/11086#issuecomment-2347417367 I've addressed the feedback. Thanks for reviewing, @stevenzwu @amogh-jahagirdar @karuppayya @dramaticlly @nastra!

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi merged PR #11086: URL: https://github.com/apache/iceberg/pull/11086

Re: [PR] API, Core: Add manifestLocation API to ContentFile which will return the path to a manifest from which the content file resides in [iceberg]

2024-09-12 Thread via GitHub
amogh-jahagirdar commented on PR #11044: URL: https://github.com/apache/iceberg/pull/11044#issuecomment-2347414798 Verified TestSparkReaderDeletes#testMultiplePosDeleteFiles() failure is unrelated. The test may be flaky. I'll push another change to address the nit and we can verify if it's

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757711124 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -133,51 +131,149 @@ private static Map computeSnapshotOrdinals(Deque snapsh ret

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757707990 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -133,51 +131,149 @@ private static Map computeSnapshotOrdinals(Deque snapsh ret

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757706938 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -133,51 +131,149 @@ private static Map computeSnapshotOrdinals(Deque snapsh ret

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757701740 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/ChangelogRowReader.java: ## @@ -112,13 +149,62 @@ private CloseableIterable openChangelogScanTask(Ch

Re: [PR] Core: Add support for `view-default` property in catalog [iceberg]

2024-09-12 Thread via GitHub
ebyhr commented on code in PR #11064: URL: https://github.com/apache/iceberg/pull/11064#discussion_r1757698066 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -85,7 +85,8 @@ static RESTCatalog initCatalogClient() { catalogProperties.putIfAbsen

Re: [PR] Flink: Custom partitioner for bucket partitions [iceberg]

2024-09-12 Thread via GitHub
binshuohu commented on PR #7161: URL: https://github.com/apache/iceberg/pull/7161#issuecomment-2347381724 @stevenzwu Is there any plan to reapply this change to the main branch? Has there been any follow up since https://github.com/apache/iceberg/pull/8848 ?

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757688389 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/ChangelogRowReader.java: ## @@ -112,13 +149,62 @@ private CloseableIterable openChangelogScanTask(Ch

[PR] Bump mypy-boto3-glue from 1.35.3 to 1.35.18 [iceberg-python]

2024-09-12 Thread via GitHub
dependabot[bot] opened a new pull request, #1171: URL: https://github.com/apache/iceberg-python/pull/1171 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.35.3 to 1.35.18. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/commit

[PR] Bump griffe from 1.3.0 to 1.3.1 [iceberg-python]

2024-09-12 Thread via GitHub
dependabot[bot] opened a new pull request, #1170: URL: https://github.com/apache/iceberg-python/pull/1170 Bumps [griffe](https://github.com/mkdocstrings/griffe) from 1.3.0 to 1.3.1. Release notes Sourced from [griffe's releases](https://github.com/mkdocstrings/griffe/releases).

Re: [PR] Support changelog scan for table with delete files [iceberg]

2024-09-12 Thread via GitHub
wypoon commented on code in PR #10935: URL: https://github.com/apache/iceberg/pull/10935#discussion_r1757679042 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/source/TestChangelogReader.java: ## @@ -191,39 +214,359 @@ public void testMixDeleteAndInsert() throws IOExc

Re: [PR] API, Core: Add manifestLocation API to ContentFile which will return the path to a manifest from which the content file resides in [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11044: URL: https://github.com/apache/iceberg/pull/11044#discussion_r1757675900 ## core/src/test/java/org/apache/iceberg/DataTableScanTestBase.java: ## @@ -180,12 +184,25 @@ public void testSettingInvalidRefFails() { private void validat

[PR] Flink: Increase the number of checkpoints from 4 to 6 to fix flakiness. [iceberg]

2024-09-12 Thread via GitHub
stevenzwu opened a new pull request, #11121: URL: https://github.com/apache/iceberg/pull/11121 6 checkpoint cycles seem to be more stable based on the existing TestFlinkIcebergSinkDistributionMode test. -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Flink: Increase the number of checkpoints from 4 to 6 to fix flakiness. [iceberg]

2024-09-12 Thread via GitHub
stevenzwu commented on PR #11121: URL: https://github.com/apache/iceberg/pull/11121#issuecomment-2347342496 cc @manuzhang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757656744 ## core/src/test/java/org/apache/iceberg/TestSnapshotProducer.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757656534 ## core/src/test/java/org/apache/iceberg/TestSnapshotProducer.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757656126 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -554,6 +562,84 @@ protected boolean cleanupAfterCommit() { return true; } + prote

Re: [PR] Core: Allow servers to express supported endpoints via endpoint field in ConfigResponse [iceberg]

2024-09-12 Thread via GitHub
rdblue commented on code in PR #10929: URL: https://github.com/apache/iceberg/pull/10929#discussion_r1757654013 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -393,6 +438,15 @@ private LoadTableResponse loadInternal( @Override public Table l

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757647083 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -554,6 +562,84 @@ protected boolean cleanupAfterCommit() { return true; } + prote

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757644822 ## core/src/test/java/org/apache/iceberg/TestSnapshotProducer.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] Core: Parallelize manifest writing for many new files [iceberg]

2024-09-12 Thread via GitHub
aokolnychyi commented on code in PR #11086: URL: https://github.com/apache/iceberg/pull/11086#discussion_r1757641413 ## core/src/main/java/org/apache/iceberg/SnapshotProducer.java: ## @@ -554,6 +562,84 @@ protected boolean cleanupAfterCommit() { return true; } + prote

Re: [PR] OpenAPI: Fix YAML example and value json formatting [iceberg]

2024-09-12 Thread via GitHub
rdblue merged PR #9: URL: https://github.com/apache/iceberg/pull/9 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] fix: reorder record batch [iceberg-rust]

2024-09-12 Thread via GitHub
sdd commented on PR #629: URL: https://github.com/apache/iceberg-rust/pull/629#issuecomment-2347313169 I've already addressed this re-ordering in my open PR, https://github.com/apache/iceberg-rust/pull/602 -- This is an automated message from the Apache Git Service. To respond to the mess

Re: [I] PyIceberg - MetaException(message='java.lang.IllegalArgumentException: bucket is null/empty') [iceberg-python]

2024-09-12 Thread via GitHub
malopezh commented on issue #1165: URL: https://github.com/apache/iceberg-python/issues/1165#issuecomment-2347299426 > > uri="thrift://localhost:9083", > > Is this a HMS? I think the error is from the HMS setup Yes it's HMS. I configured Hadoop, Hive and HiveMetaStore service a

[I] support equality/positional deletes in vectorized arrow reader [iceberg]

2024-09-12 Thread via GitHub
callum-ryan opened a new issue, #11120: URL: https://github.com/apache/iceberg/issues/11120 ### Feature Request / Improvement when using `VectorizedTableScanIterable` / `ArrowReader` there is no support for deletes, be it equality or positional. The `IcebergGenerics` functionality to

Re: [PR] Spark 3.4: Action to compute table stats [iceberg]

2024-09-12 Thread via GitHub
karuppayya closed pull request #11106: Spark 3.4: Action to compute table stats URL: https://github.com/apache/iceberg/pull/11106 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] follow up for more cleanup [iceberg-python]

2024-09-12 Thread via GitHub
dataders opened a new pull request, #1168: URL: https://github.com/apache/iceberg-python/pull/1168 resolves: #1163 two more docstrings that are no longer relevant -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-09-12 Thread via GitHub
danielcweeks commented on PR #9852: URL: https://github.com/apache/iceberg/pull/9852#issuecomment-2347189790 @nk1506 A few comments, but other than that this LGTM @nastra did you also want to take another pass? -- This is an automated message from the Apache Git Service. To respond to t

Re: [PR] Add metadata tables for `data_files` and `delete_files` [iceberg-python]

2024-09-12 Thread via GitHub
sungwy commented on code in PR #1066: URL: https://github.com/apache/iceberg-python/pull/1066#discussion_r1757539710 ## tests/integration/test_inspect_table.py: ## @@ -672,126 +672,141 @@ def test_inspect_files( # append more data tbl.append(arrow_table_with_null) -
