Re: [PR] View Spec implementation [iceberg-rust]

2024-07-02 Thread via GitHub
nastra commented on PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2205201047 > When creating views, spark creates Metadata objects that specify the "default-namespace" field but as an empty vec Is this with plain OSS Spark or are you using the iceberg-spar

Re: [I] Add Multi-Table Transaction API [iceberg]

2024-07-02 Thread via GitHub
nastra commented on issue #10617: URL: https://github.com/apache/iceberg/issues/10617#issuecomment-2205193743 > So just to be clear, this will be only for the REST catalog right? Do we consider this feature also for other catalogs? Because I see you write that in the Google doc "Implementin

Re: [PR] Address Intellij inspection findings [iceberg]

2024-07-02 Thread via GitHub
ajantha-bhat commented on PR #10583: URL: https://github.com/apache/iceberg/pull/10583#issuecomment-2205181608 > The PR is already quite large and given that we need to fix those things across all Flink/Spark versions I'd suggest to extract the Flink/Spark related things into a separate PR

Re: [I] Support writing to a table with sort-order [iceberg-python]

2024-07-02 Thread via GitHub
Fokko commented on issue #271: URL: https://github.com/apache/iceberg-python/issues/271#issuecomment-2205116344 @vinjai Since we ignore the write-order today, I think proceeding is fine. Maybe raise a warning so the user knows the data isn't being sorted. Sorting in Python would be __very__

Re: [I] idea: Refactor the README to be more user-oriented [iceberg-rust]

2024-07-02 Thread via GitHub
liurenjie1024 commented on issue #429: URL: https://github.com/apache/iceberg-rust/issues/429#issuecomment-2205090406 +1 for this idea, with 0.3 release it's supposed to be more user friendly. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [I] idea: Refactor the README to be more user-oriented [iceberg-rust]

2024-07-02 Thread via GitHub
Xuanwo commented on issue #429: URL: https://github.com/apache/iceberg-rust/issues/429#issuecomment-2205072720 I'm happy to handle this issue. I enjoy writing in `markdown` and coding in `rust`. 😄

Re: [PR] Kafka Connect: Commit coordination [iceberg]

2024-07-02 Thread via GitHub
fqaiser94 commented on code in PR #10351: URL: https://github.com/apache/iceberg/pull/10351#discussion_r1663457632 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/IcebergSinkTask.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Kafka Connect: Commit coordination [iceberg]

2024-07-02 Thread via GitHub
fqaiser94 commented on code in PR #10351: URL: https://github.com/apache/iceberg/pull/10351#discussion_r1663456016 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/IcebergSinkTask.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundatio

Re: [PR] Kafka Connect: Commit coordination [iceberg]

2024-07-02 Thread via GitHub
fqaiser94 commented on code in PR #10351: URL: https://github.com/apache/iceberg/pull/10351#discussion_r1663394084 ## kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/channel/Worker.java: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] Bump getdaft from 0.2.28 to 0.2.29 [iceberg-python]

2024-07-02 Thread via GitHub
Fokko merged PR #882: URL: https://github.com/apache/iceberg-python/pull/882

Re: [PR] reuse docker container to save compute resources [iceberg-rust]

2024-07-02 Thread via GitHub
Xuanwo commented on code in PR #428: URL: https://github.com/apache/iceberg-rust/pull/428#discussion_r1663450908 ## crates/catalog/glue/tests/glue_catalog_test.rs: ## @@ -92,6 +96,24 @@ async fn set_test_fixture(func: &str) -> TestFixture { } } +async fn lazy_reuse_dc()

Re: [PR] reuse docker container to save compute resources [iceberg-rust]

2024-07-02 Thread via GitHub
liurenjie1024 commented on code in PR #428: URL: https://github.com/apache/iceberg-rust/pull/428#discussion_r1663378624 ## crates/catalog/glue/tests/glue_catalog_test.rs: ## @@ -92,6 +96,24 @@ async fn set_test_fixture(func: &str) -> TestFixture { } } +async fn lazy_reus

[PR] Forward Compatible large_* type support: read as large, write as small [iceberg-python]

2024-07-02 Thread via GitHub
syun64 opened a new pull request, #890: URL: https://github.com/apache/iceberg-python/pull/890 Solves: https://github.com/apache/iceberg-python/issues/887

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663371244 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/AnalyzeTableSparkAction.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Foun

Re: [I] discussion: Refactor our integration tests to make it more scalable. [iceberg-rust]

2024-07-02 Thread via GitHub
liurenjie1024 commented on issue #425: URL: https://github.com/apache/iceberg-rust/issues/425#issuecomment-2204873827 Cool, let's move!

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663369835 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table table

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663366317 ## spark/v3.5/build.gradle: ## @@ -59,6 +59,7 @@ project(":iceberg-spark:iceberg-spark-${sparkMajorVersion}_${scalaVersion}") { implementation project(':ice

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
ajantha-bhat commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663366952 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table tabl

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663365351 ## api/src/main/java/org/apache/iceberg/actions/AnalyzeTable.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663363679 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table table

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663365052 ## api/src/main/java/org/apache/iceberg/actions/AnalyzeTable.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or m

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663363580 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table table

Re: [PR] Add pool_pre_ping param to SQLCatalog and fix echo parsing logic [iceberg-python]

2024-07-02 Thread via GitHub
cccs-eric commented on code in PR #886: URL: https://github.com/apache/iceberg-python/pull/886#discussion_r1663354850 ## pyiceberg/catalog/sql.py: ## @@ -110,8 +110,28 @@ def __init__(self, name: str, **properties: str): if not (uri_prop := self.properties.get("uri"))

Re: [PR] Cast 's', 'ms' and 'ns' PyArrow timestamp to 'us' precision on write [iceberg-python]

2024-07-02 Thread via GitHub
corleyma commented on code in PR #848: URL: https://github.com/apache/iceberg-python/pull/848#discussion_r1663349324 ## pyiceberg/io/pyarrow.py: ## @@ -918,11 +919,24 @@ def primitive(self, primitive: pa.DataType) -> PrimitiveType: return TimeType() elif p

Re: [I] The job of writing iceberg v2 table threw a validException when testing the merge of iceberg v2 table [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] closed issue #2773: The job of writing iceberg v2 table threw a validException when testing the merge of iceberg v2 table URL: https://github.com/apache/iceberg/issues/2773

Re: [I] The job of writing iceberg v2 table threw a validException when testing the merge of iceberg v2 table [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] commented on issue #2773: URL: https://github.com/apache/iceberg/issues/2773#issuecomment-2204754120 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Deduplication support in RewriteDataFilesAction [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] commented on issue #2764: URL: https://github.com/apache/iceberg/issues/2764#issuecomment-2204754020 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Does MERGE INTO operations support hidden partition on timestamp columns? [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] closed issue #2765: Does MERGE INTO operations support hidden partition on timestamp columns? URL: https://github.com/apache/iceberg/issues/2765

Re: [I] Does MERGE INTO operations support hidden partition on timestamp columns? [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] commented on issue #2765: URL: https://github.com/apache/iceberg/issues/2765#issuecomment-2204754075 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Deduplication support in RewriteDataFilesAction [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] closed issue #2764: Deduplication support in RewriteDataFilesAction URL: https://github.com/apache/iceberg/issues/2764

Re: [I] Support TIME_MILLIS in Arrow code [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] commented on issue #2755: URL: https://github.com/apache/iceberg/issues/2755#issuecomment-2204753947 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Support TIME_MILLIS in Arrow code [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] closed issue #2755: Support TIME_MILLIS in Arrow code URL: https://github.com/apache/iceberg/issues/2755

Re: [I] Token expiration issued when using WORKER_POOL with Spark Thrift Server [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] closed issue #2753: Token expiration issued when using WORKER_POOL with Spark Thrift Server URL: https://github.com/apache/iceberg/issues/2753

Re: [I] Token expiration issued when using WORKER_POOL with Spark Thrift Server [iceberg]

2024-07-02 Thread via GitHub
github-actions[bot] commented on issue #2753: URL: https://github.com/apache/iceberg/issues/2753#issuecomment-2204753860 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
aokolnychyi commented on PR #10288: URL: https://github.com/apache/iceberg/pull/10288#issuecomment-2204696322 I'll have some time to take a look this week.

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

2024-07-02 Thread via GitHub
dramaticlly commented on code in PR #10547: URL: https://github.com/apache/iceberg/pull/10547#discussion_r1663244200 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -375,30 +374,33 @@ private Schema schemaWithMetadataColumns() {

[PR] Bump pypa/cibuildwheel from 2.19.1 to 2.19.2 [iceberg-python]

2024-07-02 Thread via GitHub
dependabot[bot] opened a new pull request, #889: URL: https://github.com/apache/iceberg-python/pull/889 Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.19.1 to 2.19.2. Release notes Sourced from https://github.com/pypa/cibuildwheel/releases pypa/cibuildwhee

[PR] Bump mkdocs-material from 9.5.27 to 9.5.28 [iceberg-python]

2024-07-02 Thread via GitHub
dependabot[bot] opened a new pull request, #888: URL: https://github.com/apache/iceberg-python/pull/888 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.27 to 9.5.28. Release notes Sourced from https://github.com/squidfunk/mkdocs-material/releases mk

Re: [PR] View Spec implementation [iceberg-rust]

2024-07-02 Thread via GitHub
c-thiel commented on PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2204539127 @nastra, @Fokko during testing we found a Problem with the "default-namespace". I am hoping for some insights from your side: According to the iceberg view spec, "default-namespace"

Re: [I] Backward incompatible types introduced when writing Iceberg data [iceberg-python]

2024-07-02 Thread via GitHub
kevinjqliu commented on issue #887: URL: https://github.com/apache/iceberg-python/issues/887#issuecomment-2204487503 So the current version of pyiceberg can write parquet files with the `large_string` data type. But the older version of pyiceberg cannot read the parquet file with the `large

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
szehon-ho commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663188357 ## api/src/main/java/org/apache/iceberg/actions/AnalyzeTable.java: ## @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

2024-07-02 Thread via GitHub
szehon-ho commented on code in PR #10547: URL: https://github.com/apache/iceberg/pull/10547#discussion_r1663164870 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -375,30 +374,33 @@ private Schema schemaWithMetadataColumns() {

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
szehon-ho commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1663127233 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/AnalyzeTableSparkAction.java: ## @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache Software Founda

[PR] Initial Support for Spark 4.0 [iceberg]

2024-07-02 Thread via GitHub
huaxingao opened a new pull request, #10622: URL: https://github.com/apache/iceberg/pull/10622 This PR has the initial support for Spark 4.0. I use `v4.0.0-preview1` for now. Will switch to `v4.0.0`

[PR] REST: refactor OAuth logic into AuthManager Interface [iceberg]

2024-07-02 Thread via GitHub
jackye1995 opened a new pull request, #10621: URL: https://github.com/apache/iceberg/pull/10621 A part of issue #10537, next step of #10603 This is a very rough draft to see how it could look like when the OAuth2 logic is abstracted away from the RESTSessionCatalog. There could be mor

Re: [PR] Create rollback and set snapshot APIs [iceberg-python]

2024-07-02 Thread via GitHub
chinmay-bhat commented on code in PR #758: URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1662967631 ## pyiceberg/table/__init__.py: ## @@ -2010,6 +2016,84 @@ def create_branch( self._requirements += requirement return self +def rollba

[I] Backward incompatible types introduced when writing Iceberg data [iceberg-python]

2024-07-02 Thread via GitHub
syun64 opened a new issue, #887: URL: https://github.com/apache/iceberg-python/issues/887 ### Apache Iceberg version None ### Please describe the bug 🐞 Through the introduction of https://github.com/apache/iceberg-python/pull/807 we have introduced large_* types in the

Re: [I] as_arrow() fail on struct with ListType required [iceberg-python]

2024-07-02 Thread via GitHub
raphaelauv closed issue #885: as_arrow() fail on struct with ListType required URL: https://github.com/apache/iceberg-python/issues/885

Re: [PR] Create rollback and set snapshot APIs [iceberg-python]

2024-07-02 Thread via GitHub
chinmay-bhat commented on code in PR #758: URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1663030804 ## pyiceberg/table/__init__.py: ## @@ -1956,6 +1957,10 @@ def _commit(self) -> UpdatesAndRequirements: """Apply the pending changes and commit."""

Re: [PR] Spark Action to Analyze table [iceberg]

2024-07-02 Thread via GitHub
karuppayya commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1662953911 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/NDVSketchGenerator.java: ## @@ -0,0 +1,119 @@ +/* + * Licensed to the Apache Software Foundation

Re: [I] insert to hive table with icberg table format is failing [iceberg]

2024-07-02 Thread via GitHub
g10ck commented on issue #7840: URL: https://github.com/apache/iceberg/issues/7840#issuecomment-2204010251 You can make such hook in hive/beeline. For example, hive 3.1.3: add jar hive_home_directory/lib/libfb303-0.9.3.jar; insert into ... ; It will work. But the right solution w

Re: [I] as_arrow() fail on struct with ListType required [iceberg-python]

2024-07-02 Thread via GitHub
raphaelauv commented on issue #885: URL: https://github.com/apache/iceberg-python/issues/885#issuecomment-2204006789 thanks for your help @kevinjqliu , it's an arrow issue https://github.com/apache/arrow/issues/33592

Re: [PR] Create rollback and set snapshot APIs [iceberg-python]

2024-07-02 Thread via GitHub
chinmay-bhat commented on code in PR #758: URL: https://github.com/apache/iceberg-python/pull/758#discussion_r1662967986 ## pyiceberg/table/__init__.py: ## @@ -2010,6 +2016,84 @@ def create_branch( self._requirements += requirement return self +def rollba

Re: [PR] Core: Handles possible heap data corruption of `OAuth2Util.AuthSession#headers` [iceberg]

2024-07-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10615: URL: https://github.com/apache/iceberg/pull/10615#discussion_r1662929875 ## core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java: ## @@ -578,7 +578,8 @@ public Pair refresh(RESTClient client) { .token(r

Re: [PR] Core: Handles possible heap data corruption of `OAuth2Util.AuthSession#headers` [iceberg]

2024-07-02 Thread via GitHub
tlm365 commented on PR #10615: URL: https://github.com/apache/iceberg/pull/10615#issuecomment-2203934789 @adutra @amogh-jahagirdar thanks for reviewing, I've updated it.

Re: [PR] Deprecate `oauth/tokens` endpoint [iceberg]

2024-07-02 Thread via GitHub
jackye1995 commented on code in PR #10603: URL: https://github.com/apache/iceberg/pull/10603#discussion_r1662916466 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java: ## @@ -176,19 +176,34 @@ public void initialize(String name, Map unresolved) { long st

Re: [I] Add Multi-Table Transaction API [iceberg]

2024-07-02 Thread via GitHub
jackye1995 commented on issue #10617: URL: https://github.com/apache/iceberg/issues/10617#issuecomment-2203892955 > I don't recall that we have concluded on just doing a multi-table commit. yeah that's probably just my misunderstanding, since multi-table commit was what was eventually

Re: [PR] Let `./gradlew clean` clean everything [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10601: URL: https://github.com/apache/iceberg/pull/10601#discussion_r1662557426 ## spark/build.gradle: ## @@ -18,16 +18,19 @@ */ // add enabled Spark version modules to the build -def sparkVersions = (System.getProperty("sparkVersions") != nu

Re: [PR] Address Intellij inspection findings [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10583: URL: https://github.com/apache/iceberg/pull/10583#discussion_r1662586692 ## api/src/main/java/org/apache/iceberg/PartitionSpec.java: ## @@ -410,7 +410,7 @@ private void checkAndAddPartitionName(String name, Integer sourceColumnId) {

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-07-02 Thread via GitHub
palkx commented on PR #10140: URL: https://github.com/apache/iceberg/pull/10140#issuecomment-2203230368 Thank you for the update! Will wait for the v1.6.0 then.

Re: [PR] Support building with Java 21 [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1661073834 ## baseline.gradle: ## @@ -45,7 +45,10 @@ subprojects { apply plugin: 'com.palantir.baseline-reproducibility' apply plugin: 'com.palantir.baseline-exact-dependen

Re: [I] Question about transform result type [iceberg]

2024-07-02 Thread via GitHub
Fokko commented on issue #10616: URL: https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427 Hey @lurnagao-dahua Thanks for raising this, and @ajantha-bhat for jumping in here. This was discussed earlier in https://github.com/apache/iceberg/issues/10159 and points to a comme

Re: [PR] Support building with Java 21 [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1661056194 ## baseline.gradle: ## @@ -45,7 +45,10 @@ subprojects { apply plugin: 'com.palantir.baseline-reproducibility' apply plugin: 'com.palantir.baseline-exact-dependen

Re: [PR] Support building with Java 21 [iceberg]

2024-07-02 Thread via GitHub
jbonofre commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1661040951 ## baseline.gradle: ## @@ -45,7 +45,10 @@ subprojects { apply plugin: 'com.palantir.baseline-reproducibility' apply plugin: 'com.palantir.baseline-exact-depend

Re: [PR] Let `./gradlew clean` clean everything [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10601: URL: https://github.com/apache/iceberg/pull/10601#discussion_r1661104572 ## spark/build.gradle: ## @@ -18,16 +18,19 @@ */ // add enabled Spark version modules to the build -def sparkVersions = (System.getProperty("sparkVersions") != nu

Re: [PR] Enable the Gradle build cache [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10602: URL: https://github.com/apache/iceberg/pull/10602#discussion_r1661089067 ## gradle.properties: ## @@ -24,5 +24,13 @@ systemProp.defaultSparkVersions=3.5 systemProp.knownSparkVersions=3.3,3.4,3.5 systemProp.defaultScalaVersion=2.12 system

Re: [PR] Migrate HadoopCatalog related tests in Flink [iceberg]

2024-07-02 Thread via GitHub
tomtongue commented on PR #10358: URL: https://github.com/apache/iceberg/pull/10358#issuecomment-2200143056 Thank you! Yes, I will submit the backport PR for this.

Re: [PR] Support building with Java 21 [iceberg]

2024-07-02 Thread via GitHub
jbonofre commented on code in PR #10474: URL: https://github.com/apache/iceberg/pull/10474#discussion_r1661062567 ## baseline.gradle: ## @@ -45,7 +45,10 @@ subprojects { apply plugin: 'com.palantir.baseline-reproducibility' apply plugin: 'com.palantir.baseline-exact-depend

Re: [PR] Migrate HadoopCatalog related tests in Flink [iceberg]

2024-07-02 Thread via GitHub
nastra merged PR #10358: URL: https://github.com/apache/iceberg/pull/10358

Re: [PR] Deprecate `oauth/tokens` endpoint [iceberg]

2024-07-02 Thread via GitHub
snazy commented on code in PR #10603: URL: https://github.com/apache/iceberg/pull/10603#discussion_r1661024615 ## core/src/test/resources/logback-test.xml: ## @@ -0,0 +1,32 @@ + + + Review Comment: It's required to at least manually verify that the change emits the expected

Re: [PR] Deprecate `oauth/tokens` endpoint [iceberg]

2024-07-02 Thread via GitHub
nastra commented on code in PR #10603: URL: https://github.com/apache/iceberg/pull/10603#discussion_r1661022787 ## core/src/test/resources/logback-test.xml: ## @@ -0,0 +1,32 @@ + + + Review Comment: introducing this file and adding the logback dependency doesn't seem related

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-02 Thread via GitHub
venkata91 commented on code in PR #10548: URL: https://github.com/apache/iceberg/pull/10548#discussion_r1659558108 ## flink/v1.16/flink/src/main/java/org/apache/iceberg/flink/source/enumerator/AbstractIcebergEnumerator.java: ## @@ -93,6 +98,28 @@ public void handleSourceEvent(in

Re: [PR] Fix incorrect double-checked-locking around `TestStreamScanSql#tEnv` [iceberg]

2024-07-02 Thread via GitHub
tlm365 commented on code in PR #10605: URL: https://github.com/apache/iceberg/pull/10605#discussion_r1659528345 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamScanSql.java: ## @@ -55,7 +55,7 @@ public class TestStreamScanSql extends CatalogTestBase

Re: [I] Question about transform result type [iceberg]

2024-07-02 Thread via GitHub
ajantha-bhat commented on issue #10616: URL: https://github.com/apache/iceberg/issues/10616#issuecomment-2200133239 Good catch. I think the spec is correct. Now if we change it the schema will be different from old client and new client (compatibility issue). I think it has to be

Re: [PR] Support for Flink's SpeculativeExecution in batch execution mode [iceberg]

2024-07-02 Thread via GitHub
venkata91 commented on PR #10548: URL: https://github.com/apache/iceberg/pull/10548#issuecomment-2197923268 cc @stevenzwu for review. Should this change also be made in other Flink versions like Flink-1.17 and Flink-1.18?

Re: [PR] support python 3.12 [iceberg-python]

2024-07-02 Thread via GitHub
Fokko commented on PR #254: URL: https://github.com/apache/iceberg-python/pull/254#issuecomment-2200098974 @MehulBatra Certainly! How do you feel like splitting the `strtobool` into a separate PR? I think it would be good to already have that in to enable Python 3.12 support with engines ot

Re: [PR] support python 3.12 [iceberg-python]

2024-07-02 Thread via GitHub
MehulBatra commented on PR #254: URL: https://github.com/apache/iceberg-python/pull/254#issuecomment-2200074674 @Fokko could you please run the CI/CD

Re: [I] discussion: Refactor our integration tests to make it more scalable. [iceberg-rust]

2024-07-02 Thread via GitHub
thexiay commented on issue #425: URL: https://github.com/apache/iceberg-rust/issues/425#issuecomment-2200047272 maybe [custom_test_frameworks](https://doc.rust-lang.org/nightly/unstable-book/language-features/custom-test-frameworks.html#custom_test_frameworks) helps a lot.

Re: [PR] Fix incorrect double-checked-locking around `TestStreamScanSql#tEnv` [iceberg]

2024-07-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10605: URL: https://github.com/apache/iceberg/pull/10605#discussion_r1659508615 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamScanSql.java: ## @@ -55,7 +55,7 @@ public class TestStreamScanSql extends Catalo

[PR] Fix incorrect double-checked-locking around `TestStreamScanSql#tEnv` [iceberg]

2024-07-02 Thread via GitHub
tlm365 opened a new pull request, #10605: URL: https://github.com/apache/iceberg/pull/10605 Resolves #10592.

Re: [PR] Spark: support read of partition metadata column when table is over 1k [iceberg]

2024-07-02 Thread via GitHub
szehon-ho commented on code in PR #10547: URL: https://github.com/apache/iceberg/pull/10547#discussion_r1659431526 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkScanBuilder.java: ## @@ -366,15 +370,55 @@ public void pruneColumns(StructType requestedSchem

Re: [PR] feat: runtime module [iceberg-rust]

2024-07-02 Thread via GitHub
Xuanwo commented on PR #233: URL: https://github.com/apache/iceberg-rust/pull/233#issuecomment-2197860098 > Thanks @odysa , generally it LGTM. But I think we should resolve #418 first to unblock this. Got it. I will work on #418 first. -- This is an automated message from the Apach

Re: [PR] Refactored data clumps with the help of LLMs (research project) [iceberg]

2024-07-02 Thread via GitHub
ajantha-bhat commented on code in PR #10558: URL: https://github.com/apache/iceberg/pull/10558#discussion_r1657085224 ## api/src/main/java/org/apache/iceberg/FileStatistics.java: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

Re: [PR] Add Files metadata table [iceberg-python]

2024-07-02 Thread via GitHub
Gowthami03B commented on PR #614: URL: https://github.com/apache/iceberg-python/pull/614#issuecomment-2194621297 @Fokko @kevinjqliu @amogh-jahagirdar Can I get a re-review here please? Want to close this asap for the release timeline :) -- This is an automated message from the Apache Git

[PR] refactor(catalog/rest): Split http client logic to seperate mod [iceberg-rust]

2024-07-02 Thread via GitHub
Xuanwo opened a new pull request, #423: URL: https://github.com/apache/iceberg-rust/pull/423 This PR will split http client logic to separate mod to make it easier to maintain the rest catalog. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Add Files metadata table [iceberg-python]

2024-07-02 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1657080281 ## tests/integration/test_inspect_table.py: ## @@ -445,3 +445,109 @@ def check_pyiceberg_df_equals_spark_df(df: pa.Table, spark_df: DataFrame) -> Non

Re: [PR] OpenAPI: Express server capabilities via /config endpoint [iceberg]

2024-07-02 Thread via GitHub
adutra commented on code in PR #9940: URL: https://github.com/apache/iceberg/pull/9940#discussion_r1657057918 ## open-api/rest-catalog-open-api.yaml: ## @@ -61,6 +61,14 @@ security: - OAuth2: [catalog] - BearerAuth: [] +tags: Review Comment: Indeed there is the `rest

Re: [I] distutils removed in Python >= 3.12 [iceberg-python]

2024-07-02 Thread via GitHub
Fokko commented on issue #859: URL: https://github.com/apache/iceberg-python/issues/859#issuecomment-2191702531 cc @MehulBatra -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [PR] Api: Track partition statistics via TableMetadata [iceberg]

2024-07-02 Thread via GitHub
ajantha-bhat commented on code in PR #8502: URL: https://github.com/apache/iceberg/pull/8502#discussion_r1654742829 ## core/src/main/java/org/apache/iceberg/TableMetadataParser.java: ## @@ -481,6 +488,13 @@ public static TableMetadata fromJson(String metadataLocation, JsonNode

Re: [I] Add missing error codes to REST spec [iceberg]

2024-07-02 Thread via GitHub
jbonofre commented on issue #10570: URL: https://github.com/apache/iceberg/issues/10570#issuecomment-2191518318 It makes sense to me. Do you mind if I work on a PR about that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[PR] Doc: Spark quickstart needs to create context directory first [iceberg]

2024-07-02 Thread via GitHub
jiangxin369 opened a new pull request, #10572: URL: https://github.com/apache/iceberg/pull/10572 I'm new to Iceberg. When I try it following the [quickstart](https://iceberg.apache.org/spark-quickstart/#docker-compose), the below error occurs. ``` # docker-compose up ERROR: build

Re: [I] Iceberg Spark streaming skips rows of data [iceberg]

2024-07-02 Thread via GitHub
cccs-jc commented on issue #10156: URL: https://github.com/apache/iceberg/issues/10156#issuecomment-2189705490 The issue is that when a streaming query resumes (either it was killed, died, gracefully stopped) it does not resume where it left off but rather resumes based on the `stream-from-ti

Re: [I] Iceberg Spark streaming skips rows of data [iceberg]

2024-07-02 Thread via GitHub
singhpk234 commented on issue #10156: URL: https://github.com/apache/iceberg/issues/10156#issuecomment-2189687924 Haven't been looking into this actively. Couple of questions: > That's because it applies the stream-from-timestamp when in fact it should not look at it at all

Re: [I] (AWS Lake Formation shared resources) Iceberg tables in AWS Glue catalog has a different root namespace than the original [iceberg-python]

2024-07-02 Thread via GitHub
doctormohamed commented on issue #845: URL: https://github.com/apache/iceberg-python/issues/845#issuecomment-2189695302 To clarify, the "gov-demo_fs" database is Lakeformation managed, the other database is "demo_fs" is a Glue native database. ![image](https://github.com/apache/icebe

Re: [PR] Core: Fix create v1 table on REST Catalog [iceberg]

2024-07-02 Thread via GitHub
amogh-jahagirdar commented on code in PR #10369: URL: https://github.com/apache/iceberg/pull/10369#discussion_r1653278618 ## core/src/main/java/org/apache/iceberg/rest/CatalogHandlers.java: ## @@ -375,7 +376,16 @@ private static TableMetadata create(TableOperations ops, UpdateT

Re: [I] The decimal data type is transformed after the data is inserted. [iceberg-python]

2024-07-02 Thread via GitHub
kevinjqliu commented on issue #751: URL: https://github.com/apache/iceberg-python/issues/751#issuecomment-2189740351 @ndrluis Thanks for the fix. It's passing all the unit tests I've collected for this issue. I'm still trying to wrap my head around why this works and if this is the

Re: [I] (AWS Lake Formation shared resources) Iceberg tables in AWS Glue catalog has a different root namespace than the original [iceberg-python]

2024-07-02 Thread via GitHub
kevinjqliu commented on issue #845: URL: https://github.com/apache/iceberg-python/issues/845#issuecomment-2189668108 Oh! So the returned table name differs from the one specified in `create_table`. And this assertion will fail. ``` database = 'demo_fs' table_name = 'demo_ta
