Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-11 Thread via GitHub
manuzhang commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1561968919 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java ##

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-11 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2050826622 I could re-test it. It would take me a day or two. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1561911029 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java ##

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9611: URL: https://github.com/apache/iceberg/pull/9611#discussion_r1561909157 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java ##

Re: [PR] Spark 3.5: Add max allowed failed commits to RewriteDataFiles when partial progress is enabled [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #9611: URL: https://github.com/apache/iceberg/pull/9611#issuecomment-2050785881 Sorry for the delay. I didn't forget.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2050785089 Curious if there were any updates as well.

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1561892870 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java ##

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1561892714 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java ##

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1561889535 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java ##

Re: [PR] Spark: Fix SparkTable to use name and effective snapshotID for comparing [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9455: URL: https://github.com/apache/iceberg/pull/9455#discussion_r1561889446 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java ##

[PR] Kevinjqliu/poc parallelize tests [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu opened a new pull request, #598: URL: https://github.com/apache/iceberg-python/pull/598 I don't think this works until each test can run in isolation

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-04-11 Thread via GitHub
szehon-ho commented on PR #10020: URL: https://github.com/apache/iceberg/pull/10020#issuecomment-2050770353 @rdblue @RussellSpitzer may be interested, can you take a look?

Re: [I] Add support for Iceberg v2 spec in the ArrowReader [iceberg]

2024-04-11 Thread via GitHub
github-actions[bot] commented on issue #2487: URL: https://github.com/apache/iceberg/issues/2487#issuecomment-2050747432 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

Re: [I] Add support and test cases in ArrowReader for UUIDType, FixedType and DecimalType data types [iceberg]

2024-04-11 Thread via GitHub
github-actions[bot] commented on issue #2486: URL: https://github.com/apache/iceberg/issues/2486#issuecomment-2050747414 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

Re: [I] Table SortOrder not being respected in Spark write [iceberg]

2024-04-11 Thread via GitHub
github-actions[bot] commented on issue #2490: URL: https://github.com/apache/iceberg/issues/2490#issuecomment-2050747451 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

Re: [I] Enable reading WASB and WASBS file paths with ABFS and ABFSS [iceberg]

2024-04-11 Thread via GitHub
njriasan commented on issue #10127: URL: https://github.com/apache/iceberg/issues/10127#issuecomment-2050746955 I'm also interested in making this contribution to Iceberg if others are okay with this change and can give me some pointers about the best way to ensure this can be applied to

Re: [I] Enable reading WASB and WASBS file paths with ABFS and ABFSS [iceberg]

2024-04-11 Thread via GitHub
njriasan commented on issue #10127: URL: https://github.com/apache/iceberg/issues/10127#issuecomment-2050746432 I marked this as Snowflake because this is the offending writer, but I think this generally would apply to all engines.

[I] Enable reading WASB and WASBS file paths with ABFS and ABFSS [iceberg]

2024-04-11 Thread via GitHub
njriasan opened a new issue, #10127: URL: https://github.com/apache/iceberg/issues/10127
### Feature Request / Improvement
When you set up a managed Snowflake Iceberg table on an Azure account, Snowflake will provide locations that use `wasbs://` and not `abfss://`. `wasb` is currently
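The request above amounts to translating WASB locations to their ABFS equivalents. A minimal, hypothetical sketch of what such a rewrite could look like (this is not Iceberg code; the helper name and the assumption that only the scheme and the storage endpoint differ are mine — WASB addresses the blob endpoint, ABFS the Data Lake `dfs` endpoint, while the `container@account` authority layout is the same):

```python
from urllib.parse import urlparse

# Hypothetical mapping: wasb -> abfs, wasbs -> abfss.
SCHEME_MAP = {"wasb": "abfs", "wasbs": "abfss"}

def wasb_to_abfs(location: str) -> str:
    """Rewrite a wasb(s):// location to its abfs(s):// equivalent."""
    parsed = urlparse(location)
    scheme = SCHEME_MAP.get(parsed.scheme)
    if scheme is None:
        return location  # not a wasb(s) path; leave untouched
    # Swap the blob endpoint for the Data Lake (dfs) endpoint.
    netloc = parsed.netloc.replace(".blob.core.windows.net", ".dfs.core.windows.net")
    return f"{scheme}://{netloc}{parsed.path}"

print(wasb_to_abfs("wasbs://container@account.blob.core.windows.net/warehouse/db/table"))
# abfss://container@account.dfs.core.windows.net/warehouse/db/table
```

A real implementation would likely live in the FileIO/path-resolution layer so that both read and write paths benefit, as the issue suggests.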

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1561826768 ## pyiceberg/io/pyarrow.py ##

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1561826510 ## pyiceberg/io/pyarrow.py ##

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1561824087 ## pyiceberg/io/pyarrow.py ##

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1561823277 ## pyiceberg/io/pyarrow.py ##

Re: [PR] Core: Calling rewrite_position_delete_files fails on tables with more than 1k columns [iceberg]

2024-04-11 Thread via GitHub
szehon-ho commented on PR #10020: URL: https://github.com/apache/iceberg/pull/10020#issuecomment-2050636746 Because this problem may affect more than just rewrite_position_deletes, I rewrote the patch to make the logic more generic at the Iceberg API level, rather than just

Re: [PR] Core: Add data sequence number as derived column to files metadata table [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #9813: URL: https://github.com/apache/iceberg/pull/9813#discussion_r1561801551 ## core/src/main/java/org/apache/iceberg/BaseFilesTable.java ##

Re: [PR] Core: Add data sequence number as derived column to files metadata table [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #9813: URL: https://github.com/apache/iceberg/pull/9813#issuecomment-2050581156 Let me take a quick look today as well.

Re: [PR] Spark 3.4: Fix system function pushdown in CoW row-level commands [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #10119: URL: https://github.com/apache/iceberg/pull/10119#issuecomment-2050543397 Thanks for reviewing, @szehon-ho @tmnd1991 @nastra!

Re: [PR] Spark 3.4: Fix system function pushdown in CoW row-level commands [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi merged PR #10119: URL: https://github.com/apache/iceberg/pull/10119

Re: [PR] Spark 3.4: Fix system function pushdown in CoW row-level commands [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #10119: URL: https://github.com/apache/iceberg/pull/10119#issuecomment-2050541242 In 3.4, we use custom `ReplaceIcebergData` and rewrite the operations differently. I think that logic is only needed in 3.5.

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #10074: URL: https://github.com/apache/iceberg/pull/10074#discussion_r1561704219 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java ##

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on code in PR #10074: URL: https://github.com/apache/iceberg/pull/10074#discussion_r1561701010 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java ##

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
amogh-jahagirdar merged PR #10052: URL: https://github.com/apache/iceberg/pull/10052

Re: [I] Add metadata tables [iceberg-python]

2024-04-11 Thread via GitHub
Fokko commented on issue #511: URL: https://github.com/apache/iceberg-python/issues/511#issuecomment-2050385646 @geruh Great seeing you here again. I've assigned it to you.

Re: [I] Add metadata tables [iceberg-python]

2024-04-11 Thread via GitHub
geruh commented on issue #511: URL: https://github.com/apache/iceberg-python/issues/511#issuecomment-2050356108 Hey Fokko, I'll take references here

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-11 Thread via GitHub
tshauck commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1561484967 ## crates/integrations/src/datafusion/schema.rs ##

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
kevinjqliu commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1561420182 ## pyiceberg/io/pyarrow.py ##

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-11 Thread via GitHub
marvinlanhenke commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1561387375 ## Cargo.toml ##

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-11 Thread via GitHub
marvinlanhenke commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1561386024 ## crates/integrations/src/datafusion/schema.rs ##

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
rodmeneses commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1561382610 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestChangeLogTable.java ##

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
rodmeneses commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1561381139 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java ##

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
rodmeneses commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1561380930 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java ##

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
rodmeneses commented on PR #10112: URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2050175110
> @rodmeneses: Just curious to know what command you have used for the step "Flink: Recover flink/1.18 files from history"? It is really nice.
>
> I thought the only way was

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
amogh-jahagirdar commented on PR #10052: URL: https://github.com/apache/iceberg/pull/10052#issuecomment-2050168441 Once it's rebased, I'll run the checks and after they pass, I'll merge! Thanks for contributing @harishch1998, and thanks @nastra for the review

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on PR #10052: URL: https://github.com/apache/iceberg/pull/10052#issuecomment-2050154305 @harishch1998 can you please rebase the PR and fix the merge conflict(s)?

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10124: URL: https://github.com/apache/iceberg/pull/10124#discussion_r1561346286 ## core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java ##

Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-11 Thread via GitHub
rahil-c commented on code in PR #9782: URL: https://github.com/apache/iceberg/pull/9782#discussion_r1561347274 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java ##

Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #9782: URL: https://github.com/apache/iceberg/pull/9782#discussion_r1561341671 ## core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java ##

Re: [PR] Add Pagination To List Apis [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #9782: URL: https://github.com/apache/iceberg/pull/9782#discussion_r1561325160 ## core/src/test/java/org/apache/iceberg/rest/responses/TestListNamespacesResponse.java ##

Re: [I] gc.enabled property is set to false by default for Apache Iceberg table created in Nessie Catalog [iceberg]

2024-04-11 Thread via GitHub
ajantha-bhat commented on issue #9562: URL: https://github.com/apache/iceberg/issues/9562#issuecomment-2050093246 Yes, the Nessie GC tool will clean up the expired or unreferenced data files as well, along with Iceberg metadata files.

Re: [I] spark.table() raises warn: Unclosed S3FileIO instance in GlueTableOperations [iceberg]

2024-04-11 Thread via GitHub
nastra closed issue #7841: spark.table() raises warn: Unclosed S3FileIO instance in GlueTableOperations URL: https://github.com/apache/iceberg/issues/7841

Re: [I] spark.table() raises warn: Unclosed S3FileIO instance in GlueTableOperations [iceberg]

2024-04-11 Thread via GitHub
nastra commented on issue #7841: URL: https://github.com/apache/iceberg/issues/7841#issuecomment-2050076019 Closing this as it has been fixed by https://github.com/apache/iceberg/pull/8315. @tiki-sk please open a new issue for the catalogs where the warning pops up

Re: [I] spark.table() raises warn: Unclosed S3FileIO instance in GlueTableOperations [iceberg]

2024-04-11 Thread via GitHub
nastra commented on issue #7841: URL: https://github.com/apache/iceberg/issues/7841#issuecomment-2050074420
> What are the implications of this warning?

@W-Ely Sorry I missed this comment. The `S3FileIO` instance will still be closed, but the warning indicates that it hasn't been properly closed.
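The "Unclosed S3FileIO instance created by: ..." warning follows a common leak-detection pattern: the resource remembers where it was created and complains at finalization time if `close()` was never called. A minimal Python analogue of that pattern (a sketch for illustration, not Iceberg's actual Java implementation):

```python
import traceback
import warnings

class TrackedResource:
    """Warn at garbage-collection time if close() was never called,
    reporting the stack trace of the call site that created the object."""

    def __init__(self):
        self._closed = False
        # Capture the creation stack so the warning can say who leaked it.
        self._created_at = "".join(traceback.format_stack(limit=5))

    def close(self):
        self._closed = True

    def __del__(self):
        if not self._closed:
            warnings.warn(
                f"Unclosed TrackedResource instance created by:\n{self._created_at}",
                ResourceWarning,
                stacklevel=2,
            )

# Properly scoped usage never triggers the warning:
r = TrackedResource()
r.close()
```

As the comment above notes, the resource is still released eventually; the warning only flags that the caller relied on finalization instead of closing it explicitly.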

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1561265880 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java ##

Re: [I] OpenApi requestBody: some are optional but should be required [iceberg]

2024-04-11 Thread via GitHub
westse commented on issue #10004: URL: https://github.com/apache/iceberg/issues/10004#issuecomment-2050018710 @nastra Is the openapi generated? If so, can you point me to the relevant code? If not, the change is trivial and I can certainly create a PR.

Re: [PR] Manifest list encryption [iceberg]

2024-04-11 Thread via GitHub
rdblue commented on code in PR #7770: URL: https://github.com/apache/iceberg/pull/7770#discussion_r1561258031 ## api/src/main/java/org/apache/iceberg/Snapshot.java ##

Re: [PR] Core, Spark: Use 'delete' if RowDelta only has delete files [iceberg]

2024-04-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #10123: URL: https://github.com/apache/iceberg/pull/10123#discussion_r1561229088 ## core/src/main/java/org/apache/iceberg/BaseRowDelta.java ##

Re: [PR] [WIP] Integration with Datafusion [iceberg-rust]

2024-04-11 Thread via GitHub
tshauck commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1561209884 ## Cargo.toml ##

Re: [PR] Spark-3.5: Support CTAS and RTAS to preserve schema nullability. [iceberg]

2024-04-11 Thread via GitHub
amogh-jahagirdar commented on code in PR #10074: URL: https://github.com/apache/iceberg/pull/10074#discussion_r1561209606 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/BaseCatalog.java ##

Re: [PR] Core: Allow manifest file cache to be configurable [iceberg]

2024-04-11 Thread via GitHub
tdcmeehan commented on PR #10118: URL: https://github.com/apache/iceberg/pull/10118#issuecomment-2049941295 Thanks for your question @singhpk234. The motivation is to try to make both improvements listed in #9991, and I would like to solicit feedback on this approach. tl;dr, there

Re: [PR] Core: Fix retry behavior for Jdbc Client [iceberg]

2024-04-11 Thread via GitHub
amogh-jahagirdar commented on PR #7561: URL: https://github.com/apache/iceberg/pull/7561#issuecomment-2049892193 Hey all @jean-humann @cccs-br @cccs-eric @cccs-jc sorry I forgot to follow up on this PR! I'm taking a look now to see how to carry this forward. It does seem like since there

Re: [PR] WIP: View Spec implementation [iceberg-rust]

2024-04-11 Thread via GitHub
ZENOTME commented on PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#issuecomment-2049849148 Thanks! Nice work! @c-thiel

Re: [PR] WIP: View Spec implementation [iceberg-rust]

2024-04-11 Thread via GitHub
ZENOTME commented on code in PR #331: URL: https://github.com/apache/iceberg-rust/pull/331#discussion_r1561132461 ## crates/iceberg/src/catalog/mod.rs ##

Re: [I] gc.enabled property is set to false by default for Apache Iceberg table created in Nessie Catalog [iceberg]

2024-04-11 Thread via GitHub
clintf1982 commented on issue #9562: URL: https://github.com/apache/iceberg/issues/9562#issuecomment-2049773697 @nastra Does the Nessie GC tool also clean the data files of the tables? I couldn't really tell whether it cleans the data files or not. If it doesn't, how can you know which

Re: [I] spark.table() raises warn: Unclosed S3FileIO instance in GlueTableOperations [iceberg]

2024-04-11 Thread via GitHub
tiki-sk commented on issue #7841: URL: https://github.com/apache/iceberg/issues/7841#issuecomment-2049769874 I got the same warning for HadoopCatalog: `24/04/11 15:54:15 WARN o.a.i.a.s.S3FileIO: Unclosed S3FileIO instance created by:

Re: [I] Unable to load an iceberg table from aws glue catalog [iceberg-python]

2024-04-11 Thread via GitHub
arookieds commented on issue #515: URL: https://github.com/apache/iceberg-python/issues/515#issuecomment-2049745023 I have tried both solutions, i.e.:
- setting the env variable to the proper AWS region
- providing it within the function call

But I am always getting the same error:

Re: [I] NPE During RewriteDataFiles Action with Nessie [iceberg]

2024-04-11 Thread via GitHub
ajantha-bhat commented on issue #10110: URL: https://github.com/apache/iceberg/issues/10110#issuecomment-2049681447 Never mind, it is a `Function` in a static block, not a one-time assignment. So, on each call, it will create a new `JavaHttpClient`, and there is no issue on the older versions either.

[I] Delete using Merge-on-Read sets `OVERWRITE` while `DELETE` is expected [iceberg]

2024-04-11 Thread via GitHub
Fokko opened a new issue, #10122: URL: https://github.com/apache/iceberg/issues/10122
### Apache Iceberg version
None
### Query engine
None
### Please describe the bug
When deleting a row using Merge-on-Read, it should set the operation to DELETE

Re: [I] Snapshot sets `OVERWRITE` while `DELETE` is expected [iceberg]

2024-04-11 Thread via GitHub
Fokko closed issue #9995: Snapshot sets `OVERWRITE` while `DELETE` is expected URL: https://github.com/apache/iceberg/issues/9995

Re: [I] Snapshot sets `OVERWRITE` while `DELETE` is expected [iceberg]

2024-04-11 Thread via GitHub
Fokko commented on issue #9995: URL: https://github.com/apache/iceberg/issues/9995#issuecomment-2049593603 @nastra noticed that in the last step it is `people2` instead of `people`. So it is actually correct

Re: [I] Structured streaming writes to partitioned table fails when spark.sql.extensions is set to IcebergSparkSessionExtensions [iceberg]

2024-04-11 Thread via GitHub
greg-roberts-bbc commented on issue #7226: URL: https://github.com/apache/iceberg/issues/7226#issuecomment-2049509117 @stevenlii
> But this way there is no checkpointLocation. How do you manage the offset?

As I said, we're using `GlueContext.forEachBatch` which allows you to

Re: [I] Iceberg View Support [iceberg-rust]

2024-04-11 Thread via GitHub
c-thiel commented on issue #55: URL: https://github.com/apache/iceberg-rust/issues/55#issuecomment-2049411950 Independent of how the actual loader functions work, I believe implementing the Iceberg spec for `Views` is a good idea. As I have needed the structs from the spec, I

Re: [PR] Introduce two properties for reading the connection timeout and socke… [iceberg]

2024-04-11 Thread via GitHub
nastra merged PR #10053: URL: https://github.com/apache/iceberg/pull/10053

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
ajantha-bhat commented on PR #10112: URL: https://github.com/apache/iceberg/pull/10112#issuecomment-2049367693 @rodmeneses: Just curious to know what command you have used for the step "Flink: Recover flink/1.18 files from history" ? It is really nice. I thought the only way was

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-11 Thread via GitHub
nk1506 commented on code in PR #9852: URL: https://github.com/apache/iceberg/pull/9852#discussion_r1560434336 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveViewOperations.java ##

[PR] Read: fetch file_schema directly from pyarrow_to_schema [iceberg-python]

2024-04-11 Thread via GitHub
HonahX opened a new pull request, #597: URL: https://github.com/apache/iceberg-python/pull/597 (no comment)

Re: [PR] Sanitized special character column name before writing to parquet [iceberg-python]

2024-04-11 Thread via GitHub
HonahX commented on code in PR #590: URL: https://github.com/apache/iceberg-python/pull/590#discussion_r1560606567 ## tests/integration/test_writes/test_writes.py ##
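Several of the review threads on PR #590 concern sanitizing special-character column names before writing to Parquet. The general approach can be sketched as follows — a hypothetical helper, not PyIceberg's actual implementation: replace characters that are not valid in Avro-style field names with an unambiguous hex encoding, and keep a mapping so the original names can be restored on read:

```python
import re

def sanitize(name: str) -> str:
    """Hypothetical sanitizer: first character must be a letter or
    underscore, the rest alphanumeric or underscore. Invalid characters
    are hex-encoded (e.g. '.' -> '_x2E') so distinct names cannot
    collide after sanitization."""
    def encode(match: re.Match) -> str:
        return "_x" + format(ord(match.group()), "X")
    head = re.sub(r"[^A-Za-z_]", encode, name[:1])
    tail = re.sub(r"[^A-Za-z0-9_]", encode, name[1:])
    return head + tail

# Keep a reverse mapping so reads can restore the original names
# (in Iceberg, field IDs make this round-trip reliable).
columns = ["free text", "123", "a.b"]
mapping = {sanitize(c): c for c in columns}
print(mapping)
```

The key design point, which the invertible encoding above illustrates, is that sanitization must be collision-free: two different source column names must never sanitize to the same Parquet field name.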

Re: [I] UncheckedSQLException: Failed to execute exists query: SELECT table_namespace FROM iceberg_tables WHERE catalog_name = ? AND (table_namespace = ? OR table_namespace LIKE ? ESCAPE '\') LIMIT 1

2024-04-11 Thread via GitHub
jbonofre commented on issue #10056: URL: https://github.com/apache/iceberg/issues/10056#issuecomment-2049270497 @nastra yes, I agree. Let me fix that and verify with MySQL.

Re: [I] UncheckedSQLException: Failed to execute exists query: SELECT table_namespace FROM iceberg_tables WHERE catalog_name = ? AND (table_namespace = ? OR table_namespace LIKE ? ESCAPE '\') LIMIT 1

2024-04-11 Thread via GitHub
nastra commented on issue #10056: URL: https://github.com/apache/iceberg/issues/10056#issuecomment-2049263968 I confirmed that the breaking change has been introduced by https://github.com/apache/iceberg/pull/8340 (the `ESCAPE`). Once I removed the `ESCAPE` from the two SQL statements, it
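For context on why the query uses `ESCAPE` at all: `_` and `%` are LIKE wildcards, so a namespace prefix must be escaped before prefix-matching or it will match unrelated namespaces. The following self-contained demonstration uses SQLite (not the catalogs discussed above) and a hypothetical `escape_like` helper to show the behavior the clause exists to fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iceberg_tables (table_namespace TEXT)")
conn.executemany(
    "INSERT INTO iceberg_tables VALUES (?)",
    [("db_1",), ("db_1.inner",), ("db21",)],
)

def escape_like(prefix: str) -> str:
    # Hypothetical helper: escape LIKE metacharacters with a backslash,
    # mirroring what a catalog must do before appending '%'.
    return prefix.replace("\\", "\\\\").replace("_", "\\_").replace("%", "\\%")

# Without escaping, '_' in "db_1%" matches ANY character:
unescaped = conn.execute(
    "SELECT table_namespace FROM iceberg_tables WHERE table_namespace LIKE ?",
    ("db_1%",),
).fetchall()

# With escaping plus an ESCAPE clause, '_' is matched literally:
escaped = conn.execute(
    "SELECT table_namespace FROM iceberg_tables WHERE table_namespace LIKE ? ESCAPE '\\'",
    (escape_like("db_1") + "%",),
).fetchall()

print(unescaped)  # includes 'db21' -- the wildcard '_' matched '2'
print(escaped)    # only 'db_1' and 'db_1.inner'
```

The thread above suggests the `ESCAPE '\'` syntax itself is what breaks on MySQL, so any fix has to keep the literal-matching behavior shown here while using escaping syntax every supported backend accepts.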

Re: [I] Spark can not delete table metadata and data when drop table [iceberg]

2024-04-11 Thread via GitHub
wForget commented on issue #9990: URL: https://github.com/apache/iceberg/issues/9990#issuecomment-2049244909 I also encountered the same problem when using SparkSessionCatalog to delete a non-iceberg hive table. @manuzhang Iceberg HiveCatalog allows deletion of non-iceberg tables, is

Re: [I] UncheckedSQLException: Failed to execute exists query: SELECT table_namespace FROM iceberg_tables WHERE catalog_name = ? AND (table_namespace = ? OR table_namespace LIKE ? ESCAPE '\') LIMIT 1

2024-04-11 Thread via GitHub
jbonofre commented on issue #10056: URL: https://github.com/apache/iceberg/issues/10056#issuecomment-2049214933 Before my changes, I remember having an issue with MySQL (whereas it was working fine with PostgreSQL). I will double-check.

Re: [I] UncheckedSQLException: Failed to execute exists query: SELECT table_namespace FROM iceberg_tables WHERE catalog_name = ? AND (table_namespace = ? OR table_namespace LIKE ? ESCAPE '\') LIMIT 1

2024-04-11 Thread via GitHub
nastra commented on issue #10056: URL: https://github.com/apache/iceberg/issues/10056#issuecomment-2049211953 @jbonofre I believe JDBC + MySql used to work as there have been users using it in the past that reported other issues, which we then fixed as part of

Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]

2024-04-11 Thread via GitHub
HonahX commented on issue #584: URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2049204249 Sorry for being late here. Thanks everyone for the valuable discussion, PR and tests. I would like to add some insight on the read side. Regarding @Fokko 's comment: > If

Re: [PR] Spark 3.4: Fix system function pushdown in CoW row-level commands [iceberg]

2024-04-11 Thread via GitHub
tmnd1991 commented on PR #10119: URL: https://github.com/apache/iceberg/pull/10119#issuecomment-2049109276 > lgtm, ReplaceData is not available in Spark 3.4? ReplaceData is there in 3.4 but has one fewer argument (`groupFilterCondition: Option[Expression]`); I see it's not used at

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560568156 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560558272 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560549677 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560545014 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560530917 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560539559 ## core/src/main/java/org/apache/iceberg/rest/HTTPClient.java: ## @@ -85,6 +87,8 @@ public class HTTPClient implements RESTClient { private HTTPClient(
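The PR above wires a proxy into Iceberg's `HTTPClient` builder, which wraps Apache HttpClient. As a general, hedged illustration of the same idea using only the JDK's built-in `java.net.http.HttpClient` (the host and port below are placeholders, not values taken from the PR):

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;

public class ProxyClientSketch {
  // Build an HTTP client whose requests are routed through the given
  // proxy. No connection is made here; the proxy selector is only
  // recorded on the builder.
  static HttpClient buildWithProxy(String host, int port) {
    return HttpClient.newBuilder()
        // createUnresolved avoids a DNS lookup at construction time
        .proxy(ProxySelector.of(InetSocketAddress.createUnresolved(host, port)))
        .build();
  }

  public static void main(String[] args) {
    HttpClient client = buildWithProxy("proxy.example.com", 8080);
    System.out.println(client.proxy().isPresent());  // prints "true"
  }
}
```

A builder-level proxy like this applies to every request the client makes, which matches the use case of pointing an entire REST catalog client at a corporate proxy.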

Re: [I] Nessie: Drop table from Nessie catalog is not cleaning the table files. [iceberg]

2024-04-11 Thread via GitHub
ajantha-bhat closed issue #3528: Nessie: Drop table from Nessie catalog is not cleaning the table files. URL: https://github.com/apache/iceberg/issues/3528

Re: [I] Nessie: Drop table from Nessie catalog is not cleaning the table files. [iceberg]

2024-04-11 Thread via GitHub
ajantha-bhat commented on issue #3528: URL: https://github.com/apache/iceberg/issues/3528#issuecomment-2049070653 We now log a warning for the purge flag, and https://projectnessie.org/features/gc/ can be used instead.

Re: [PR] Extend HTTPClient Builder to allow setting a proxy server [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10052: URL: https://github.com/apache/iceberg/pull/10052#discussion_r1560530917 ## core/src/test/java/org/apache/iceberg/rest/TestHTTPClient.java: ## @@ -121,6 +128,108 @@ public void testHeadFailure() throws JsonProcessingException {

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1560519389 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestFlinkCatalogTable.java: ## @@ -294,39 +308,284 @@ public void testAlterTable() throws

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1560518381 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestFlinkInputFormatReaderDeletes.java: ## @@ -35,6 +35,7 @@ import

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1560516897 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/TestStreamingMonitorFunction.java: ## @@ -248,6 +248,7 @@ public void

Re: [PR] Flink: Adds support for Flink 1.19 version [iceberg]

2024-04-11 Thread via GitHub
nastra commented on code in PR #10112: URL: https://github.com/apache/iceberg/pull/10112#discussion_r1560516528 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/source/enumerator/TestContinuousSplitPlannerImpl.java: ## @@ -443,9 +446,10 @@ public void

Re: [PR] feat: init iceberg writer [iceberg-rust]

2024-04-11 Thread via GitHub
ZENOTME commented on code in PR #275: URL: https://github.com/apache/iceberg-rust/pull/275#discussion_r1560477918 ## crates/iceberg/src/writer/mod.rs: ## @@ -15,13 +15,69 @@ // specific language governing permissions and limitations // under the License. -//! The iceberg

Re: [PR] Hive: Add View support for HIVE catalog [iceberg]

2024-04-11 Thread via GitHub
nk1506 commented on code in PR #9852: URL: https://github.com/apache/iceberg/pull/9852#discussion_r1560477166 ## hive-metastore/src/main/java/org/apache/iceberg/hive/HiveViewOperations.java: ## @@ -0,0 +1,395 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one