Re: [PR] Basic Integration with Datafusion [iceberg-rust]

2024-04-29 Thread via GitHub
liurenjie1024 commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1584229759 ## crates/integrations/datafusion/src/catalog.rs: ## @@ -0,0 +1,94 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

Re: [PR] Basic Integration with Datafusion [iceberg-rust]

2024-04-29 Thread via GitHub
Xuanwo commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1584213613 ## crates/integrations/datafusion/src/catalog.rs: ## @@ -0,0 +1,94 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [I] appending to a table with Decimal > 32767 results in `int too big to convert` [iceberg-python]

2024-04-29 Thread via GitHub
bigluck commented on issue #669: URL: https://github.com/apache/iceberg-python/issues/669#issuecomment-2084497090 @Fokko `decimal_to_bytes`, when invoked without `byte_length`, uses `bytes_required` to get the required number of bytes. ```python for v in ['32767', '32768', '32769',
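
The byte-count question raised in this comment can be illustrated with a small sketch. It is not PyIceberg's `decimal_to_bytes` or `bytes_required`; the helper names `signed_bytes_required` and `encode_unscaled` are hypothetical. It assumes the unscaled decimal value is stored as a big-endian signed two's-complement integer, so one extra bit is needed for the sign: 32767 fits in two bytes, but 32768 needs three.

```python
from decimal import Decimal


def signed_bytes_required(unscaled: int) -> int:
    """Smallest byte count that holds `unscaled` in signed two's complement."""
    # int.bit_length() ignores the sign, so count the magnitude bits
    # (using ~x for negatives) and add one bit for the sign.
    magnitude_bits = (unscaled if unscaled >= 0 else ~unscaled).bit_length()
    return (magnitude_bits + 1 + 7) // 8


def encode_unscaled(value: Decimal) -> bytes:
    """Encode the unscaled integer of a finite Decimal as big-endian signed bytes."""
    sign, digits, _exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    return unscaled.to_bytes(signed_bytes_required(unscaled), "big", signed=True)


if __name__ == "__main__":
    for v in ["32767", "32768", "32769"]:
        print(v, len(encode_unscaled(Decimal(v))))
```

Calling `(32768).to_bytes(2, "big", signed=True)` directly raises `OverflowError: int too big to convert`, which matches the message in the issue title.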

Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-29 Thread via GitHub
liurenjie1024 commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1584204040 ## crates/iceberg/src/arrow/reader.rs: ## @@ -186,4 +221,637 @@ impl ArrowReader { Ok(ProjectionMask::leaves(parquet_schema, indices)) }

Re: [I] appending to a table with Decimal > 32767 results in `int too big to convert` [iceberg-python]

2024-04-29 Thread via GitHub
bigluck commented on issue #669: URL: https://github.com/apache/iceberg-python/issues/669#issuecomment-2084469240 For reference, this is the full stack trace: ``` Traceback (most recent call last): File "/Users/bigluck/Desktop/pyiceberg-vlad-bug/test.py", line 31, in n

Re: [PR] Nessie: Make handleExceptionsForCommits public in NessieUtil [iceberg]

2024-04-29 Thread via GitHub
nastra commented on PR #10248: URL: https://github.com/apache/iceberg/pull/10248#issuecomment-2084445178 @YuzongG just curious, where are you planning to re-use this as this should only be used internally in `NessieTableOperations` / `NessieViewOperations` -- This is an automated message

Re: [PR] #9073 Junit 4 tests switched to JUnit 5 [iceberg]

2024-04-29 Thread via GitHub
igoradulian commented on PR #9793: URL: https://github.com/apache/iceberg/pull/9793#issuecomment-2084436773 @nastra, please review recent changes

Re: [PR] HiveMetaHook implementation to enable CREATE TABLE and DROP TABLE from Hive queries [iceberg]

2024-04-29 Thread via GitHub
pvary commented on code in PR #1495: URL: https://github.com/apache/iceberg/pull/1495#discussion_r1584145169 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-04-29 Thread via GitHub
nk1506 commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1584137262 ## core/src/main/java/org/apache/iceberg/SnapshotSummary.java: ## @@ -263,6 +273,12 @@ void addTo(ImmutableMap.Builder builder) { setIf(removedDeleteFiles > 0,

Re: [PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-04-29 Thread via GitHub
nk1506 commented on code in PR #10246: URL: https://github.com/apache/iceberg/pull/10246#discussion_r1584136686 ## core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java: ## @@ -190,6 +190,7 @@ public List apply(TableMetadata base, Snapshot snapshot) { List apply

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1584100390 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning oper

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1584120346 ## crates/iceberg/src/scan.rs: ## @@ -169,55 +177,66 @@ pub struct TableScan { filter: Option>, } -/// A stream of [`FileScanTask`]. -pub type FileSca

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1584112041 ## crates/iceberg/src/scan.rs: ## @@ -99,7 +107,7 @@ impl<'a> TableScanBuilder<'a> { } /// Build the table scan. -pub fn build(self) -> crate:

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1584111575 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -16,74 +16,49 @@ // under the License. use crate::expr::visitors::bound_predicate_visitor

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1584098222 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning oper

Re: [PR] HiveMetaHook implementation to enable CREATE TABLE and DROP TABLE from Hive queries [iceberg]

2024-04-29 Thread via GitHub
shivjha30 commented on code in PR #1495: URL: https://github.com/apache/iceberg/pull/1495#discussion_r1584078574 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * o

Re: [I] ValueError: Mismatch in fields: ? [iceberg-python]

2024-04-29 Thread via GitHub
djouallah commented on issue #674: URL: https://github.com/apache/iceberg-python/issues/674#issuecomment-2084298893 sorry, my fault

Re: [I] Getting storage partitioned join to work [iceberg]

2024-04-29 Thread via GitHub
mrbrahman commented on issue #10250: URL: https://github.com/apache/iceberg/issues/10250#issuecomment-2084281224 The problem was the commented out parameter. Apparently I had to set it thus: ~~~sql set `spark.sql.iceberg.planning.preserve-data-grouping` = true; ~~~ Once th
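
As a rough sketch of the fix described in this comment, the settings can be kept in one place and rendered as `set` statements. Only `spark.sql.iceberg.planning.preserve-data-grouping` comes from the comment itself; `spark.sql.sources.v2.bucketing.enabled` is an assumption about what else a storage-partitioned join typically needs, and the names `SPJ_SETTINGS` and `to_set_statements` are hypothetical.

```python
# Settings for a storage-partitioned join test; values are strings because
# that is how they would be passed to Spark SQL `set` statements.
SPJ_SETTINGS = {
    # Assumption: Spark-side switch for V2 bucketing (not named in the thread).
    "spark.sql.sources.v2.bucketing.enabled": "true",
    # From the comment above: the parameter that had been commented out.
    "spark.sql.iceberg.planning.preserve-data-grouping": "true",
}


def to_set_statements(settings: dict[str, str]) -> list[str]:
    """Render each key/value pair as a Spark SQL `set` statement."""
    return [f"set `{key}` = {value};" for key, value in settings.items()]


if __name__ == "__main__":
    for statement in to_set_statements(SPJ_SETTINGS):
        print(statement)
```

Keeping the keys in a dictionary makes it harder to leave one commented out by accident, which was the cause reported here.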

Re: [I] Documentation page returning 404 [iceberg]

2024-04-29 Thread via GitHub
manuzhang commented on issue #10249: URL: https://github.com/apache/iceberg/issues/10249#issuecomment-2084280354 We are still fixing doc links in previous releases tracked at #10116

Re: [I] Add `table_exists` method to the Catalog [iceberg-python]

2024-04-29 Thread via GitHub
djouallah commented on issue #507: URL: https://github.com/apache/iceberg-python/issues/507#issuecomment-2084257385 @kevinjqliu is this supported? `AttributeError: 'SqlCatalog' object has no attribute 'table_exists'`
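
Until a catalog offers `table_exists`, one possible workaround is to attempt a load and treat the "not found" error as `False`. The sketch below is self-contained and uses stand-ins: `NoSuchTableError` and `ToyCatalog` are defined here for illustration and are not PyIceberg classes, and the real catalog's `load_table` behavior is an assumption.

```python
class NoSuchTableError(Exception):
    """Stand-in for the catalog's 'table not found' exception (hypothetical)."""


class ToyCatalog:
    """Toy catalog for this sketch only; it mimics a `load_table` method."""

    def __init__(self, tables: dict[str, object]) -> None:
        self._tables = dict(tables)

    def load_table(self, identifier: str) -> object:
        try:
            return self._tables[identifier]
        except KeyError:
            raise NoSuchTableError(identifier) from None


def table_exists(catalog: ToyCatalog, identifier: str) -> bool:
    """Return True if `load_table` succeeds, False if the table is missing."""
    try:
        catalog.load_table(identifier)
    except NoSuchTableError:
        return False
    return True


if __name__ == "__main__":
    catalog = ToyCatalog({"db.events": object()})
    print(table_exists(catalog, "db.events"))
    print(table_exists(catalog, "db.missing"))
```

Only the "not found" exception is caught, so connection or permission errors still surface instead of being reported as a missing table.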

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-04-29 Thread via GitHub
amogh-jahagirdar commented on code in PR #10140: URL: https://github.com/apache/iceberg/pull/10140#discussion_r1584048101 ## core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-04-29 Thread via GitHub
amogh-jahagirdar commented on code in PR #10140: URL: https://github.com/apache/iceberg/pull/10140#discussion_r1584047406 ## core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Metadata Log Entries metadata table [iceberg-python]

2024-04-29 Thread via GitHub
kevinjqliu commented on code in PR #667: URL: https://github.com/apache/iceberg-python/pull/667#discussion_r1584040866 ## pyiceberg/table/metadata.py: ## @@ -292,6 +292,13 @@ def snapshot_by_name(self, name: str) -> Optional[Snapshot]: return self.snapshot_by_id(re

Re: [PR] Metadata Log Entries metadata table [iceberg-python]

2024-04-29 Thread via GitHub
kevinjqliu commented on code in PR #667: URL: https://github.com/apache/iceberg-python/pull/667#discussion_r1584037066 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,39 @@ def update_partitions_map( schema=table_schema, ) +def metadata_log_entrie

Re: [PR] Metadata Log Entries metadata table [iceberg-python]

2024-04-29 Thread via GitHub
kevinjqliu commented on code in PR #667: URL: https://github.com/apache/iceberg-python/pull/667#discussion_r1584036888 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,39 @@ def update_partitions_map( schema=table_schema, ) +def metadata_log_entrie

Re: [I] Support partitioned writes [iceberg-python]

2024-04-29 Thread via GitHub
jqin61 commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-2083999610 Updates for monthly sync: 1. Working on dynamic overwrite which gets unblocked by partial deletes https://github.com/apache/iceberg-python/pull/569 2. For transforms functio

Re: [I] The spark's remove_orphan_files procedure cannot expire the orphan files that located in remote object storage services [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] closed issue #2525: The spark's remove_orphan_files procedure cannot expire the orphan files that located in remote object storage services URL: https://github.com/apache/iceberg/issues/2525

Re: [I] The spark's remove_orphan_files procedure cannot expire the orphan files that located in remote object storage services [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] commented on issue #2525: URL: https://github.com/apache/iceberg/issues/2525#issuecomment-2083906870 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Predicate pushdown not visible in Spark plan [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] closed issue #2517: Predicate pushdown not visible in Spark plan URL: https://github.com/apache/iceberg/issues/2517

Re: [I] Partitioning on sensitive (encrypted) columns [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] closed issue #2513: Partitioning on sensitive (encrypted) columns URL: https://github.com/apache/iceberg/issues/2513

Re: [I] Iceberg supports Tencent COS [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] closed issue #2498: Iceberg supports Tencent COS URL: https://github.com/apache/iceberg/issues/2498

Re: [I] Predicate pushdown not visible in Spark plan [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] commented on issue #2517: URL: https://github.com/apache/iceberg/issues/2517#issuecomment-2083906854 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Stack map does not match the one at exception handler [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] commented on issue #2507: URL: https://github.com/apache/iceberg/issues/2507#issuecomment-2083906823 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Flink CDC | OOM during initial snapshot [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] closed issue #2504: Flink CDC | OOM during initial snapshot URL: https://github.com/apache/iceberg/issues/2504

Re: [I] Iceberg supports Tencent COS [iceberg]

2024-04-29 Thread via GitHub
github-actions[bot] commented on issue #2498: URL: https://github.com/apache/iceberg/issues/2498#issuecomment-2083906772 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

[PR] Build: Bump mkdocs-material from 9.5.19 to 9.5.20 [iceberg-python]

2024-04-29 Thread via GitHub
dependabot[bot] opened a new pull request, #673: URL: https://github.com/apache/iceberg-python/pull/673 Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.5.19 to 9.5.20. Release notes Sourced from https://github.com/squidfunk/mkdocs-material/releases mk

[PR] Build: Bump ray from 2.9.2 to 2.12.0 [iceberg-python]

2024-04-29 Thread via GitHub
dependabot[bot] opened a new pull request, #672: URL: https://github.com/apache/iceberg-python/pull/672 Bumps [ray](https://github.com/ray-project/ray) from 2.9.2 to 2.12.0. Release notes Sourced from [ray's releases](https://github.com/ray-project/ray/releases). Ray-2.12.0

[PR] Build: Bump mkdocs-autorefs from 0.5.0 to 1.0.1 [iceberg-python]

2024-04-29 Thread via GitHub
dependabot[bot] opened a new pull request, #671: URL: https://github.com/apache/iceberg-python/pull/671 Bumps [mkdocs-autorefs](https://github.com/mkdocstrings/autorefs) from 0.5.0 to 1.0.1. Release notes Sourced from https://github.com/mkdocstrings/autorefs/releases mkdocs-autor

[PR] Build: Bump mkdocs from 1.5.3 to 1.6.0 [iceberg-python]

2024-04-29 Thread via GitHub
dependabot[bot] opened a new pull request, #670: URL: https://github.com/apache/iceberg-python/pull/670 Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.5.3 to 1.6.0. Release notes Sourced from [mkdocs's releases](https://github.com/mkdocs/mkdocs/releases). 1.6.0 Loc

[I] appending to a table with Decimal > 32767 results in `int too big to convert` [iceberg-python]

2024-04-29 Thread via GitHub
vtk9 opened a new issue, #669: URL: https://github.com/apache/iceberg-python/issues/669 ### Apache Iceberg version 0.6.0 (latest release) ### Please describe the bug 🐞 Hello, Is this a bug, or is there something obvious I am misunderstanding/misusing? (I am relativ

Re: [PR] Add Files metadata table [iceberg-python]

2024-04-29 Thread via GitHub
geruh commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1583803213 ## tests/integration/test_inspect_table.py: ## @@ -445,3 +445,107 @@ def check_pyiceberg_df_equals_spark_df(df: pa.Table, spark_df: DataFrame) -> Non df =

[I] Getting storage partitioned join to work [iceberg]

2024-04-29 Thread via GitHub
mrbrahman opened a new issue, #10250: URL: https://github.com/apache/iceberg/issues/10250 ### Query engine Spark on AWS EMR 6.15 ### Question Trying to get Storage Partitioned join to work in a simple test case, but not successful. I followed most of the settings mentio

Re: [PR] Nessie: Make handleExceptionsForCommits public in NessieUtil [iceberg]

2024-04-29 Thread via GitHub
dimas-b commented on PR #10248: URL: https://github.com/apache/iceberg/pull/10248#issuecomment-2083757084 @nastra : WDYT?

Re: [PR] Metadata Log Entries metadata table [iceberg-python]

2024-04-29 Thread via GitHub
corleyma commented on code in PR #667: URL: https://github.com/apache/iceberg-python/pull/667#discussion_r1583798006 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,39 @@ def update_partitions_map( schema=table_schema, ) +def metadata_log_entries(

Re: [PR] Metadata Log Entries metadata table [iceberg-python]

2024-04-29 Thread via GitHub
corleyma commented on code in PR #667: URL: https://github.com/apache/iceberg-python/pull/667#discussion_r1583779367 ## pyiceberg/table/metadata.py: ## @@ -292,6 +292,13 @@ def snapshot_by_name(self, name: str) -> Optional[Snapshot]: return self.snapshot_by_id(ref.

Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-29 Thread via GitHub
aokolnychyi commented on code in PR #10200: URL: https://github.com/apache/iceberg/pull/10200#discussion_r1583671212 ## docs/docs/flink-configuration.md: ## @@ -124,8 +124,9 @@ env.getConfig() | max-planning-snapshot-count | connector.iceberg.max-planning-snapshot-count |

Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-29 Thread via GitHub
aokolnychyi commented on code in PR #10200: URL: https://github.com/apache/iceberg/pull/10200#discussion_r1583669332 ## core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java: ## @@ -109,18 +112,34 @@ protected abstract class BaseEqualityDeltaWriter implements Closeable {

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#issuecomment-2083513030 @Fokko thanks for clearing this up and thanks for the review. This is the [line](https://github.com/apache/iceberg-python/blob/main/pyiceberg%2Ftable%2F__init__.py#L1647) t

Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-29 Thread via GitHub
viirya commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1583629653 ## crates/iceberg/src/arrow/reader.rs: ## @@ -186,4 +221,637 @@ impl ArrowReader { Ok(ProjectionMask::leaves(parquet_schema, indices)) } }

[I] Documentation page returning 404 [iceberg]

2024-04-29 Thread via GitHub
yakovsushenok opened a new issue, #10249: URL: https://github.com/apache/iceberg/issues/10249 ### Apache Iceberg version None ### Query engine None ### Please describe the bug 🐞 [This](https://iceberg.apache.org/docs/1.5.0/spark-configuration.md#sql-extensi

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
Fokko commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583486316 ## crates/iceberg/src/expr/visitors/manifest_evaluator.rs: ## @@ -16,74 +16,49 @@ // under the License. use crate::expr::visitors::bound_predicate_visitor::{visit,

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583606448 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning oper

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583607459 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning oper

Re: [I] Spark: CDC does not respect when the table is rolled back. [iceberg]

2024-04-29 Thread via GitHub
javrasya commented on issue #10247: URL: https://github.com/apache/iceberg/issues/10247#issuecomment-2083426721 Exactly, @manuzhang. It feels like it should filter that out, and this is a bug. WDYT?

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
sdd commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583577520 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning operations +///

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
sdd commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583579994 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning operations +///

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
sdd commented on code in PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#discussion_r1583574672 ## crates/iceberg/src/scan.rs: ## @@ -314,6 +312,140 @@ impl TableScan { } } +#[derive(Debug)] +/// Holds the context necessary for file scanning operations +///

Re: [PR] [WIP]: Add `InclusiveMetricsEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
sdd commented on code in PR #347: URL: https://github.com/apache/iceberg-rust/pull/347#discussion_r1583560775 ## crates/iceberg/src/expr/visitors/inclusive_metrics_evaluator.rs: ## @@ -0,0 +1,744 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] [WIP]: Add `InclusiveMetricsEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on code in PR #347: URL: https://github.com/apache/iceberg-rust/pull/347#discussion_r1583517231 ## crates/iceberg/src/expr/visitors/inclusive_metrics_evaluator.rs: ## @@ -0,0 +1,744 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] Iceberg/Comet integration POC [iceberg]

2024-04-29 Thread via GitHub
huaxingao commented on PR #9841: URL: https://github.com/apache/iceberg/pull/9841#issuecomment-2083223651 @aokolnychyi I have addressed the comments. Could you please take one more look when you have a moment? Thanks a lot!

Re: [I] Spark: Dropping partition column from old partition table corrupts entire table [iceberg]

2024-04-29 Thread via GitHub
EXPEbdodla commented on issue #10234: URL: https://github.com/apache/iceberg/issues/10234#issuecomment-2083137068 > Which Spark version are you using? I was originally trying with Spark 3.3.0 and Iceberg 1.2.1. Later I tried with Spark-iceberg docker images `tabulario/

Re: [PR] Migrate FlinkTestBase related tests [iceberg]

2024-04-29 Thread via GitHub
tomtongue commented on PR #10232: URL: https://github.com/apache/iceberg/pull/10232#issuecomment-2083084976 Thanks for the review. Sure, I will add the changes for `TestFlinkSourceConfig` and remove the testbase.

Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-29 Thread via GitHub
stevenzwu commented on code in PR #10200: URL: https://github.com/apache/iceberg/pull/10200#discussion_r1583309420 ## core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java: ## @@ -109,18 +112,34 @@ protected abstract class BaseEqualityDeltaWriter implements Closeable {

Re: [I] `PyArrowFileIO.parse_location` return error `path` for hdfs location. [iceberg-python]

2024-04-29 Thread via GitHub
syun64 commented on issue #449: URL: https://github.com/apache/iceberg-python/issues/449#issuecomment-2083068661 Hi @luocan17 thank you for raising this issue. I have a PR up to attempt a fix. Could I ask for your review on: https://github.com/apache/iceberg-python/pull/668

Re: [PR] Flink: Apply DeleteGranularity for writes [iceberg]

2024-04-29 Thread via GitHub
RussellSpitzer commented on code in PR #10200: URL: https://github.com/apache/iceberg/pull/10200#discussion_r1583299839 ## core/src/main/java/org/apache/iceberg/io/BaseTaskWriter.java: ## @@ -109,18 +112,34 @@ protected abstract class BaseEqualityDeltaWriter implements Closeabl

[PR] Bug Fix `PyArrowFileIO.parse_location` hdfs uri [iceberg-python]

2024-04-29 Thread via GitHub
syun64 opened a new pull request, #668: URL: https://github.com/apache/iceberg-python/pull/668 PyArrow HadoopFileSystem is a thin wrapper around [libhdfs](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/LibHdfs.html). For a given hdfs uri string that looks like `

Re: [PR] Add Files metadata table [iceberg-python]

2024-04-29 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1573767911 ## pyiceberg/table/__init__.py: ## @@ -3537,6 +3537,58 @@ def update_partitions_map( schema=table_schema, ) +def files(self) -> "pa

Re: [PR] Add Files metadata table [iceberg-python]

2024-04-29 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1583228304 ## tests/conftest.py: ## @@ -2060,7 +2060,7 @@ def spark() -> "SparkSession": .config("spark.sql.catalog.hive.warehouse", "s3://warehouse/hive/")

Re: [PR] Add Files metadata table [iceberg-python]

2024-04-29 Thread via GitHub
Gowthami03B commented on code in PR #614: URL: https://github.com/apache/iceberg-python/pull/614#discussion_r1583225201 ## tests/integration/test_inspect_table.py: ## @@ -445,3 +445,65 @@ def check_pyiceberg_df_equals_spark_df(df: pa.Table, spark_df: DataFrame) -> Non

Re: [PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke commented on PR #360: URL: https://github.com/apache/iceberg-rust/pull/360#issuecomment-2082804625 @sdd @Fokko @liurenjie1024 PTAL

[PR] Refactor: Extract `partition_filters` from `ManifestEvaluator` [iceberg-rust]

2024-04-29 Thread via GitHub
marvinlanhenke opened a new pull request, #360: URL: https://github.com/apache/iceberg-rust/pull/360 ### Which issue does this PR close? Closes #359 ### Rationale for this change The `partition_filter` (inclusive projection) is not only required by the `ManifestEvaluator` but al

[I] Spark CDC does not respect when the table is rolled back. [iceberg]

2024-04-29 Thread via GitHub
javrasya opened a new issue, #10247: URL: https://github.com/apache/iceberg/issues/10247 ### Apache Iceberg version 1.4.3 ### Query engine Spark ### Please describe the bug 🐞 We had to roll back our table because it had some broken snapshots. We are turning

Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-29 Thread via GitHub
liurenjie1024 commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1583008556 ## crates/iceberg/src/arrow/reader.rs: ## @@ -186,4 +219,634 @@ impl ArrowReader { Ok(ProjectionMask::leaves(parquet_schema, indices)) }

Re: [PR] MR: iceberg storage handler should set common projection pruning config [iceberg]

2024-04-29 Thread via GitHub
ludlows commented on code in PR #10188: URL: https://github.com/apache/iceberg/pull/10188#discussion_r1583036309 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -111,8 +111,15 @@ public void configureTableJobProperties(TableDesc tableDesc, M

Re: [PR] Basic Integration with Datafusion [iceberg-rust]

2024-04-29 Thread via GitHub
liurenjie1024 commented on code in PR #324: URL: https://github.com/apache/iceberg-rust/pull/324#discussion_r1583019973 ## crates/integrations/datafusion/src/physical_plan/scan.rs: ## @@ -0,0 +1,123 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] Core: add a new task-type field to task JSON serialization. add data task JSON serialization imp. [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #9728: URL: https://github.com/apache/iceberg/pull/9728#discussion_r1583006389 ## core/src/test/java/org/apache/iceberg/TestDataTaskParser.java: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more con

Re: [PR] Core: add a new task-type field to task JSON serialization. add data task JSON serialization imp. [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #9728: URL: https://github.com/apache/iceberg/pull/9728#discussion_r1583003922 ## core/src/test/java/org/apache/iceberg/TestFileScanTaskParser.java: ## @@ -84,20 +127,38 @@ private String expectedFileScanTaskJson() { + "\"residual-filter\"

Re: [PR] Core: add a new task-type field to task JSON serialization. add data task JSON serialization imp. [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #9728: URL: https://github.com/apache/iceberg/pull/9728#discussion_r1583002079 ## core/src/test/java/org/apache/iceberg/TestDataTaskParser.java: ## @@ -0,0 +1,249 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more con

Re: [PR] Core: add a new task-type field to task JSON serialization. add data task JSON serialization imp. [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #9728: URL: https://github.com/apache/iceberg/pull/9728#discussion_r1582996766 ## core/src/main/java/org/apache/iceberg/SnapshotsTable.java: ## @@ -27,7 +28,8 @@ * This does not include snapshots that have been expired using {@link ExpireSnapsho

Re: [PR] MR: iceberg storage handler should set common projection pruning config [iceberg]

2024-04-29 Thread via GitHub
pvary commented on code in PR #10188: URL: https://github.com/apache/iceberg/pull/10188#discussion_r1582970748 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -111,8 +111,15 @@ public void configureTableJobProperties(TableDesc tableDesc, Map

Re: [PR] feat: Convert predicate to arrow filter and push down to parquet reader [iceberg-rust]

2024-04-29 Thread via GitHub
liurenjie1024 commented on code in PR #295: URL: https://github.com/apache/iceberg-rust/pull/295#discussion_r1582960019 ## crates/iceberg/src/arrow/reader.rs: ## @@ -186,4 +221,637 @@ impl ArrowReader { Ok(ProjectionMask::leaves(parquet_schema, indices)) }

Re: [PR] Core: Add property to disable table initialization for JdbcCatalog [iceberg]

2024-04-29 Thread via GitHub
nastra merged PR #10124: URL: https://github.com/apache/iceberg/pull/10124 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [I] AWS: Creating a Glue table with Lake Formation enabled fails [iceberg]

2024-04-29 Thread via GitHub
Albertagamergod1 commented on issue #10226: URL: https://github.com/apache/iceberg/issues/10226#issuecomment-2082323233 Continue your work please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Core: Prevent duplicate data/delete files [iceberg]

2024-04-29 Thread via GitHub
Fokko commented on code in PR #10007: URL: https://github.com/apache/iceberg/pull/10007#discussion_r1582803332 ## core/src/main/java/org/apache/iceberg/FastAppend.java: ## @@ -83,9 +85,13 @@ protected Map summary() { @Override public FastAppend appendFile(DataFile file)

[PR] Add ManifestFile Stats in snapshot summary. [iceberg]

2024-04-29 Thread via GitHub
nk1506 opened a new pull request, #10246: URL: https://github.com/apache/iceberg/pull/10246 Currently, the snapshot summary doesn't have statistics related to manifest files. This change adds two new summary fields, `"total-data-manifest-files"` and `"total-delete-manifest-files"`. Ther

Re: [PR] Use `pre-commit.ci` to automatically update pre-commit hook versions [iceberg-python]

2024-04-29 Thread via GitHub
Fokko merged PR #665: URL: https://github.com/apache/iceberg-python/pull/665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #10140: URL: https://github.com/apache/iceberg/pull/10140#discussion_r1582748026 ## core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #10140: URL: https://github.com/apache/iceberg/pull/10140#discussion_r1582745688 ## core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

Re: [PR] Core: Retry connections in JDBC catalog with user configured error code list [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #10140: URL: https://github.com/apache/iceberg/pull/10140#discussion_r1582743700 ## core/src/test/java/org/apache/iceberg/TestClientPoolImpl.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more c

Re: [PR] MR: iceberg storage handler should set common projection pruning config [iceberg]

2024-04-29 Thread via GitHub
ludlows commented on code in PR #10188: URL: https://github.com/apache/iceberg/pull/10188#discussion_r1582727175 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java: ## @@ -111,8 +111,15 @@ public void configureTableJobProperties(TableDesc tableDesc, M

Re: [PR] HiveMetaHook implementation to enable CREATE TABLE and DROP TABLE from Hive queries [iceberg]

2024-04-29 Thread via GitHub
pvary commented on code in PR #1495: URL: https://github.com/apache/iceberg/pull/1495#discussion_r1582687790 ## mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Migrate FlinkTestBase related tests [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #10232: URL: https://github.com/apache/iceberg/pull/10232#discussion_r1582655786 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestIcebergConnector.java: ## @@ -32,33 +37,37 @@ import org.apache.hadoop.conf.Configuration; import org

Re: [PR] Migrate FlinkTestBase related tests [iceberg]

2024-04-29 Thread via GitHub
nastra commented on code in PR #10232: URL: https://github.com/apache/iceberg/pull/10232#discussion_r1582655185 ## flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/TestChangeLogTable.java: ## @@ -37,23 +43,22 @@ import org.apache.iceberg.relocated.com.google.common.coll

Re: [PR] Build: Bump getdaft from 0.2.16 to 0.2.21 [iceberg-python]

2024-04-29 Thread via GitHub
Fokko merged PR #662: URL: https://github.com/apache/iceberg-python/pull/662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.

Re: [PR] Make AzureProperties w/ shared-key creds serializable [iceberg]

2024-04-29 Thread via GitHub
nastra commented on PR #10045: URL: https://github.com/apache/iceberg/pull/10045#issuecomment-2082016600 @snazy could you add a test please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp
