Re: [PR] chore(deps): Bump peaceiris/actions-gh-pages from 3.9.2 to 3.9.3 [iceberg-rust]

2024-01-03 Thread via GitHub
Fokko commented on PR #143: URL: https://github.com/apache/iceberg-rust/pull/143#issuecomment-1876667145 Hmm, this didn't cause the branch to update 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] bug: Iceberg decimal should be converted to fixed in avro. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on issue #144: URL: https://github.com/apache/iceberg-rust/issues/144#issuecomment-1876626826 > Great catch @liurenjie1024. It is stored as a fixed with the minimal number of bytes required to store the full precision of the type. An example how it is done in

Re: [I] doc: rust.iceberg.apache.org is not resolved [iceberg-rust]

2024-01-03 Thread via GitHub
Fokko commented on issue #137: URL: https://github.com/apache/iceberg-rust/issues/137#issuecomment-1876625695 Thanks for raising an issue: https://issues.apache.org/jira/browse/INFRA-25338 It looks good as far as I can see.

Re: [PR] Docs: Note CREATE TABLE LIKE is not supported in Spark DDL [iceberg]

2024-01-03 Thread via GitHub
nastra merged PR #9358: URL: https://github.com/apache/iceberg/pull/9358

Re: [I] bug: Iceberg decimal should be converted to fixed in avro. [iceberg-rust]

2024-01-03 Thread via GitHub
Fokko commented on issue #144: URL: https://github.com/apache/iceberg-rust/issues/144#issuecomment-1876623119 Great catch @liurenjie1024. It is stored as a fixed with the minimal number of bytes required to store the full precision of the type. An example how it is done in Python:
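The rule Fokko describes can be sketched in a few lines. This is an illustrative Python sketch, not PyIceberg's code, and the helper names `min_fixed_bytes` and `encode_decimal` are hypothetical. It finds the smallest signed two's-complement width that holds the largest unscaled value of a given precision (10^P - 1), then writes the unscaled value big-endian into that many bytes, which is the layout the Avro `decimal` logical type uses on a `fixed`.

```python
from decimal import Decimal


def min_fixed_bytes(precision: int) -> int:
    """Smallest byte width whose signed range holds every unscaled value with `precision` digits."""
    if not 1 <= precision <= 38:
        raise ValueError(f"Iceberg decimals support precision 1..38, got {precision}")
    largest_unscaled = 10**precision - 1
    width = 1
    # A signed n-byte integer holds values up to 2^(8n - 1) - 1.
    while (1 << (8 * width - 1)) - 1 < largest_unscaled:
        width += 1
    return width


def encode_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """Big-endian two's complement of the unscaled value, padded to min_fixed_bytes(precision).

    Assumes `value` has at most `scale` fractional digits; int() truncates any extra digits.
    """
    unscaled = int(value.scaleb(scale))  # e.g. Decimal("123.45") with scale 2 -> 12345
    return unscaled.to_bytes(min_fixed_bytes(precision), "big", signed=True)
```

With this rule, `decimal(9, 2)` fits in a 4-byte fixed while `decimal(10, 2)` needs 5 bytes, and `decimal(38, s)` needs 16.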

Re: [I] dropDeleteFilesOlderthan should be partition level instead of table level [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on issue #9383: URL: https://github.com/apache/iceberg/issues/9383#issuecomment-1876612899 @zinking I see. An extreme case is if there's one partition left not compacted, none of the other partitions can drop their delete files after compaction.
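The difference between a table-level and a partition-level cutoff can be modeled with sequence numbers alone. In this simplified sketch (the `ContentFile` type and both functions are hypothetical, not Iceberg's API), a delete file is treated as droppable when its sequence number is below the smallest data sequence number of the live data files in scope. That bound is safe for position deletes (applied to data files whose data sequence number is less than or equal to the delete's) and for equality deletes (strictly less than).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class ContentFile:
    path: str
    partition: str
    seq: int  # data sequence number


def droppable_table_level(data_files: Iterable[ContentFile], delete_files: Iterable[ContentFile]) -> Set[str]:
    # One table-wide minimum: a single old data file in any partition blocks every partition.
    min_seq = min((f.seq for f in data_files), default=float("inf"))
    return {d.path for d in delete_files if d.seq < min_seq}


def droppable_partition_level(data_files: Iterable[ContentFile], delete_files: Iterable[ContentFile]) -> Set[str]:
    min_by_partition: Dict[str, int] = {}
    for f in data_files:
        min_by_partition[f.partition] = min(f.seq, min_by_partition.get(f.partition, f.seq))
    # A partition with no live data files has nothing for its delete files to apply to.
    return {d.path for d in delete_files if d.seq < min_by_partition.get(d.partition, float("inf"))}
```

In the extreme case above, an uncompacted partition holding a data file with sequence number 1 keeps the table-wide minimum at 1, so a delete file with sequence number 3 in a compacted partition (whose data files now start at 5) is only dropped by the partition-level variant.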

Re: [I] EOF: read 1 bytes when load manifest write by icelake [iceberg-python]

2024-01-03 Thread via GitHub
Fokko closed issue #241: EOF: read 1 bytes when load manifest write by icelake URL: https://github.com/apache/iceberg-python/issues/241

Re: [I] EOF: read 1 bytes when load manifest write by icelake [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on issue #241: URL: https://github.com/apache/iceberg-python/issues/241#issuecomment-1876600551 Ah that makes sense, with the bytes it will first read the length of the bytes to be read. I didn't get to the debugging yet since it is quite tedious :D Great find!
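The mismatch comes from how Avro's binary encoding writes the two types: `bytes` is a zigzag varint length followed by the payload, while `fixed` is the raw payload only, since the schema carries the size. A minimal pure-Python sketch of both encodings and of a `bytes` reader (function names are illustrative, not from any Avro library) shows how reading a `fixed` value as `bytes` consumes payload data as a length and then runs out of input.

```python
from typing import Tuple


def encode_long(n: int) -> bytes:
    """Avro long: zigzag-encode, then emit a little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    # Avro `bytes`: length prefix, then the payload.
    return encode_long(len(data)) + data


def encode_fixed(data: bytes, size: int) -> bytes:
    # Avro `fixed`: no length prefix; the size comes from the schema.
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


def read_long(buf: bytes, pos: int) -> Tuple[int, int]:
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise EOFError("ran out of input while reading a varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1), pos  # undo zigzag
        shift += 7


def read_bytes(buf: bytes, pos: int) -> Tuple[bytes, int]:
    length, pos = read_long(buf, pos)
    end = pos + length
    if length < 0 or end > len(buf):
        raise EOFError(f"bytes length {length} exceeds the {len(buf) - pos} remaining byte(s)")
    return buf[pos:end], end
```

Reading the 2-byte fixed `b"\x30\x39"` with `read_bytes` interprets `0x30` as a zigzag length of 24 and raises `EOFError`, the same kind of misalignment discussed in this issue.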

Re: [PR] Spark 3.5: Support filtering with buckets in RewriteDataFilesProcedure [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on PR #9396: URL: https://github.com/apache/iceberg/pull/9396#issuecomment-1876545907 @wangtaohz thanks for the example. Will a "native" partition filter be more efficient?

Re: [PR] Spark 3.5: Support filtering with buckets in RewriteDataFilesProcedure [iceberg]

2024-01-03 Thread via GitHub
wangtaohz commented on PR #9396: URL: https://github.com/apache/iceberg/pull/9396#issuecomment-1876455887 I can provide you with an example that has been tested, but I'm not sure if it's the best practice.
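Whatever form the filter takes, matching files by bucket means recomputing the table spec's bucket transform, `(murmur3_x86_32_hash(v) & Integer.MAX_VALUE) % N`, where int and long values are hashed as 8-byte little-endian longs. The PR's own implementation is not shown in this thread, so the following is only a pure-Python sketch of that transform for longs; a native hash library would normally be used instead.

```python
import struct

_MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & _MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & _MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        (k,) = struct.unpack_from("<I", data, 4 * i)  # little-endian 4-byte block
        k = _rotl32((k * c1) & _MASK32, 15)
        h ^= (k * c2) & _MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & _MASK32
    tail = data[4 * n_blocks:]
    k = 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & _MASK32, 15)
        h ^= (k * c2) & _MASK32
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & _MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & _MASK32
    h ^= h >> 16
    return h


def bucket_long(value: int, num_buckets: int) -> int:
    """Bucket transform for int/long values: hash the 8-byte little-endian long."""
    h = murmur3_32(struct.pack("<q", value))
    return (h & 0x7FFFFFFF) % num_buckets
```

The spec's reference values (for example, `34` hashes to `2017239379`) make a convenient self-check for any reimplementation.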

Re: [PR] Bug fix falsy value of zero [iceberg-python]

2024-01-03 Thread via GitHub
MehulBatra commented on code in PR #249: URL: https://github.com/apache/iceberg-python/pull/249#discussion_r1441371562 ## pyiceberg/table/__init__.py: ## @@ -545,7 +545,7 @@ def new_snapshot_id(self) -> int: def current_snapshot(self) -> Optional[Snapshot]:
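The diff above is cut off, so the exact change is not visible here. Assuming the bug is the truthiness pitfall that the PR title suggests, the sketch below (hypothetical functions, not PyIceberg's code) shows why `if snapshot_id:` misbehaves when a snapshot ID is `0` and how an explicit `is not None` check avoids it.

```python
from typing import Dict, Optional


def current_snapshot_buggy(current_snapshot_id: Optional[int], snapshots: Dict[int, str]) -> Optional[str]:
    # Truthiness check: a legitimate snapshot ID of 0 is falsy, so it is treated as "no snapshot".
    if current_snapshot_id:
        return snapshots.get(current_snapshot_id)
    return None


def current_snapshot_fixed(current_snapshot_id: Optional[int], snapshots: Dict[int, str]) -> Optional[str]:
    # Explicit None check: only a missing ID means "no current snapshot".
    if current_snapshot_id is not None:
        return snapshots.get(current_snapshot_id)
    return None
```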

Re: [PR] Core: remove statistic files in CatalogUtil:dropTableData [iceberg]

2024-01-03 Thread via GitHub
ajantha-bhat commented on PR #9305: URL: https://github.com/apache/iceberg/pull/9305#issuecomment-1876393081 > There is also a partition stats file added recently https://github.com/apache/iceberg/commit/6e21bbf4c4cd2c8351a64636f91d05d00492dff2 We should handle this for them as well.
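For illustration only (Python rather than the Java `CatalogUtil` code), here is what collecting both kinds of statistics files from table metadata might look like. It assumes the table metadata JSON fields `statistics` and `partition-statistics`, each entry carrying a `statistics-path`, as I read the table spec; the function name is hypothetical, and a real drop would delete these alongside manifests, manifest lists, and data files.

```python
from typing import Any, Dict, Set


def statistics_files_to_delete(metadata: Dict[str, Any]) -> Set[str]:
    """Collect table-level and partition-level statistics file paths from table metadata JSON."""
    paths: Set[str] = set()
    for key in ("statistics", "partition-statistics"):
        for entry in metadata.get(key, []):
            paths.add(entry["statistics-path"])
    return paths
```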

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441301360 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3AccessGrantsPluginConfigurations.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] dropDeleteFilesOlderthan should be partition level instead of table level [iceberg]

2024-01-03 Thread via GitHub
zinking commented on issue #9383: URL: https://github.com/apache/iceberg/issues/9383#issuecomment-1876294561 @manuzhang sounds like different stuff. The issue pointed out here is not POS-delete specific; equality deletes have the same issue. The key here is `partition`: delete files within a partition

Re: [I] EOF: read 1 bytes when load manifest write by icelake [iceberg-python]

2024-01-03 Thread via GitHub
liurenjie1024 commented on issue #241: URL: https://github.com/apache/iceberg-python/issues/241#issuecomment-1876278634 After some investigation, I realized that this is caused by a bug in icelake, which converts to bytes instead of fixed when converting to avro schema. I'll close this for

Re: [PR] Correct schema behavior [iceberg-python]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on code in PR #247: URL: https://github.com/apache/iceberg-python/pull/247#discussion_r1441260336 ## pyiceberg/table/__init__.py: ## @@ -942,15 +942,16 @@ def snapshot(self) -> Optional[Snapshot]: return self.table.current_snapshot()

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441254746 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -628,6 +630,30 @@ impl TryFrom for ManifestContentType { } } +impl ManifestListEntry { Review

[I] refactor: Rename [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 opened a new issue, #145: URL: https://github.com/apache/iceberg-rust/issues/145 I'm confused by the naming, should this be a `ManifestFile`? From the [spec](https://iceberg.apache.org/spec/#manifest-lists): Manifest list files store
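To make the naming question concrete: per the spec, each entry in a manifest list describes one manifest file. The Python sketch below models a subset of the spec's `manifest_file` fields (`manifest_path`, `manifest_length`, `partition_spec_id`, `content`, `sequence_number`, `added_snapshot_id`); it is illustrative only and not iceberg-rust's types. Converting an unknown content code fails with `ValueError`, roughly what a failing `TryFrom` conversion expresses in Rust.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class ManifestContent(IntEnum):
    DATA = 0
    DELETES = 1


@dataclass(frozen=True)
class ManifestFile:
    """One entry of a manifest list (a subset of the spec's manifest_file fields)."""

    manifest_path: str
    manifest_length: int
    partition_spec_id: int
    content: ManifestContent
    sequence_number: int
    added_snapshot_id: int


@dataclass
class ManifestList:
    entries: List[ManifestFile] = field(default_factory=list)

    def manifests(self, content: ManifestContent) -> List[ManifestFile]:
        return [m for m in self.entries if m.content == content]
```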

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441253820 ## crates/iceberg/src/spec/manifest_list.rs: ## @@ -628,6 +630,30 @@ impl TryFrom for ManifestContentType { } } +impl ManifestListEntry { Review

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441253067 ## crates/iceberg/src/spec/manifest.rs: ## @@ -819,6 +849,49 @@ impl ManifestEntry { ManifestStatus::Added | ManifestStatus::Existing )

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441252030 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441251310 ## crates/iceberg/src/spec/table_metadata.rs: ## @@ -38,6 +38,12 @@ static MAIN_BRANCH: = "main"; static DEFAULT_SPEC_ID: i32 = 0; static

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441250114 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249808 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249437 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441249066 ## crates/iceberg/src/scan.rs: ## @@ -0,0 +1,616 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441248686 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -124,6 +150,70 @@ impl Snapshot { Utc.timestamp_millis_opt(self.timestamp_ms).unwrap() } +

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1441243904 ## crates/iceberg/Cargo.toml: ## @@ -62,4 +62,5 @@ uuid = { workspace = true } [dev-dependencies] pretty_assertions = { workspace = true } tempfile = {

Re: [PR] JMH: Improvements to `jmh.gradle` [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9390: URL: https://github.com/apache/iceberg/pull/9390

Re: [PR] Build: Bump spring-boot from 2.5.4 to 3.2.1 [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9371: URL: https://github.com/apache/iceberg/pull/9371#issuecomment-1876239430 I'm actually quite confused about why we need Spring Boot dependencies in the project. If we could remove them, that would be ideal.

[I] manifest list missing error after "cannot commit table due to base location not same as glue location" [iceberg]

2024-01-03 Thread via GitHub
waichee opened a new issue, #9406: URL: https://github.com/apache/iceberg/issues/9406 ### Apache Iceberg version 1.3.1 ### Query engine Spark ### Please describe the bug 🐞 **Setup** We use the following spark libraries to write to Iceberg on EMR:

Re: [PR] Core: remove statistic files in CatalogUtil:dropTableData [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9305: URL: https://github.com/apache/iceberg/pull/9305

Re: [PR] Core: Remove deprecated method from BaseMetadataTable [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9298: URL: https://github.com/apache/iceberg/pull/9298#issuecomment-1876211849 Sorry for the delay in reviewing this, @ajantha-bhat. I'll take a look at it tomorrow.

Re: [I] How does Iceberg support writing data to local paths, network disks, interfaces, and other storage media [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on issue #9378: URL: https://github.com/apache/iceberg/issues/9378#issuecomment-1876210607 It depends on your catalog `io-impl`. Take https://iceberg.apache.org/docs/latest/aws/#spark as an example.
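As a rough sketch of what "it depends on your catalog `io-impl`" means in practice, these helpers build Spark properties for a Hadoop-type Iceberg catalog with an explicit FileIO. The catalog name `local`, the warehouse path, and the helper names are illustrative assumptions. For a local filesystem, `org.apache.iceberg.hadoop.HadoopFileIO` goes through Hadoop's `FileSystem`, which handles `file://` paths; the AWS page linked above uses `org.apache.iceberg.aws.s3.S3FileIO` for S3.

```python
from typing import Dict, List


def iceberg_catalog_conf(name: str, warehouse: str, io_impl: str) -> Dict[str, str]:
    """Spark properties for a Hadoop-type Iceberg catalog with an explicit FileIO implementation."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": io_impl,
    }


def to_spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render properties as repeated `--conf key=value` arguments for spark-submit or spark-sql."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical local setup: tables are written under /tmp/warehouse.
local = iceberg_catalog_conf("local", "file:///tmp/warehouse", "org.apache.iceberg.hadoop.HadoopFileIO")
```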

Re: [I] dropDeleteFilesOlderthan should be partition level instead of table level [iceberg]

2024-01-03 Thread via GitHub
manuzhang commented on issue #9383: URL: https://github.com/apache/iceberg/issues/9383#issuecomment-1876206670 > I am seeing v2 tables (partitioned tables) having delete files retained in partitions but those delete files won't apply to any data files within that partition. This is

Re: [I] doc: rust.iceberg.apache.org is not resolved [iceberg-rust]

2024-01-03 Thread via GitHub
liurenjie1024 commented on issue #137: URL: https://github.com/apache/iceberg-rust/issues/137#issuecomment-1876198271 Seems still not working. Do we have any way to debug this?

Re: [PR] Flink: Watermark read options [iceberg]

2024-01-03 Thread via GitHub
stevenzwu commented on code in PR #9346: URL: https://github.com/apache/iceberg/pull/9346#discussion_r1441115126 ## flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/source/IcebergTableSource.java: ## @@ -131,16 +131,17 @@ private DataStream

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441180949 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441178910 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under
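Only the license header of `EmptyPositionDeleteIndex.java` is visible here, so the following is a generic Python sketch of the idea its name suggests, not the PR's Java API. Position delete files store `(file_path, pos)` records; grouping them per data file gives an index of deleted row positions, and a shared empty index lets files without deletes skip filtering entirely. All class and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


class PositionDeleteIndex:
    """Deleted row positions for a single data file."""

    def __init__(self, positions: Iterable[int] = ()) -> None:
        self._deleted = frozenset(positions)

    def is_deleted(self, pos: int) -> bool:
        return pos in self._deleted

    def is_empty(self) -> bool:
        return not self._deleted


# Shared instance for data files with no position deletes, so no per-file index is built.
EMPTY_INDEX = PositionDeleteIndex()


def build_indexes(deletes: Iterable[Tuple[str, int]]) -> Dict[str, PositionDeleteIndex]:
    """Group (file_path, pos) records, as stored in position delete files, by data file."""
    by_file: Dict[str, Set[int]] = defaultdict(set)
    for path, pos in deletes:
        by_file[path].add(pos)
    return {path: PositionDeleteIndex(positions) for path, positions in by_file.items()}


def live_rows(path: str, rows: Sequence[str], indexes: Dict[str, PositionDeleteIndex]) -> List[str]:
    index = indexes.get(path, EMPTY_INDEX)
    if index.is_empty():
        return list(rows)  # fast path: nothing to filter
    return [row for pos, row in enumerate(rows) if not index.is_deleted(pos)]
```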

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441177965 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)
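Only the license header of `SparkExecutorCache.java` appears in this thread, so here is a generic sketch of the idea in the PR title rather than its implementation: a process-wide, size-bounded LRU cache on each executor, keyed by something like a delete file path, so tasks running on the same executor reuse loaded deletes. The class name and the eviction-by-entry-count policy are assumptions made for illustration.

```python
from collections import OrderedDict
from threading import Lock
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class ExecutorCache(Generic[V]):
    """Process-wide LRU cache: a key is loaded once and reused while it stays cached."""

    def __init__(self, max_entries: int) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[Hashable, V]" = OrderedDict()
        self._lock = Lock()

    def get_or_load(self, key: Hashable, loader: Callable[[], V]) -> V:
        with self._lock:
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key]
        # Load outside the lock so one slow read does not block other tasks.
        # Trade-off: two tasks that miss at the same time may both load the same key.
        value = loader()
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_entries:
                self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```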

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441176818 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [I] Partitioned table folder creation behaviour [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar closed issue #9388: Partitioned table folder creation behaviour URL: https://github.com/apache/iceberg/issues/9388

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441141271 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +796,47 @@ public void applyEndpointConfigurations(T builder) {

Re: [I] How does iceberg ensure the correctness of data writing under high concurrency [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #6885: URL: https://github.com/apache/iceberg/issues/6885#issuecomment-1876137969 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] How does iceberg ensure the correctness of data writing under high concurrency [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #6885: How does iceberg ensure the correctness of data writing under high concurrency URL: https://github.com/apache/iceberg/issues/6885

Re: [I] Documentation improvements in regards to time travel [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #7000: URL: https://github.com/apache/iceberg/issues/7000#issuecomment-1876137932 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Documentation improvements in regards to time travel [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #7000: Documentation improvements in regards to time travel URL: https://github.com/apache/iceberg/issues/7000

Re: [I] Hive ping functionality seems to leak threads [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] commented on issue #7034: URL: https://github.com/apache/iceberg/issues/7034#issuecomment-1876137904 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

Re: [I] Hive ping functionality seems to leak threads [iceberg]

2024-01-03 Thread via GitHub
github-actions[bot] closed issue #7034: Hive ping functionality seems to leak threads URL: https://github.com/apache/iceberg/issues/7034

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441105763 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -684,6 +715,22 @@ private Set toS3Tags(Map properties, String prefix) {

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441105061 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -50,6 +51,23 @@ public class S3FileIOProperties implements Serializable { */

Re: [I] Snowflake Iceberg Partitioned data read issue [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on issue #9404: URL: https://github.com/apache/iceberg/issues/9404#issuecomment-1876122199 I ultimately recommend continuing to reach out to Snowflake about any issues you are encountering with the Iceberg integration, but the Spark behavior in the reported issue does seem

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
adnanhemani commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1441103728 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3AccessGrantsPluginConfigurations.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] API: Fix Javadoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar merged PR #9405: URL: https://github.com/apache/iceberg/pull/9405

Re: [PR] API: Fix Javadoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar commented on PR #9405: URL: https://github.com/apache/iceberg/pull/9405#issuecomment-1876077427 Thanks @Fokko for the review! Will merge after CI completes

[PR] Build: Bump sqlalchemy from 2.0.24 to 2.0.25 [iceberg-python]

2024-01-03 Thread via GitHub
dependabot[bot] opened a new pull request, #250: URL: https://github.com/apache/iceberg-python/pull/250 Bumps [sqlalchemy](https://github.com/sqlalchemy/sqlalchemy) from 2.0.24 to 2.0.25. Release notes Sourced from [sqlalchemy's releases](https://github.com/sqlalchemy/sqlalchemy/releases).

[PR] API: Fix JavaDoc on UpdateSchema#updateColumnDoc [iceberg]

2024-01-03 Thread via GitHub
amogh-jahagirdar opened a new pull request, #9405: URL: https://github.com/apache/iceberg/pull/9405 This change fixes the JavaDoc on UpdateSchema#updateColumnDoc; previously it was referring to rename (looked to just be a bad copy paste) and now the JavaDoc reflects the actual operation

Re: [PR] Write support [iceberg-python]

2024-01-03 Thread via GitHub
robtandy commented on code in PR #41: URL: https://github.com/apache/iceberg-python/pull/41#discussion_r1440972324 ## pyiceberg/io/pyarrow.py: ## @@ -1565,13 +1564,54 @@ def fill_parquet_file_metadata( del upper_bounds[field_id] del null_value_counts[field_id]

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440937961 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440935178 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440934324 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -125,6 +126,25 @@ public static StructLikeSet toEqualitySet( } } + public static

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1875952717 @singhpk234 @RussellSpitzer @szehon-ho, I rebased this. I addressed most comments, I am working on tests and docs. There are a few open questions too. I'll take a look at them

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440931514 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440930608 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440929838 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public static

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440925641 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +796,47 @@ public void applyEndpointConfigurations(T builder) {

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440921673 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440915860 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -684,6 +715,22 @@ private Set toS3Tags(Map properties, String prefix) {

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440915164 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -50,6 +51,23 @@ public class S3FileIOProperties implements Serializable { */

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440914203 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3AccessGrantsPluginConfigurations.java: ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Bug fix falsy value of zero [iceberg-python]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #249: URL: https://github.com/apache/iceberg-python/pull/249#discussion_r1440884348 ## pyiceberg/table/__init__.py: ## @@ -545,7 +545,7 @@ def new_snapshot_id(self) -> int: def current_snapshot(self) -> Optional[Snapshot]: """Get

Re: [PR] AWS: Add S3 Access Grants Integration [iceberg]

2024-01-03 Thread via GitHub
jackye1995 commented on code in PR #9385: URL: https://github.com/apache/iceberg/pull/9385#discussion_r1440883043 ## aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIOProperties.java: ## @@ -749,4 +795,23 @@ public void applyEndpointConfigurations(T builder) {

Re: [PR] Deliver key metadata for encryption of data files [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on code in PR #9359: URL: https://github.com/apache/iceberg/pull/9359#discussion_r1440814517 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BaseBatchReader.java: ## @@ -53,6 +58,7 @@ abstract class BaseBatchReader extends BaseReader

Re: [PR] Spark: Add actions for disaster recovery. [iceberg]

2024-01-03 Thread via GitHub
flyrain commented on PR #4705: URL: https://github.com/apache/iceberg/pull/4705#issuecomment-1875808007 Hi @laithalzyoud, glad you found this useful. Would you like to take the lead for this task? I could be the co-author if that makes sense to you. I can help on the review, but we will

Re: [PR] Deliver key metadata for encryption of data files [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on code in PR #9359: URL: https://github.com/apache/iceberg/pull/9359#discussion_r1440793961 ## core/src/main/java/org/apache/iceberg/encryption/StandardKeyMetadata.java: ## @@ -31,7 +31,7 @@ import

Re: [PR] API: New API For sequential / streaming updates [iceberg]

2024-01-03 Thread via GitHub
rdblue commented on PR #9323: URL: https://github.com/apache/iceberg/pull/9323#issuecomment-1875787341 @jasonf20, to make that work, I think you'd need to keep track of a base sequence number and update the metadata for each new manifest with the correct sequence number when the manifest
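A hypothetical sketch of the bookkeeping rdblue describes: manifests for each update in the sequence are written without a final sequence number, and at commit time each one is assigned a number derived from a tracked base. It assumes the base is the table's last sequence number when the commit is attempted and that each update consumes exactly one number; the types and function are illustrative, not Iceberg's API. Because numbers are assigned from the base, a retry after a conflict only has to re-run the assignment with the new base.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class PendingManifest:
    path: str
    update_index: int  # 0-based position of the update within the streaming sequence
    sequence_number: Optional[int] = None  # unknown until commit


def assign_sequence_numbers(manifests: List[PendingManifest], base_sequence_number: int) -> List[PendingManifest]:
    """Give the i-th update base + i + 1, so later updates always get higher sequence numbers."""
    return [replace(m, sequence_number=base_sequence_number + m.update_index + 1) for m in manifests]
```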

Re: [PR] feat: Introduce basic file scan planning. [iceberg-rust]

2024-01-03 Thread via GitHub
Fokko commented on code in PR #129: URL: https://github.com/apache/iceberg-rust/pull/129#discussion_r1440607629 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -124,6 +150,70 @@ impl Snapshot { Utc.timestamp_millis_opt(self.timestamp_ms).unwrap() } +/// Get

[I] Snowflake Iceberg Partitioned data read issue [iceberg]

2024-01-03 Thread via GitHub
purna344 opened a new issue, #9404: URL: https://github.com/apache/iceberg/issues/9404 ### Feature Request / Improvement We are using Snowflake Iceberg to read the data from the S3 location and that is working fine for the non-partitioned data. But if the data is partitioned

Re: [PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary commented on PR #9403: URL: https://github.com/apache/iceberg/pull/9403#issuecomment-1875674967 Thanks for the review @stevenzwu!

Re: [PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary merged PR #9403: URL: https://github.com/apache/iceberg/pull/9403

Re: [I] Flink API rewriteDataFile How to set up scanning based on file size [iceberg]

2024-01-03 Thread via GitHub
pvary commented on issue #9386: URL: https://github.com/apache/iceberg/issues/9386#issuecomment-187561 If a file is bigger than the TARGET_FILE_SIZE, it will create multiple splits when we read it. The last split of the file is a good candidate to merge with a new split, so it
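The splitting behaviour can be illustrated with a simplified sketch that cuts a file into fixed-size `(offset, length)` splits, where the last split takes whatever remains. This ignores the per-file split offsets (such as Parquet row-group boundaries) that real planning may align to, and `plan_splits` is a hypothetical name.

```python
from typing import List, Tuple


def plan_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Cut a file into (offset, length) splits of split_size; the last split takes the remainder."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [(start, min(split_size, file_size - start)) for start in range(0, file_size, split_size)]
```

A 300 MB file read with 128 MB splits yields a 44 MB tail, the kind of undersized split that could be bin-packed together with another small split.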

Re: [I] When using the Flink upsert mode, the speed of reading data from the iceberg table is very slow. [iceberg]

2024-01-03 Thread via GitHub
pvary commented on issue #9363: URL: https://github.com/apache/iceberg/issues/9363#issuecomment-1875610859 @13535048320: How do you populate the data? Is it a requirement to update the previous records based on the incoming new data, or is every record new? If you have delete files

Re: [I] Spark DataFrame write fails if input dataframe has columns in different order than iceberg schema [iceberg]

2024-01-03 Thread via GitHub
amitmittal5 commented on issue #741: URL: https://github.com/apache/iceberg/issues/741#issuecomment-1875395560 > Hello, is this issue resolved? I am still getting this issue in iceberg 1.4.2 while trying to write in iceberg format to ADLS using spark-streaming. It was actually

Re: [PR] API, Core: Move SQLViewRepresentation to API [iceberg]

2024-01-03 Thread via GitHub
pvary commented on PR #9302: URL: https://github.com/apache/iceberg/pull/9302#issuecomment-1875336457 @nastra: I think we can skip this for now; I still think this is some caching issue on the Gradle side, which is very hard to reproduce, so not many people are affected

[PR] Flink: Backport #9308 to v1.17 and the relevant parts to v1.16 [iceberg]

2024-01-03 Thread via GitHub
pvary opened a new pull request, #9403: URL: https://github.com/apache/iceberg/pull/9403 Clean backport of #9308 to Flink 1.17. In 1.16, `pauseOrResumeSplits` is not needed, but I backported the other parts so the code stays similar between the Flink versions.

Re: [I] Can iceberg support truncating table? [iceberg]

2024-01-03 Thread via GitHub
jhchee commented on issue #9387: URL: https://github.com/apache/iceberg/issues/9387#issuecomment-1875290566 You could remove the table entry from your catalog and create a new table in the same directory. This should preserve all your files.

Re: [PR] Core: Add param to limit manifest parallel reader queue size [iceberg]

2024-01-03 Thread via GitHub
Heltman commented on PR #7844: URL: https://github.com/apache/iceberg/pull/7844#issuecomment-1875283687 > I will add some changes to fix the memory leak, and think about creating BlockingParallelIterable instead of changing ParallelIterable. I added a new PR that just fixes the memory leak. See

Re: [PR] Fix ParallelIterable memory leak because queue continues to be added even if iterator exited [iceberg]

2024-01-03 Thread via GitHub
Heltman commented on PR #9402: URL: https://github.com/apache/iceberg/pull/9402#issuecomment-1875282246 See #7844 for the whole discussion.
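The leak named in the PR title, where producers keep adding to a queue after the consuming iterator has exited, can be illustrated with a hypothetical closable buffer. `ClosableQueueSketch` is an illustrative name and is not Iceberg's `ParallelIterable`; it only shows the general idea of rejecting and discarding producer output once the consumer closes:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a producer/consumer buffer that stops growing after close(). */
public class ClosableQueueSketch {
  private final ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
  private final AtomicBoolean closed = new AtomicBoolean(false);

  /** Producer side: refuses new items once the consumer has closed the buffer. */
  boolean offer(int item) {
    if (closed.get()) {
      return false;
    }
    queue.add(item);
    // close() may have run between the check and the add; re-check so the item is not retained.
    if (closed.get()) {
      queue.clear();
      return false;
    }
    return true;
  }

  /** Consumer side: marks the buffer closed and releases anything already buffered. */
  void close() {
    closed.set(true);
    queue.clear();
  }

  int buffered() {
    return queue.size();
  }

  public static void main(String[] args) {
    ClosableQueueSketch buffer = new ClosableQueueSketch();
    buffer.offer(1);
    buffer.offer(2);
    buffer.close();                      // the consumer exits early
    boolean accepted = buffer.offer(3);  // a late producer write is rejected
    System.out.println("accepted=" + accepted + ", buffered=" + buffer.buffered());
  }
}
```

Running `main` prints `accepted=false, buffered=0`: nothing accumulates once the consumer is gone, which is the property whose absence causes the leak described in the PR title.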

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440382646 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether to

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440372097 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440368146 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440367424 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440366952 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440362963 ## data/src/main/java/org/apache/iceberg/data/DeleteFilter.java: ## @@ -224,14 +223,10 @@ public Predicate eqDeletedRowFilter() { } public

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440336894 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or
