Re: [PR] Build: Bump Spark 3.5 to 3.5.2 [iceberg]

2024-08-11 Thread via GitHub
nastra merged PR #10918: URL: https://github.com/apache/iceberg/pull/10918 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [I] Table maintenace procedure(expire_snapshots) not work as expceted [iceberg]

2024-08-11 Thread via GitHub
pvary commented on issue #10907: URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2283142853 > `rewrite_manifests` > Unlike data files, `rewrite_manifests` will replace old ones. Actually, this procedure also just creates a new snapshot and keeps the old metada

Re: [I] [feat] Unify implementation of `to_arrow` and `to_arrow_batch_reader` [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on issue #1039: URL: https://github.com/apache/iceberg-python/issues/1039#issuecomment-2282951354 I have not! Please feel free to take a look.

Re: [I] [feat] Unify implementation of `to_arrow` and `to_arrow_batch_reader` [iceberg-python]

2024-08-11 Thread via GitHub
sungwy commented on issue #1039: URL: https://github.com/apache/iceberg-python/issues/1039#issuecomment-2282945879 Thank you for raising this, @kevinjqliu. I think this will be a good improvement to reduce the duplication of code. Have you started working on this already? If not, wou

Re: [I] Missing partition info when committing table to hive through flink [iceberg]

2024-08-11 Thread via GitHub
github-actions[bot] commented on issue #4961: URL: https://github.com/apache/iceberg/issues/4961#issuecomment-2282940259 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs.

Re: [PR] Core: Use Table Partitioning sort with manual Sort Order in Rewrite Data Files [iceberg]

2024-08-11 Thread via GitHub
github-actions[bot] commented on PR #4941: URL: https://github.com/apache/iceberg/pull/4941#issuecomment-2282940239 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [I] [bug] `to_arrow_batch_reader` does not respect the given limit, returning more records than specified [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on issue #1040: URL: https://github.com/apache/iceberg-python/issues/1040#issuecomment-2282922317 Something like this, #1042

Re: [I] [bug] `to_arrow_batch_reader` does not respect the given limit, returning more records than specified [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on issue #1040: URL: https://github.com/apache/iceberg-python/issues/1040#issuecomment-2282921664 The code works correctly for 1 data file with a given limit. The bug is when there are 2 data files, which means 2 FileScanTasks. For example, given an iceberg tabl

Re: [I] [bug] `to_arrow_batch_reader` does not respect the given limit, returning more records than specified [iceberg-python]

2024-08-11 Thread via GitHub
sungwy commented on issue #1040: URL: https://github.com/apache/iceberg-python/issues/1040#issuecomment-2282918806 Hi @kevinjqliu thank you for raising this issue. If I understand it correctly, the bug occurs because we are resetting the limit counter, if the limit specified is larger than
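
The behaviour discussed in this thread (a limit that holds for one data file but is exceeded once a second FileScanTask is read) can be illustrated with a minimal sketch. It does not use PyIceberg: `read_with_limit` is a hypothetical helper, each task is modelled as a plain list of batches, and each batch as a list of rows. The only point is that the remaining-row counter is shared across tasks rather than reset for each one.

```python
from typing import Iterable, Iterator, List


def read_with_limit(tasks: Iterable[List[List[int]]], limit: int) -> Iterator[List[int]]:
    """Yield batches from every task until `limit` rows have been emitted in total.

    Hypothetical stand-in for a batch reader: a "task" is a list of batches,
    and a "batch" is a list of rows.
    """
    remaining = limit  # shared across all tasks; never reset per task
    for batches in tasks:
        for batch in batches:
            if remaining <= 0:
                return
            out = batch[:remaining]  # trim the batch that crosses the limit
            remaining -= len(out)
            yield out


# Two tasks of three rows each, standing in for two data files.
two_files = [[[1, 2, 3]], [[4, 5, 6]]]
rows = [row for batch in read_with_limit(two_files, limit=4) for row in batch]
```

Resetting `remaining` inside the outer loop would let each task emit up to `limit` rows on its own, which matches the symptom described in the issue.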

[I] Rust <> Python integration point [iceberg-rust]

2024-08-11 Thread via GitHub
kevinjqliu opened a new issue, #538: URL: https://github.com/apache/iceberg-rust/issues/538 After establishing #518, I want to start the conversation to create the first integration between PyIceberg and iceberg-rust. As discussed in the dev list, we want to create an integration based o

Re: [PR] prevent adding duplicate files [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on code in PR #1036: URL: https://github.com/apache/iceberg-python/pull/1036#discussion_r1713025181 ## pyiceberg/table/__init__.py: ## @@ -630,7 +648,15 @@ def add_files(self, file_paths: List[str], snapshot_properties: Dict[str, str] = Raises:

Re: [PR] feat: SQL Catalog - namespaces [iceberg-rust]

2024-08-11 Thread via GitHub
Xuanwo commented on code in PR #534: URL: https://github.com/apache/iceberg-rust/pull/534#discussion_r1713021439 ## crates/catalog/sql/src/catalog.rs: ## @@ -141,21 +142,24 @@ impl SqlCatalog { } /// SQLX Any does not implement PostgresSQL bindings, so we have to do

Re: [I] Fields with mixed datatypes [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on issue #1037: URL: https://github.com/apache/iceberg-python/issues/1037#issuecomment-2282826341 I think generally the columns are strongly typed and won't allow a Union type. https://py.iceberg.apache.org/reference/pyiceberg/types/ Here's the spec's description

Re: [I] Table maintenace procedure(expire_snapshots) not work as expceted [iceberg]

2024-08-11 Thread via GitHub
toien commented on issue #10907: URL: https://github.com/apache/iceberg/issues/10907#issuecomment-2282824993 The number of snapshots increased because the Flink job is still writing data to the table. In my opinion, it's better to clarify the `retain_last` parameter's "minimum" function in [doc](https://i
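
The "minimum" reading of `retain_last` mentioned in this comment can be sketched as a simplified model. This is not Iceberg code: `snapshots_to_expire` is a hypothetical function, snapshots are reduced to commit timestamps, and branches, tags and ancestry are ignored. A snapshot is expired only if it is older than the cutoff and is not among the `retain_last` most recent ones, so a table that keeps receiving commits can hold more than `retain_last` snapshots.

```python
from typing import List


def snapshots_to_expire(timestamps_ms: List[int], older_than_ms: int, retain_last: int = 1) -> List[int]:
    """Return the snapshot timestamps that this simplified model would expire.

    Assumption: `retain_last` is a floor on what is kept, not a cap.
    """
    newest_first = sorted(timestamps_ms, reverse=True)
    protected = set(newest_first[:retain_last])  # always kept, regardless of age
    return sorted(t for t in timestamps_ms if t < older_than_ms and t not in protected)


# Four snapshots; only the one older than the cutoff (15) is expired,
# so three remain even though retain_last is 1.
expired = snapshots_to_expire([10, 20, 30, 40], older_than_ms=15, retain_last=1)
```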

Re: [PR] HA HMS support [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on code in PR #752: URL: https://github.com/apache/iceberg-python/pull/752#discussion_r1713023492 ## tests/catalog/test_hive.py: ## @@ -1195,3 +1195,40 @@ def test_hive_wait_for_lock() -> None: with pytest.raises(WaitingForLockException): catal

Re: [PR] HA HMS support [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on code in PR #752: URL: https://github.com/apache/iceberg-python/pull/752#discussion_r1713023562 ## tests/catalog/test_hive.py: ## @@ -1195,3 +1195,40 @@ def test_hive_wait_for_lock() -> None: with pytest.raises(WaitingForLockException): catal

Re: [PR] Support execution in Windows using Local File System and NFS [iceberg-python]

2024-08-11 Thread via GitHub
rfung777 commented on PR #996: URL: https://github.com/apache/iceberg-python/pull/996#issuecomment-2282824642 Hi @Fokko, I am not having much luck with setting up the Windows integration tests. It seems to require installing Make and MinIO. The changes above seem to be localised in the pa

Re: [I] Peformance question for to_arrow, to_pandas, to_duckdb [iceberg-python]

2024-08-11 Thread via GitHub
kevinjqliu commented on issue #1032: URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2282819711 Thanks for looking into the different scenarios. It looks like there are varying results depending on the engines. ### Read Path I took a deeper look into the rea

[PR] chore(deps): Update sqlx requirement from 0.7.4 to 0.8.0 [iceberg-rust]

2024-08-11 Thread via GitHub
dependabot[bot] opened a new pull request, #537: URL: https://github.com/apache/iceberg-rust/pull/537 Updates the requirements on [sqlx](https://github.com/launchbadge/sqlx) to permit the latest version. Changelog Sourced from https://github.com/launchbadge/sqlx/blob/main/CHANGELOG

Re: [PR] chore(deps): Bump actions/setup-python from 4 to 5 [iceberg-rust]

2024-08-11 Thread via GitHub
Xuanwo merged PR #536: URL: https://github.com/apache/iceberg-rust/pull/536

[PR] chore(deps): Bump actions/setup-python from 4 to 5 [iceberg-rust]

2024-08-11 Thread via GitHub
dependabot[bot] opened a new pull request, #536: URL: https://github.com/apache/iceberg-rust/pull/536 Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5. Release notes Sourced from https://github.com/actions/setup-python/releases actions/setup-pytho

Re: [PR] Spark: Add CopyTable spark action [iceberg]

2024-08-11 Thread via GitHub
laithalzyoud closed pull request #10024: Spark: Add CopyTable spark action URL: https://github.com/apache/iceberg/pull/10024

Re: [PR] Spark: Add CopyTable spark action [iceberg]

2024-08-11 Thread via GitHub
laithalzyoud commented on PR #10024: URL: https://github.com/apache/iceberg/pull/10024#issuecomment-2282743507 Hey @huaxingao! I'm planning to continue working on it starting this week. For now I'll close this PR and open a new one to just add the interface, once we agree on the interface,

Re: [PR] Spark: Add CopyTable spark action [iceberg]

2024-08-11 Thread via GitHub
laithalzyoud commented on code in PR #10024: URL: https://github.com/apache/iceberg/pull/10024#discussion_r1712976165 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/BaseCopyTableSparkAction.java: ## @@ -0,0 +1,871 @@ +/* + * Licensed to the Apache Software Fo

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.73.1 to 1.91.0 [iceberg-go]

2024-08-11 Thread via GitHub
dependabot[bot] commented on PR #112: URL: https://github.com/apache/iceberg-go/pull/112#issuecomment-2282676641 Superseded by #121.

Re: [PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.73.1 to 1.91.0 [iceberg-go]

2024-08-11 Thread via GitHub
dependabot[bot] closed pull request #112: build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.73.1 to 1.91.0 URL: https://github.com/apache/iceberg-go/pull/112

[PR] build(deps): bump github.com/aws/aws-sdk-go-v2/service/glue from 1.73.1 to 1.93.0 [iceberg-go]

2024-08-11 Thread via GitHub
dependabot[bot] opened a new pull request, #121: URL: https://github.com/apache/iceberg-go/pull/121 Bumps [github.com/aws/aws-sdk-go-v2/service/glue](https://github.com/aws/aws-sdk-go-v2) from 1.73.1 to 1.93.0. Changelog Sourced from https://github.com/aws/aws-sdk-go-v2/blob/servi

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-08-11 Thread via GitHub
SandeepSinghGahir commented on PR #10433: URL: https://github.com/apache/iceberg/pull/10433#issuecomment-2282666300 When will this be merged? I'm getting this issue while reading Iceberg tables in Glue.

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-08-11 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2282666140 Do we have any solution to this issue? I'm getting this issue while reading iceberg tables in glue.