Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-08-13 Thread via GitHub
nastra commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1716356624 ## core/src/test/java/org/apache/iceberg/TestManifestFileParser.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-08-13 Thread via GitHub
nastra commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1716355581 ## core/src/test/java/org/apache/iceberg/TestManifestFileParser.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mo

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-08-13 Thread via GitHub
nastra commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1716350419 ## .palantir/revapi.yml: ## @@ -874,6 +874,10 @@ acceptedBreaks: justification: "Static utility class - should not have public constructor" "1.4.0": or

Re: [PR] Core: create a default Hadoop config if not provided in constructor [iceberg]

2024-08-13 Thread via GitHub
nastra commented on code in PR #10926: URL: https://github.com/apache/iceberg/pull/10926#discussion_r1716349267 ## core/src/main/java/org/apache/iceberg/hadoop/HadoopFileIO.java: ## @@ -63,7 +63,11 @@ public class HadoopFileIO implements HadoopConfigurable, DelegateFileIO {

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo closed issue #131: field-id in avro schema is missing URL: https://github.com/apache/iceberg-rust/issues/131 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscri

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-08-13 Thread via GitHub
ZENOTME commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2287857377 > This one can be closed? I think we can close it now.

[I] Tracking issues of iceberg rust v0.3.0 Release [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo opened a new issue, #543: URL: https://github.com/apache/iceberg-rust/issues/543 This issue is used to track tasks of the iceberg rust v0.3.0 release. ## Tasks ### Blockers - [ ] https://github.com/apache/iceberg-rust/issues/131 - [ ] https://github.com/apache/i

Re: [PR] Flink: infer source parallelism for FLIP-27 source in batch execution mode [iceberg]

2024-08-13 Thread via GitHub
stevenzwu commented on code in PR #10832: URL: https://github.com/apache/iceberg/pull/10832#discussion_r1716294343 ## flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/source/IcebergSource.java: ## @@ -205,12 +225,35 @@ private SplitEnumerator createEnumer // Onl

Re: [I] Tracking issues of iceberg-rust v0.3.0 [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo commented on issue #348: URL: https://github.com/apache/iceberg-rust/issues/348#issuecomment-2287849218 Hi, most of the issues in our [0.3 milestone](https://github.com/apache/iceberg-rust/milestone/2) have been closed. I plan to clean up the remaining issues and initiate the release

Re: [I] doc: Update example and doc to show table scan api. [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo commented on issue #415: URL: https://github.com/apache/iceberg-rust/issues/415#issuecomment-2287847733 Let me take this one.

Re: [I] `field-id`'s missing in generated Avro files [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo commented on issue #353: URL: https://github.com/apache/iceberg-rust/issues/353#issuecomment-2287846778 Hi, I believe this can be closed now?

Re: [I] doc: Update README to reflect latest status update of components. [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo commented on issue #416: URL: https://github.com/apache/iceberg-rust/issues/416#issuecomment-2287847133 Let me take this.

Re: [I] field-id in avro schema is missing [iceberg-rust]

2024-08-13 Thread via GitHub
Xuanwo commented on issue #131: URL: https://github.com/apache/iceberg-rust/issues/131#issuecomment-2287846440 This one can be closed?

Re: [PR] Spark: Add RewriteTableLocation action interface [iceberg]

2024-08-13 Thread via GitHub
flyrain commented on code in PR #10920: URL: https://github.com/apache/iceberg/pull/10920#discussion_r1716296977 ## api/src/main/java/org/apache/iceberg/actions/RewriteTableLocation.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [I] Review new ImmutablesReferenceEquality error-prone check [iceberg]

2024-08-13 Thread via GitHub
danielhumanmod commented on issue #10855: URL: https://github.com/apache/iceberg/issues/10855#issuecomment-2287815013 Hi team, may I take a look at this issue?

Re: [PR] API, AWS: Add RetryableInputStream and use that in S3InputStream [iceberg]

2024-08-13 Thread via GitHub
amogh-jahagirdar commented on code in PR #10433: URL: https://github.com/apache/iceberg/pull/10433#discussion_r1716246358 ## api/src/main/java/org/apache/iceberg/io/RetryableInputStream.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Spark: Add RewriteTableLocation action interface [iceberg]

2024-08-13 Thread via GitHub
manuzhang commented on code in PR #10920: URL: https://github.com/apache/iceberg/pull/10920#discussion_r1716227719 ## api/src/main/java/org/apache/iceberg/actions/RewriteTableLocation.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716215131 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,192 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_p

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716223848 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Kevinjqliu/1055 [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on PR #1060: URL: https://github.com/apache/iceberg-python/pull/1060#issuecomment-2287681677 Testing CI for #1055

Re: [PR] Kevinjqliu/1055 [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu closed pull request #1060: Kevinjqliu/1055 URL: https://github.com/apache/iceberg-python/pull/1060

Re: [PR] Coverage Run unit tests first before docker containers are set up [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu merged PR #1055: URL: https://github.com/apache/iceberg-python/pull/1055

Re: [I] [feat] Run unit test in CI on pull request [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu closed issue #1051: [feat] Run unit test in CI on pull request URL: https://github.com/apache/iceberg-python/issues/1051

Re: [PR] Fix main branch building break [iceberg-rust]

2024-08-13 Thread via GitHub
liurenjie1024 commented on PR #541: URL: https://github.com/apache/iceberg-rust/pull/541#issuecomment-2287670546 > @liurenjie1024 Instead of using a merge queue, the option that requires the PR to be rebased before merging could be enabled. > > https://private-user-images.githubuserco

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716165411 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Coverage Run unit tests first before docker containers are set up [iceberg-python]

2024-08-13 Thread via GitHub
Minfante377 commented on code in PR #1055: URL: https://github.com/apache/iceberg-python/pull/1055#discussion_r1716156779 ## Makefile: ## @@ -67,7 +70,10 @@ test-coverage: sleep 10 docker compose -f dev/docker-compose-integration.yml cp ./dev/provision.py spark-

Re: [PR] Coverage Run unit tests first before docker containers are set up [iceberg-python]

2024-08-13 Thread via GitHub
Minfante377 commented on code in PR #1055: URL: https://github.com/apache/iceberg-python/pull/1055#discussion_r1716157215 ## tests/integration/test_reads.py: ## @@ -753,6 +753,7 @@ def test_configure_row_group_batch_size(session_catalog: Catalog) -> None: assert len(batche

Re: [PR] [Reference PR] [API + Avro] Add default value APIs and Avro implementation [iceberg]

2024-08-13 Thread via GitHub
wmoustafa commented on code in PR #9502: URL: https://github.com/apache/iceberg/pull/9502#discussion_r1716157111 ## core/src/test/java/org/apache/iceberg/avro/TestReadDefaultValues.java: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716153306 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_p

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716152079 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_p

Re: [PR] Coverage Run unit tests first before docker containers are set up [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1055: URL: https://github.com/apache/iceberg-python/pull/1055#discussion_r1716150796 ## tests/integration/test_reads.py: ## @@ -753,6 +753,7 @@ def test_configure_row_group_batch_size(session_catalog: Catalog) -> None: assert len(batches

Re: [I] [feat] Run unit test in CI on pull request [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on issue #1051: URL: https://github.com/apache/iceberg-python/issues/1051#issuecomment-2287528383 Thanks everyone for coming up with this solution, love the team work!!

Re: [I] [feat] Run unit test in CI on pull request [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on issue #1051: URL: https://github.com/apache/iceberg-python/issues/1051#issuecomment-2287527950 > I think this change will result in make test being run twice in both make test-coverage, and in make test for all python versions. I'm wondering if there's a better way

Re: [PR] [Reference PR] [API + Avro] Add default value APIs and Avro implementation [iceberg]

2024-08-13 Thread via GitHub
wmoustafa commented on code in PR #9502: URL: https://github.com/apache/iceberg/pull/9502#discussion_r1716149183 ## core/src/main/java/org/apache/iceberg/avro/GenericAvroReader.java: ## @@ -155,6 +162,41 @@ public ValueReader record(Type partner, Schema record, List> f r

Re: [PR] Optimize reads of record batches by pushing limit to file level [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1057: URL: https://github.com/apache/iceberg-python/pull/1057#discussion_r1716133239 ## pyiceberg/io/pyarrow.py: ## @@ -1194,6 +1194,7 @@ def _task_to_record_batches( case_sensitive: bool, name_mapping: Optional[NameMapping] = None,

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716137872 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716137061 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716135524 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716135206 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_per_f

Re: [PR] Support timestamp in partition path [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3933: Support timestamp in partition path URL: https://github.com/apache/iceberg/pull/3933

Re: [PR] Hive: Support 'identifier-field-ids' when creating table in hive [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3912: Hive: Support 'identifier-field-ids' when creating table in hive URL: https://github.com/apache/iceberg/pull/3912

Re: [PR] Support timestamp in partition path [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3933: URL: https://github.com/apache/iceberg/pull/3933#issuecomment-2287467409 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Test: Add unit tests to validate forTable calls setAll with table properties [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3902: Test: Add unit tests to validate forTable calls setAll with table properties URL: https://github.com/apache/iceberg/pull/3902

Re: [PR] Test: Add unit tests to validate forTable calls setAll with table properties [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3902: URL: https://github.com/apache/iceberg/pull/3902#issuecomment-2287467356 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core: Reading manifetsFiles parallel with ManifestGroup#planFiles [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3742: Core: Reading manifetsFiles parallel with ManifestGroup#planFiles URL: https://github.com/apache/iceberg/pull/3742

Re: [PR] [SPARK] Make drop namespaces call respect CASCADE and IF EXISTS [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3701: [SPARK] Make drop namespaces call respect CASCADE and IF EXISTS URL: https://github.com/apache/iceberg/pull/3701

Re: [PR] AWS: Add unit tests for GlueCatalog's isValidIdentifier method [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3698: URL: https://github.com/apache/iceberg/pull/3698#issuecomment-2287467271 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core, Spark: Remove dangling deletes as part of RewriteDataFilesAction [iceberg]

2024-08-13 Thread via GitHub
dramaticlly commented on code in PR #9724: URL: https://github.com/apache/iceberg/pull/9724#discussion_r1716093612 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/RemoveDanglingDeletesSparkAction.java: ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Softwa

Re: [PR] Hive: Support 'identifier-field-ids' when creating table in hive [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3912: URL: https://github.com/apache/iceberg/pull/3912#issuecomment-2287467385 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] [SPARK] Make drop namespaces call respect CASCADE and IF EXISTS [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3701: URL: https://github.com/apache/iceberg/pull/3701#issuecomment-2287467302 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Core: Reading manifetsFiles parallel with ManifestGroup#planFiles [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] commented on PR #3742: URL: https://github.com/apache/iceberg/pull/3742#issuecomment-2287467327 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] AWS: Add unit tests for GlueCatalog's isValidIdentifier method [iceberg]

2024-08-13 Thread via GitHub
github-actions[bot] closed pull request #3698: AWS: Add unit tests for GlueCatalog's isValidIdentifier method URL: https://github.com/apache/iceberg/pull/3698

Re: [PR] Refactor PyArrow DataFiles Projection functions [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on code in PR #1043: URL: https://github.com/apache/iceberg-python/pull/1043#discussion_r1716128868 ## pyiceberg/io/pyarrow.py: ## @@ -1308,6 +1309,138 @@ def _read_all_delete_files(fs: FileSystem, tasks: Iterable[FileScanTask]) -> Dic return deletes_p

Re: [PR] Bump pyspark from 3.5.1 to 3.5.2 [iceberg-python]

2024-08-13 Thread via GitHub
sungwy merged PR #1048: URL: https://github.com/apache/iceberg-python/pull/1048

Re: [PR] Bump deptry from 0.19.0 to 0.19.1 [iceberg-python]

2024-08-13 Thread via GitHub
sungwy merged PR #1047: URL: https://github.com/apache/iceberg-python/pull/1047

Re: [I] DOCS: Improve Documentation on Write Support [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on issue #1008: URL: https://github.com/apache/iceberg-python/issues/1008#issuecomment-2287415541 Examples of `overwrite_filter` * https://github.com/apache/iceberg-python/issues/402#issuecomment-2271507538 * https://github.com/apache/iceberg-python/issues/1020#is

Re: [PR] Optimize reads of record batches by pushing limit to file level [iceberg-python]

2024-08-13 Thread via GitHub
sungwy commented on PR #1057: URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2287395213 Hi @soumya-ghosh - thank you for picking this issue up! I'm working on refactoring this part of the code base, and I have a different, but similar approach for pushing the limit dow

Re: [PR] AWS, Core, Hive: Extract FileIO tracker/closer into separate class [iceberg]

2024-08-13 Thread via GitHub
amogh-jahagirdar merged PR #10893: URL: https://github.com/apache/iceberg/pull/10893

Re: [PR] AWS, Core, Hive: Extract FileIO tracker/closer into separate class [iceberg]

2024-08-13 Thread via GitHub
amogh-jahagirdar commented on PR #10893: URL: https://github.com/apache/iceberg/pull/10893#issuecomment-2287381972 This is great @nastra, removes quite a bit of duplication!

[PR] Bump mypy-boto3-glue from 1.34.157 to 1.34.160 [iceberg-python]

2024-08-13 Thread via GitHub
dependabot[bot] opened a new pull request, #1059: URL: https://github.com/apache/iceberg-python/pull/1059 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.34.157 to 1.34.160. Commits See full diff in https://github.com/youtype/mypy_boto3_builder/com

[PR] [DRAFT] Support changelog scan for table with delete files [iceberg]

2024-08-13 Thread via GitHub
wypoon opened a new pull request, #10935: URL: https://github.com/apache/iceberg/pull/10935 Currently changelog scan is only supported for a table with no delete files. We implement support for the case when delete files are present in the snapshots to be scanned.

Re: [I] Peformance question for to_arrow, to_pandas, to_duckdb [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu closed issue #1032: Peformance question for to_arrow, to_pandas, to_duckdb URL: https://github.com/apache/iceberg-python/issues/1032

Re: [I] Peformance question for to_arrow, to_pandas, to_duckdb [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on issue #1032: URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2287282201 Thanks for reporting this. I learned a lot from exploring this thread, and we have some solid improvements coming up. Please let us know if anything else comes up!

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-08-13 Thread via GitHub
SandeepSinghGahir commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2287275257 Thanks for the suggestion, I will try them out. However, there is a pull request open already. Also, @danielcweeks mentioned here -> https://github.com/apache/iceberg/pul

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-08-13 Thread via GitHub
stevenzwu commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1716011494 ## core/src/main/java/org/apache/iceberg/GenericManifestFile.java: ## @@ -105,6 +105,42 @@ public GenericManifestFile(Schema avroSchema) { this.keyMetadata = n

Re: [PR] Optimize reads of record batches by pushing limit to file level [iceberg-python]

2024-08-13 Thread via GitHub
soumya-ghosh commented on PR #1057: URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2287206429 @kevinjqliu any thoughts on this implementation? Is this what you had in mind? I have tested in a file of approx 50 MB and verified that fewer batches are scanned in this a
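For readers following the limit-pushdown discussion in PR #1057, here is a minimal sketch of the general idea in plain Python. It is not the PR's implementation: plain lists stand in for PyArrow record batches, and `read_with_limit` and `file_batches` are hypothetical names. Because batches are consumed lazily, reading stops as soon as the row limit is reached, so later batches in the file are never materialized.

```python
from typing import Iterable, Iterator, List, Optional

# Hypothetical stand-in for a record batch: just a list of rows.
Batch = List[int]


def read_with_limit(batches: Iterable[Batch], limit: Optional[int]) -> Iterator[Batch]:
    """Yield batches until `limit` rows are produced; `None` means no limit."""
    if limit is not None and limit <= 0:
        return
    remaining = limit
    for batch in batches:
        # Truncate the batch that crosses the limit.
        if remaining is not None and len(batch) > remaining:
            batch = batch[:remaining]
        yield batch
        if remaining is not None:
            remaining -= len(batch)
            if remaining == 0:
                # Stop here so the source is not asked for another batch.
                return


def file_batches(n_batches: int, batch_size: int, log: List[int]) -> Iterator[Batch]:
    """Hypothetical file reader that records the index of each batch it materializes."""
    for i in range(n_batches):
        log.append(i)
        yield list(range(i * batch_size, (i + 1) * batch_size))


log: List[int] = []
out = list(read_with_limit(file_batches(10, 100, log), limit=250))
print(sum(len(b) for b in out), len(log))  # 250 3
```

With 10 batches of 100 rows and a limit of 250, only the first three batches are pulled from the reader, and the third is truncated to 50 rows.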

Re: [PR] Core, Spark: Remove dangling deletes as part of RewriteDataFilesAction [iceberg]

2024-08-13 Thread via GitHub
szehon-ho commented on code in PR #9724: URL: https://github.com/apache/iceberg/pull/9724#discussion_r1715753937 ## api/src/main/java/org/apache/iceberg/actions/RewriteDataFiles.java: ## @@ -106,6 +106,19 @@ public interface RewriteDataFiles boolean USE_STARTING_SEQUENCE_NU

[I] Iceberg documented example data for CDC view is wrong [iceberg]

2024-08-13 Thread via GitHub
rchui opened a new issue, #10934: URL: https://github.com/apache/iceberg/issues/10934 ### Apache Iceberg version 1.6.0 (latest release) ### Query engine None ### Please describe the bug 🐞 When using CDC views the documentation states that the three additiona

Re: [I] [feat] add missing metadata tables [iceberg-python]

2024-08-13 Thread via GitHub
kevinjqliu commented on issue #1053: URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2287150622 That makes sense to me, thanks @soumya-ghosh

Re: [PR] Core: API to add manifest file with OverwriteFiles. [iceberg]

2024-08-13 Thread via GitHub
RussellSpitzer commented on PR #9822: URL: https://github.com/apache/iceberg/pull/9822#issuecomment-2287055070 I agree with @Fokko here; I'm a little worried about adding manifests directly since those are strictly tied to format versions and adding an incorrectly written manifest would bre

Re: [PR] Spark Action to Analyze table [iceberg]

2024-08-13 Thread via GitHub
guykhazma commented on code in PR #10288: URL: https://github.com/apache/iceberg/pull/10288#discussion_r1715785304 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/AnalyzeTableSparkAction.java: ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Founda

Re: [PR] Add REST Compatibility Kit [iceberg]

2024-08-13 Thread via GitHub
dimas-b commented on code in PR #10908: URL: https://github.com/apache/iceberg/pull/10908#discussion_r1715848572 ## open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add REST Compatibility Kit [iceberg]

2024-08-13 Thread via GitHub
dimas-b commented on code in PR #10908: URL: https://github.com/apache/iceberg/pull/10908#discussion_r1715847270 ## open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] [feat] add missing metadata tables [iceberg-python]

2024-08-13 Thread via GitHub
soumya-ghosh commented on issue #1053: URL: https://github.com/apache/iceberg-python/issues/1053#issuecomment-2287010042 @kevinjqliu we can group the tasks in the following way: * `data_files` and `delete_files` - they are subsets of `files`, just a filter condition on content field, hence c
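The grouping argument in issue #1053 (that `data_files` and `delete_files` are just `files` filtered on the `content` field) can be illustrated with a small sketch in plain Python. Rows are modeled as dictionaries and the helper names are hypothetical, not PyIceberg's API; the content codes 0, 1 and 2 (data, position deletes, equality deletes) follow the Iceberg table spec.

```python
from typing import Any, Dict, List

# Values of the data file `content` field, per the Iceberg table spec.
DATA = 0
POSITION_DELETES = 1
EQUALITY_DELETES = 2

# Hypothetical row shape for a `files` metadata listing.
Row = Dict[str, Any]


def data_files(files: List[Row]) -> List[Row]:
    """Keep only rows whose content is table data."""
    return [row for row in files if row["content"] == DATA]


def delete_files(files: List[Row]) -> List[Row]:
    """Keep only rows holding position or equality deletes."""
    return [row for row in files if row["content"] in (POSITION_DELETES, EQUALITY_DELETES)]


files = [
    {"file_path": "s3://bucket/a.parquet", "content": DATA},
    {"file_path": "s3://bucket/b-pos-deletes.parquet", "content": POSITION_DELETES},
    {"file_path": "s3://bucket/c-eq-deletes.parquet", "content": EQUALITY_DELETES},
]
print([r["file_path"] for r in data_files(files)])    # ['s3://bucket/a.parquet']
print([r["file_path"] for r in delete_files(files)])  # the two delete files
```

Since both tables are filters over the same rows, one `files` implementation plus a predicate would cover all three.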

Re: [PR] Add REST Compatibility Kit [iceberg]

2024-08-13 Thread via GitHub
danielcweeks commented on code in PR #10908: URL: https://github.com/apache/iceberg/pull/10908#discussion_r1715840694 ## open-api/src/test/java/org/apache/iceberg/rest/RESTCompatibilityKitCatalogTests.java: ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (

Re: [PR] Ensure that RestCatalog passes user config to FileIO [iceberg-rust]

2024-08-13 Thread via GitHub
sdd commented on code in PR #476: URL: https://github.com/apache/iceberg-rust/pull/476#discussion_r1715818692 ## crates/catalog/rest/src/catalog.rs: ## @@ -504,8 +504,15 @@ impl Catalog for RestCatalog { .query::(request) .await?; +let config

Re: [PR] Object Cache: caches parsed Manifests and ManifestLists for performance [iceberg-rust]

2024-08-13 Thread via GitHub
sdd commented on code in PR #512: URL: https://github.com/apache/iceberg-rust/pull/512#discussion_r1715808031 ## crates/iceberg/src/spec/manifest.rs: ## @@ -94,6 +94,12 @@ impl Manifest { &self.entries } +/// Consume this Manifest, returning its constituent p

Re: [PR] Table Scan Performance Tests [iceberg-rust]

2024-08-13 Thread via GitHub
sdd commented on PR #497: URL: https://github.com/apache/iceberg-rust/pull/497#issuecomment-2286965305 @Xuanwo and @liurenjie1024: This is now passing and ready for review.

Re: [PR] Concurrent data file fetching and parallel RecordBatch processing [iceberg-rust]

2024-08-13 Thread via GitHub
sdd commented on code in PR #515: URL: https://github.com/apache/iceberg-rust/pull/515#discussion_r1715801644 ## crates/iceberg/src/arrow/reader.rs: ## @@ -44,25 +43,39 @@ use crate::error::Result; use crate::expr::visitors::bound_predicate_visitor::{visit, BoundPredicateVisit

Re: [I] javax.net.ssl.SSLException: Connection reset on S3 w/ S3FileIO and Apache HTTP client [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on issue #10340: URL: https://github.com/apache/iceberg/issues/10340#issuecomment-2286940103 one of those stack traces is from deltaio, so nothing to do with iceberg. Both of them are caused by the AWS SDK itself not retrying, or retrying but not enough times fo

Re: [PR] API: implement types timestamp_ns and timestamptz_ns [iceberg]

2024-08-13 Thread via GitHub
jacobmarble commented on PR #9008: URL: https://github.com/apache/iceberg/pull/9008#issuecomment-2286916696 @rdblue friendly ping

Re: [I] Is dataFiles() Method Retryable? [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on issue #10750: URL: https://github.com/apache/iceberg/issues/10750#issuecomment-2286902949 Is this an AWS S3 store? I don't see the extended request IDs in the stack trace you get from there...

Re: [PR] Add REST Compatibility Kit [iceberg]

2024-08-13 Thread via GitHub
danielcweeks commented on code in PR #10908: URL: https://github.com/apache/iceberg/pull/10908#discussion_r1715755462 ## open-api/src/testFixtures/java/org/apache/iceberg/rest/RCKUtils.java: ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Core: Drop support for Java 8 [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on PR #10518: URL: https://github.com/apache/iceberg/pull/10518#issuecomment-2286884303 @Fokko started that PR. #10932 ...looks like hive2 is dead too. This is good in that it will significantly reduce test profiles, but it is going to make the PR a lot more complex. I

Re: [PR] Add adaptive split size [iceberg]

2024-08-13 Thread via GitHub
danielcweeks closed pull request #7688: Add adaptive split size URL: https://github.com/apache/iceberg/pull/7688

Re: [PR] [DRAFT] Build: remove hadoop 2 support [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on PR #10932: URL: https://github.com/apache/iceberg/pull/10932#issuecomment-2286877905 hive2 tests are failing with what looks like guava version/access issues. ``` TestHiveIcebergStorageHandlerWithEngine > testDescribeTable() > fileFormat=PARQUET, engine=mr, c

Re: [PR] Core: (unit test) Set partition to the right PartitionKey [iceberg]

2024-08-13 Thread via GitHub
szehon-ho commented on PR #10925: URL: https://github.com/apache/iceberg/pull/10925#issuecomment-2286874881 Thanks @hsiang-c!

Re: [PR] Core: (unit test) Set partition to the right PartitionKey [iceberg]

2024-08-13 Thread via GitHub
szehon-ho merged PR #10925: URL: https://github.com/apache/iceberg/pull/10925

Re: [PR] Spark: Add RewriteTableLocation action interface [iceberg]

2024-08-13 Thread via GitHub
flyrain commented on code in PR #10920: URL: https://github.com/apache/iceberg/pull/10920#discussion_r1715740325 ## api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java: ## @@ -70,4 +70,10 @@ default RewritePositionDeleteFiles rewritePositionDeletes(Table table) {

Re: [PR] feat: SQL Catalog - namespaces [iceberg-rust]

2024-08-13 Thread via GitHub
callum-ryan commented on code in PR #534: URL: https://github.com/apache/iceberg-rust/pull/534#discussion_r1715744495 ## crates/catalog/sql/src/catalog.rs: ## @@ -167,43 +177,344 @@ impl SqlCatalog { .await .map_err(from_sqlx_error) } + +/// Ex

Re: [PR] feat: SQL Catalog - namespaces [iceberg-rust]

2024-08-13 Thread via GitHub
callum-ryan commented on code in PR #534: URL: https://github.com/apache/iceberg-rust/pull/534#discussion_r1715743769 ## crates/catalog/sql/src/catalog.rs: ## @@ -167,43 +177,344 @@ impl SqlCatalog { .await .map_err(from_sqlx_error) } + +/// Ex

Re: [PR] Core: add JSON serialization for BaseFilesTable.ManifestReadTask, AllManifestsTable.ManifestListReadTask, and BaseEntriesTable.ManifestReadTask [iceberg]

2024-08-13 Thread via GitHub
stevenzwu commented on code in PR #10735: URL: https://github.com/apache/iceberg/pull/10735#discussion_r1715741772 ## core/src/main/java/org/apache/iceberg/AllManifestsTableTaskParser.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one +

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on code in PR #7914: URL: https://github.com/apache/iceberg/pull/7914#discussion_r1715740496 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -330,11 +345,18 @@ private Dataset listedFileDS() {

Re: [I] [feat] Run unit test in CI on pull request [iceberg-python]

2024-08-13 Thread via GitHub
youssef-itanii commented on issue #1051: URL: https://github.com/apache/iceberg-python/issues/1051#issuecomment-2286857469 > Hi @youssef-itanii and @Minfante377 - thank you again for working on these PRs, and sorry for this unfortunate concurrency conflict. I think we could avoid a concurr

Re: [I] [feat] Run unit test in CI on pull request [iceberg-python]

2024-08-13 Thread via GitHub
Minfante377 commented on issue #1051: URL: https://github.com/apache/iceberg-python/issues/1051#issuecomment-2286853368 Thank you @sungwy and @youssef-itanii. I'll remember to self-assign an issue in the future if I start working on a solution for it.
