Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-10-24 Thread via GitHub
HonahX commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r137662 ## pyiceberg/table/snapshots.py: ## @@ -116,3 +144,199 @@ class MetadataLogEntry(IcebergBaseModel): class SnapshotLogEntry(IcebergBaseModel): snapshot_id: int =

Re: [PR] Add Snapshot logic and Summary generation [iceberg-python]

2023-10-24 Thread via GitHub
HonahX commented on code in PR #61: URL: https://github.com/apache/iceberg-python/pull/61#discussion_r1369660433 ## pyiceberg/table/snapshots.py: ## @@ -116,3 +144,199 @@ class MetadataLogEntry(IcebergBaseModel): class SnapshotLogEntry(IcebergBaseModel): snapshot_id: int =

[PR] Add docs dir branch [iceberg]

2023-10-24 Thread via GitHub
bitsondatadev opened a new pull request, #8917: URL: https://github.com/apache/iceberg/pull/8917 Adding 1.4.0/1.4.1 versioned MkDocs builds ([they both use the same version](https://github.com/apache/iceberg-docs/tree/asf-site/docs)) to an orphaned `docs` branch in the main repository.

[I] Flink: Not Writing [iceberg]

2023-10-24 Thread via GitHub
a8356555 opened a new issue, #8916: URL: https://github.com/apache/iceberg/issues/8916 ### Apache Iceberg version 1.4.0 ### Query engine Flink ### Please describe the bug 🐞 I'm using following Dockerfile as my environment: ```Dockerfile FROM alpine:3.1

Re: [PR] Open-API: Error response in the spec don’t align with the expected model. [iceberg]

2023-10-24 Thread via GitHub
jackye1995 commented on PR #8914: URL: https://github.com/apache/iceberg/pull/8914#issuecomment-1778282237 cc @amogh-jahagirdar @nastra seems like a miss in the spec, unless the OpenAPI error response requires the `error` nesting by default, but I did not find any reference of that. -- T

Re: [PR] Open-API: Error response in the spec don’t align with the expected model. [iceberg]

2023-10-24 Thread via GitHub
jackye1995 commented on code in PR #8914: URL: https://github.com/apache/iceberg/pull/8914#discussion_r1370970627 ## open-api/rest-catalog-open-api.yaml: ## @@ -2508,19 +2517,6 @@ components: } } -IcebergErrorResponse: Review Comment: I think th

Re: [I] How to improve performance of RewriteManifests procedure? [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] closed issue #7325: How to improve performance of RewriteManifests procedure? URL: https://github.com/apache/iceberg/issues/7325 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] How to improve performance of RewriteManifests procedure? [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] commented on issue #7325: URL: https://github.com/apache/iceberg/issues/7325#issuecomment-1778260143 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [I] Improve error reporting when streaming snapshot ID is no longer available [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] closed issue #7340: Improve error reporting when streaming snapshot ID is no longer available URL: https://github.com/apache/iceberg/issues/7340 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Improve error reporting when streaming snapshot ID is no longer available [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] commented on issue #7340: URL: https://github.com/apache/iceberg/issues/7340#issuecomment-1778260107 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

Re: [I] Move iceberg table data from one bucket to another using spark [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] commented on issue #7446: URL: https://github.com/apache/iceberg/issues/7446#issuecomment-1778260018 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] Prohibit rewrites of equality deletes across sequence numbers [iceberg]

2023-10-24 Thread via GitHub
github-actions[bot] commented on issue #7452: URL: https://github.com/apache/iceberg/issues/7452#issuecomment-1778259989 This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs.

Re: [I] java.lang.IllegalArgumentException: requirement failed while read migrated parquet table [iceberg]

2023-10-24 Thread via GitHub
Omega359 commented on issue #8863: URL: https://github.com/apache/iceberg/issues/8863#issuecomment-1778246891 I've just encountered this exception as well but the circumstances are somewhat different. One process is writing out an iceberg table primarily via appends with the occasional dele

Re: [PR] Open-API: Error response in the spec don’t align with the expected model. [iceberg]

2023-10-24 Thread via GitHub
geruh commented on code in PR #8914: URL: https://github.com/apache/iceberg/pull/8914#discussion_r1370927351 ## open-api/rest-catalog-open-api.yaml: ## @@ -2508,19 +2517,6 @@ components: } } -IcebergErrorResponse: Review Comment: Didn't mean to

Re: [PR] Spec: add nanosecond timestamp types [iceberg]

2023-10-24 Thread via GitHub
jacobmarble commented on PR #8683: URL: https://github.com/apache/iceberg/pull/8683#issuecomment-1777840610 @rdblue @Fokko what concerns remain regarding this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [I] Adding new columns (mergeSchema) [iceberg]

2023-10-24 Thread via GitHub
RussellSpitzer commented on issue #8908: URL: https://github.com/apache/iceberg/issues/8908#issuecomment-1777826448 https://iceberg.apache.org/docs/latest/spark-writes/#schema-merge Please make sure you are setting all the appropriate properties -- This is an automated message from the

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370653922 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) { }

Re: [I] Schema issue between Arrow and PyIceberg [iceberg]

2023-10-24 Thread via GitHub
Fokko commented on issue #8913: URL: https://github.com/apache/iceberg/issues/8913#issuecomment-164930 Thanks @asheeshgarg for raising this. The PyIceberg repository has been moved to https://github.com/apache/iceberg-python. To get the quickest answers, it is best to raise the question

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
dimas-b commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370594082 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) {

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370572856 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) { }

Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

2023-10-24 Thread via GitHub
stevenzwu commented on code in PR #7798: URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370531354 ## api/src/main/java/org/apache/iceberg/SortKey.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor l

Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

2023-10-24 Thread via GitHub
stevenzwu commented on code in PR #7798: URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370524541 ## api/src/main/java/org/apache/iceberg/StructTransform.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

Re: [I] operations fail after upgrading to spark 3.4 [iceberg]

2023-10-24 Thread via GitHub
huaxingao commented on issue #8904: URL: https://github.com/apache/iceberg/issues/8904#issuecomment-1777589163 cc @aokolnychyi @RussellSpitzer Seems we need to turn this `useCommitCoordinator` off. Here is the [discussion](https://github.com/apache/spark/pull/36564#issuecomment-17742

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
stevenzwu commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1370442734 ## api/src/main/java/org/apache/iceberg/ContentFile.java: ## @@ -165,6 +166,20 @@ default Long fileSequenceNumber() { */ F copyWithoutStats(); + /** + * C

Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

2023-10-24 Thread via GitHub
RussellSpitzer commented on code in PR #7798: URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370461928 ## api/src/main/java/org/apache/iceberg/StructTransform.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

[I] Schema issue between Arrow and PyIceberg [iceberg]

2023-10-24 Thread via GitHub
asheeshgarg opened a new issue, #8913: URL: https://github.com/apache/iceberg/issues/8913 ### Apache Iceberg version 1.4.1 (latest release) ### Query engine Other ### Please describe the bug 🐞 @Fokko we have a table in iceberg which has some of the column na

Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

2023-10-24 Thread via GitHub
RussellSpitzer commented on code in PR #7798: URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370454116 ## api/src/main/java/org/apache/iceberg/StructTransform.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

Re: [PR] API: add StructTransform base class for PartitionKey and SortKey. add SortOrderComparators [iceberg]

2023-10-24 Thread via GitHub
RussellSpitzer commented on code in PR #7798: URL: https://github.com/apache/iceberg/pull/7798#discussion_r1370435653 ## api/src/main/java/org/apache/iceberg/SortKey.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contribu

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
dimas-b commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370419080 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) {

Re: [I] Exception occurred while writing to Iceberg tables by 'INSERT OVERWRITE' [iceberg]

2023-10-24 Thread via GitHub
sanromeo commented on issue #5384: URL: https://github.com/apache/iceberg/issues/5384#issuecomment-1777461016 Have same issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
dimas-b commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370336084 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) {

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
dimas-b commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1370331050 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) {

Re: [PR] Add Refurb to ruff [iceberg-python]

2023-10-24 Thread via GitHub
Fokko commented on PR #87: URL: https://github.com/apache/iceberg-python/pull/87#issuecomment-1777259485 @jayceslesar Thanks, there is no rush, appreciate it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Add Refurb to ruff [iceberg-python]

2023-10-24 Thread via GitHub
jayceslesar commented on PR #87: URL: https://github.com/apache/iceberg-python/pull/87#issuecomment-1777174485 @Fokko I am definitely interested in adding it directly! Can look into this next week -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] Build: Replace Thread.Sleep() usage with org.Awaitility from Tests. [iceberg]

2023-10-24 Thread via GitHub
nk1506 commented on code in PR #8804: URL: https://github.com/apache/iceberg/pull/8804#discussion_r1370122449 ## core/src/test/java/org/apache/iceberg/hadoop/TestHadoopCommits.java: ## @@ -435,13 +437,11 @@ public void testConcurrentFastAppends(@TempDir File dir) throws Excepti

Re: [PR] Fix literal predicate equality check [iceberg-python]

2023-10-24 Thread via GitHub
Fokko merged PR #94: URL: https://github.com/apache/iceberg-python/pull/94 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.ap

Re: [PR] Fix literal predicate equality check [iceberg-python]

2023-10-24 Thread via GitHub
Fokko commented on code in PR #94: URL: https://github.com/apache/iceberg-python/pull/94#discussion_r1370091939 ## pyiceberg/expressions/__init__.py: ## @@ -701,7 +701,7 @@ def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundLiteralPredi def __eq__(self,

[PR] Add javadoc dir branch [iceberg]

2023-10-24 Thread via GitHub
bitsondatadev opened a new pull request, #8912: URL: https://github.com/apache/iceberg/pull/8912 Adding the existing [static javadoc sites](https://github.com/apache/iceberg-docs/tree/asf-site/javadoc) to a separate branch in the main repository. This will enable the [step to add the

Re: [I] Add View Support to Spark [iceberg]

2023-10-24 Thread via GitHub
nastra commented on issue #7938: URL: https://github.com/apache/iceberg/issues/7938#issuecomment-1777000677 @singhpk234 I was planning to pick up https://github.com/apache/spark/pull/39796, but I don't know yet whether we'd want to integrate those temporarily into Iceberg until they make it

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1369938013 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -540,4 +604,35 @@ public void close() { api.close(); } } + + private vo

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-10-24 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1369932165 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -181,133 +185,223 @@ public IcebergTable table(TableIdentifier tableIdentifier) { }

Re: [I] Hive's performance for querying the Iceberg table is very poor. [iceberg]

2023-10-24 Thread via GitHub
pvary commented on issue #8901: URL: https://github.com/apache/iceberg/issues/8901#issuecomment-1776773140 You might want to try Hive 4.0.0-beta-1 which has plenty of related performance improvements -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Apache hive 3 with Tez engine select table no empty [iceberg]

2023-10-24 Thread via GitHub
pvary commented on issue #8891: URL: https://github.com/apache/iceberg/issues/8891#issuecomment-1776759965 @anvanna: Are you able to read the data from the Iceberg table with another tool? The issue with writing with Tez is that the newly created files are not propagated to the HS2 and they a

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369764771 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/FlinkReadConf.java: ## @@ -152,6 +155,16 @@ public boolean includeColumnStats() { .parse(); } +

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369763072 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/FlinkReadOptions.java: ## @@ -96,6 +96,9 @@ private FlinkReadOptions() {} public static final ConfigOption

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369762755 ## flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/FlinkReadConf.java: ## @@ -190,4 +203,11 @@ public int maxAllowedPlanningFailures() { .defaultValue(

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369762281 ## docs/flink-configuration.md: ## @@ -130,6 +130,7 @@ env.getConfig() | streaming | connector.iceberg.streaming | N/A

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369761988 ## core/src/main/java/org/apache/iceberg/V1Metadata.java: ## @@ -485,6 +486,11 @@ public DataFile copy() { return wrapped.copy(); } +@Override +publ

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369761574 ## core/src/main/java/org/apache/iceberg/ManifestGroup.java: ## @@ -417,6 +429,10 @@ boolean shouldKeepStats() { return !dropStats; } +Set statsToKeep()

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369761206 ## core/src/main/java/org/apache/iceberg/GenericDataFile.java: ## @@ -66,23 +68,30 @@ class GenericDataFile extends BaseFile implements DataFile { * Copy constructor

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369760806 ## core/src/main/java/org/apache/iceberg/BaseScan.java: ## @@ -165,6 +169,12 @@ public ThisT includeColumnStats() { return newRefinedScan(table, schema, context.sho

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369760407 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -504,6 +508,27 @@ private static Map toReadableByteBufferMap(Map Map filterColumnsStats( + Map map, Se

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369758919 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -504,6 +508,27 @@ private static Map toReadableByteBufferMap(Map Map filterColumnsStats( + Map map, Se

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369756713 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -504,6 +508,27 @@ private static Map toReadableByteBufferMap(Map Map filterColumnsStats( Review Comment:

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369755601 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -504,6 +508,27 @@ private static Map toReadableByteBufferMap(Map Map filterColumnsStats( + Map map, Se

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369754595 ## api/src/test/java/org/apache/iceberg/TestHelpers.java: ## @@ -662,6 +663,11 @@ public DataFile copyWithoutStats() { return this; } +@Override +pu

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-10-24 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1369754161 ## api/src/main/java/org/apache/iceberg/util/ContentFileUtil.java: ## @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

Re: [I] Iceberg Materialized View Spec [iceberg]

2023-10-24 Thread via GitHub
JanKaul commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1776642611 @wmoustafa I hope you are okay with using the table identifier as the storage table pointer instead of using the metadata location. But I don't see a way to use the metadata location

Re: [I] Iceberg Materialized View Spec [iceberg]

2023-10-24 Thread via GitHub
JanKaul commented on issue #6420: URL: https://github.com/apache/iceberg/issues/6420#issuecomment-1776639925 I've updated the issue description and the google doc (https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?usp=sharing). I would love to get yo