Re: [I] Add NOT MATCHED BY SOURCE to MERGE INTO [iceberg]

2023-11-14 Thread via GitHub
corleyma commented on issue #7842: URL: https://github.com/apache/iceberg/issues/7842#issuecomment-1811905236 Answering my own question, it looks like this was added in [Iceberg 1.4.0/Spark 3.5](https://iceberg.apache.org/releases/#140-release). -- This is an automated message from the Apa
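
For readers new to the clause: in a `MERGE INTO`, `WHEN NOT MATCHED BY SOURCE` applies to target rows that no source row matches (for example, to delete them). According to the comment above, this arrived with Iceberg 1.4.0 on Spark 3.5. The sketch below only simulates those semantics over plain Python dicts keyed by a join key; `merge_into` is a hypothetical helper for illustration, not Iceberg's or Spark's implementation.

```python
from typing import Dict


def merge_into(
    target: Dict[int, str],
    source: Dict[int, str],
    delete_not_matched_by_source: bool = True,
) -> Dict[int, str]:
    """Simulate MERGE INTO on rows keyed by id (hypothetical helper).

    - WHEN MATCHED THEN UPDATE: the source value replaces the target value.
    - WHEN NOT MATCHED THEN INSERT: source-only rows are added.
    - WHEN NOT MATCHED BY SOURCE THEN DELETE: target-only rows are dropped.
    """
    result: Dict[int, str] = {}
    for key, value in target.items():
        if key in source:
            result[key] = source[key]  # WHEN MATCHED THEN UPDATE
        elif not delete_not_matched_by_source:
            result[key] = value  # target-only row kept unchanged
        # otherwise: WHEN NOT MATCHED BY SOURCE THEN DELETE (row omitted)
    for key, value in source.items():
        if key not in target:
            result[key] = value  # WHEN NOT MATCHED THEN INSERT
    return result


target = {1: "a", 2: "b"}
source = {2: "B", 3: "C"}
print(merge_into(target, source))  # {2: 'B', 3: 'C'}
print(merge_into(target, source, delete_not_matched_by_source=False))  # {1: 'a', 2: 'B', 3: 'C'}
```

In SQL terms the first call corresponds roughly to `MERGE INTO t USING s ON t.id = s.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT * WHEN NOT MATCHED BY SOURCE THEN DELETE`.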

Re: [I] Flink SQL SELECT ORDER BY clause caused data loss. [iceberg]

2023-11-14 Thread via GitHub
a8356555 commented on issue #9022: URL: https://github.com/apache/iceberg/issues/9022#issuecomment-1811897024 > @a8356555 maybe it is just a copy-paste error, but I noticed that in the table with the loss, you're selecting from a different table. Sorry, it's a copy-paste error.

[I] Failed to create namespace using spark sql based on iceberg hadoop catalog (rest catalog) [iceberg]

2023-11-14 Thread via GitHub
TCGOGOGO opened a new issue, #9072: URL: https://github.com/apache/iceberg/issues/9072 ### Apache Iceberg version 1.3.1 ### Query engine Spark ### Please describe the bug 🐞 ### Steps 1. Deploy an iceberg rest catalog server which wraps the HadoopCatalog

Re: [PR] Replace i64 with DateTime [iceberg-rust]

2023-11-14 Thread via GitHub
Xuanwo commented on code in PR #94: URL: https://github.com/apache/iceberg-rust/pull/94#discussion_r1393652782 ## crates/iceberg/src/spec/timestamp_millis.rs: ## Review Comment: By the way, do we need a new type? How about just returning `DateTime`?

Re: [PR] Replace i64 with DateTime [iceberg-rust]

2023-11-14 Thread via GitHub
liurenjie1024 commented on code in PR #94: URL: https://github.com/apache/iceberg-rust/pull/94#discussion_r1393639699 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -153,7 +155,7 @@ pub(super) mod _serde { #[serde(skip_serializing_if = "Option::is_none")] pub pa

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1393606348 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { fi

[I] when hdfs router restart, task failed with read data files " xx parquet is not a parquet file (length is too low :0) " or manifest "java.io.EOFException: Unexpected EOF with 4 bytes remaining to

2023-11-14 Thread via GitHub
chenwyi2 opened a new issue, #9071: URL: https://github.com/apache/iceberg/issues/9071 ### Apache Iceberg version 1.2.1 ### Query engine None ### Please describe the bug 🐞 Recently, we found that when the HDFS router gets stuck or restarts, we encounter some file

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1393529465 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { fi

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1393448207 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { fi

Re: [PR] Replace i64 with DateTime [iceberg-rust]

2023-11-14 Thread via GitHub
fqaiser94 commented on code in PR #94: URL: https://github.com/apache/iceberg-rust/pull/94#discussion_r1393512136 ## crates/iceberg/src/spec/snapshot.rs: ## @@ -72,7 +73,7 @@ pub struct Snapshot { sequence_number: i64, /// A timestamp when the snapshot was created, use

Re: [I] Provide a protected TableMetadata() constructor to allow inheritance [iceberg]

2023-11-14 Thread via GitHub
github-actions[bot] commented on issue #7509: URL: https://github.com/apache/iceberg/issues/7509#issuecomment-1811593099 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

Re: [I] Provide a protected TableMetadata() constructor to allow inheritance [iceberg]

2023-11-14 Thread via GitHub
github-actions[bot] closed issue #7509: Provide a protected TableMetadata() constructor to allow inheritance URL: https://github.com/apache/iceberg/issues/7509

Re: [I] Provide a protected TableMetadata.MetadataLogEntry(long, String) constructor for inheritance [iceberg]

2023-11-14 Thread via GitHub
github-actions[bot] closed issue #7510: Provide a protected TableMetadata.MetadataLogEntry(long, String) constructor for inheritance URL: https://github.com/apache/iceberg/issues/7510

Re: [I] Provide a protected TableMetadata.MetadataLogEntry(long, String) constructor for inheritance [iceberg]

2023-11-14 Thread via GitHub
github-actions[bot] commented on issue #7510: URL: https://github.com/apache/iceberg/issues/7510#issuecomment-1811593055 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

[PR] Build: Bump aiohttp from 3.8.5 to 3.8.6 [iceberg-python]

2023-11-14 Thread via GitHub
dependabot[bot] opened a new pull request, #149: URL: https://github.com/apache/iceberg-python/pull/149 Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.5 to 3.8.6. Release notes: Sourced from aiohttp's releases (https://github.com/aio-libs/aiohttp/releases). 3.8

Re: [I] Add geometry type to iceberg [iceberg]

2023-11-14 Thread via GitHub
jiayuasu commented on issue #2586: URL: https://github.com/apache/iceberg/issues/2586#issuecomment-1811575846 For folks who are interested in Geometry/Raster + Iceberg, we @wherobots have implemented an iceberg-compatible spatial table format called Havasu: https://docs.wherobots.services/1

Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2023-11-14 Thread via GitHub
singhpk234 commented on code in PR #8980: URL: https://github.com/apache/iceberg/pull/8980#discussion_r1393421486 ## core/src/main/java/org/apache/iceberg/MicroBatches.java: ## @@ -92,7 +92,7 @@ private static List> indexManifests( for (ManifestFile manifest : manifestFi

Re: [PR] Core: Iceberg streaming streaming-skip-overwrite-snapshots SparkMicroBatchStream only skips over one file per trigger [iceberg]

2023-11-14 Thread via GitHub
singhpk234 commented on code in PR #8980: URL: https://github.com/apache/iceberg/pull/8980#discussion_r1393422572 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkMicroBatchStream.java: ## @@ -392,8 +392,15 @@ public Offset latestOffset(Offset startOffset,

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1393418204 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { fi

[PR] Build: Bump mypy-boto3-glue from 1.28.77 to 1.29.0 [iceberg-python]

2023-11-14 Thread via GitHub
dependabot[bot] opened a new pull request, #148: URL: https://github.com/apache/iceberg-python/pull/148 Bumps [mypy-boto3-glue](https://github.com/youtype/mypy_boto3_builder) from 1.28.77 to 1.29.0. Commits: See full diff in https://github.com/youtype/mypy_boto3_builder/commits

[PR] Build: Bump mkdocstrings from 0.23.0 to 0.24.0 [iceberg-python]

2023-11-14 Thread via GitHub
dependabot[bot] opened a new pull request, #147: URL: https://github.com/apache/iceberg-python/pull/147 Bumps [mkdocstrings](https://github.com/mkdocstrings/mkdocstrings) from 0.23.0 to 0.24.0. Release notes: Sourced from https://github.com/mkdocstrings/mkdocstrings/releases: mkdoc

[PR] Build: Bump duckdb from 0.9.1 to 0.9.2 [iceberg-python]

2023-11-14 Thread via GitHub
dependabot[bot] opened a new pull request, #146: URL: https://github.com/apache/iceberg-python/pull/146 Bumps [duckdb](https://github.com/duckdb/duckdb) from 0.9.1 to 0.9.2. Release notes: Sourced from duckdb's releases (https://github.com/duckdb/duckdb/releases). 0.9.2 Bugfix

[I] Error generating Go code from rest-catalog-open-api.yaml [iceberg]

2023-11-14 Thread via GitHub
romusz opened a new issue, #9070: URL: https://github.com/apache/iceberg/issues/9070 ### Apache Iceberg version 1.4.2 (latest release) ### Query engine None ### Please describe the bug 🐞 Hi, trying to generate Go code from `rest-catalog-open-api.yaml`

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1393318722 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { fi

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1393295105 ## .palantir/revapi.yml: ## @@ -866,6 +866,11 @@ acceptedBreaks: old: "method void org.apache.iceberg.encryption.Ciphers::()" new: "method void org.ap

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi merged PR #8803: URL: https://github.com/apache/iceberg/pull/8803

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on PR #8803: URL: https://github.com/apache/iceberg/pull/8803#issuecomment-1811313831 I went through Ryan's comments one more time. They seem to be addressed. I also think the current version is simpler. Let's merge it as is and follow up if needed to unblock subsequen

Re: [PR] Core: Fix split size calculations in file rewriters [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on PR #9069: URL: https://github.com/apache/iceberg/pull/9069#issuecomment-1811254453 I reverted removal of the split overhead. Some tests are sensitive. I'll make that change in a follow-up PR.

Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]

2023-11-14 Thread via GitHub
Fokko commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1393214691 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,110 @@ public ParquetValueReader struct( } } + private clas

Re: [PR] Parquet: Move to ValueReader generation to a visitor [iceberg]

2023-11-14 Thread via GitHub
nk1506 commented on code in PR #9063: URL: https://github.com/apache/iceberg/pull/9063#discussion_r1393126118 ## parquet/src/main/java/org/apache/iceberg/data/parquet/BaseParquetReaders.java: ## @@ -108,6 +110,110 @@ public ParquetValueReader struct( } } + private cla

Re: [PR] Core: Fix split size calculations in file rewriters [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9069: URL: https://github.com/apache/iceberg/pull/9069#discussion_r1393115296 ## core/src/test/java/org/apache/iceberg/actions/TestSizeBasedRewriter.java: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Core: Fix split size calculations in file rewriters [iceberg]

2023-11-14 Thread via GitHub
nk1506 commented on code in PR #9069: URL: https://github.com/apache/iceberg/pull/9069#discussion_r1393100634 ## core/src/test/java/org/apache/iceberg/actions/TestSizeBasedRewriter.java: ## @@ -0,0 +1,100 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] Core: Fix split size calculations in file rewriters [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi commented on code in PR #9069: URL: https://github.com/apache/iceberg/pull/9069#discussion_r1393062719 ## core/src/main/java/org/apache/iceberg/actions/SizeBasedFileRewriter.java: ## @@ -101,8 +101,6 @@ public abstract class SizeBasedFileRewriter, F exte public

[PR] Core: Fix split size calculations in file rewriters [iceberg]

2023-11-14 Thread via GitHub
aokolnychyi opened a new pull request, #9069: URL: https://github.com/apache/iceberg/pull/9069 This PR adjusts the split computation logic in file rewriters. The previous logic performed poorly in some cases. Suppose we have 4 files, 145 MB each. This means we have 580 MB to compact.
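
The numbers in the description can be made concrete with a small calculation. The sketch below assumes a target file size of 512 MB (the default of Iceberg's `write.target-file-size-bytes` table property); the PR text above is truncated before it states its target, so that value and the helper `output_file_sizes` are illustrative assumptions, and the sketch does not claim to reproduce the PR's new logic. It only shows how the rounding choice for the number of output files changes the resulting file sizes.

```python
import math
from typing import List, Tuple


def output_file_sizes(
    input_sizes_mb: List[float], target_mb: float, round_up: bool
) -> Tuple[int, float]:
    """Return (file_count, size_per_file_mb) for an even split of the input.

    Hypothetical helper: divides the total input size into either
    ceil(total / target) or floor(total / target) output files (at least 1).
    """
    total = sum(input_sizes_mb)
    ratio = total / target_mb
    count = math.ceil(ratio) if round_up else max(1, math.floor(ratio))
    return count, total / count


inputs = [145] * 4  # 4 files, 145 MB each
print(sum(inputs))  # 580
print(output_file_sizes(inputs, 512, round_up=True))  # (2, 290.0)
print(output_file_sizes(inputs, 512, round_up=False))  # (1, 580.0)
```

Rounding up produces two files well under the assumed target, while rounding down produces a single file above it; the sketch only illustrates that trade-off.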

[PR] Azure: Allow shared-key auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
snazy opened a new pull request, #9068: URL: https://github.com/apache/iceberg/pull/9068 The Azure Blob Storage client library, including the hadoop-azure stuff, allows using shared-key authentication, which is handy for testing purposes with Azurite. Unfortunately, `AzureProperties` does no

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
snazy commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1365303813 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -223,27 +284,57 @@ namespace, getRef().getName()), } public boolean dropNamespac

Re: [PR] GCP: Use correct Guava imports [iceberg]

2023-11-14 Thread via GitHub
nastra merged PR #9067: URL: https://github.com/apache/iceberg/pull/9067

Re: [PR] Build: Bump griffe from 0.36.9 to 0.38.0 [iceberg-python]

2023-11-14 Thread via GitHub
Fokko merged PR #144: URL: https://github.com/apache/iceberg-python/pull/144

Re: [PR] Build: Bump moto from 4.2.7 to 4.2.8 [iceberg-python]

2023-11-14 Thread via GitHub
Fokko merged PR #143: URL: https://github.com/apache/iceberg-python/pull/143

Re: [PR] Build: Bump mkdocstrings-python from 1.7.3 to 1.7.4 [iceberg-python]

2023-11-14 Thread via GitHub
Fokko merged PR #142: URL: https://github.com/apache/iceberg-python/pull/142

Re: [PR] Docs: Add section on pandas [iceberg-python]

2023-11-14 Thread via GitHub
Fokko commented on PR #138: URL: https://github.com/apache/iceberg-python/pull/138#issuecomment-1810751392 Thanks for the review @rdblue

Re: [PR] Docs: Add section on pandas [iceberg-python]

2023-11-14 Thread via GitHub
Fokko merged PR #138: URL: https://github.com/apache/iceberg-python/pull/138

Re: [PR] Docs: Add section on pandas [iceberg-python]

2023-11-14 Thread via GitHub
Fokko commented on code in PR #138: URL: https://github.com/apache/iceberg-python/pull/138#discussion_r1392935210 ## mkdocs/docs/api.md: ## @@ -346,6 +346,45 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.00,...,2021-05-01 00:14:47. This will only pull in the files

Re: [PR] Docs: Add section on pandas [iceberg-python]

2023-11-14 Thread via GitHub
Fokko commented on code in PR #138: URL: https://github.com/apache/iceberg-python/pull/138#discussion_r1392923192 ## mkdocs/docs/api.md: ## @@ -346,6 +346,45 @@ tpep_dropoff_datetime: [[2021-04-01 00:47:59.00,...,2021-05-01 00:14:47. This will only pull in the files

[I] Type Promotion: Long to Timestamp [iceberg]

2023-11-14 Thread via GitHub
danielcweeks opened a new issue, #9065: URL: https://github.com/apache/iceberg/issues/9065 ### Feature Request / Improvement Long promotion should evaluate to Timestamp MS (regardless of schema definition) ### Query engine None
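
The requested semantics can be sketched as: treat the stored `long` as milliseconds since the Unix epoch and convert it to a timestamp. Iceberg's `timestamp` type is stored as microseconds since 1970-01-01, so the promotion amounts to multiplying by 1,000. The helper names below are hypothetical, and this is a sketch of the arithmetic, not Iceberg's implementation.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def long_millis_to_micros(value: int) -> int:
    """Promote a long holding epoch milliseconds to epoch microseconds."""
    return value * 1_000


def micros_to_datetime(micros: int) -> datetime:
    """Render epoch microseconds as a UTC datetime (integer math, no float rounding)."""
    return EPOCH + timedelta(microseconds=micros)


raw = 1_699_920_000_123  # a long interpreted as milliseconds since the epoch
micros = long_millis_to_micros(raw)
print(micros)  # 1699920000123000
print(micros_to_datetime(micros).isoformat())  # 2023-11-14T00:00:00.123000+00:00
```

Using `timedelta` keeps the conversion in integers, avoiding the float rounding that `datetime.fromtimestamp(ms / 1000)` can introduce.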

[PR] Parquet: Move to the visitor [iceberg]

2023-11-14 Thread via GitHub
Fokko opened a new pull request, #9063: URL: https://github.com/apache/iceberg/pull/9063 (no comment)

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392616478 ## gcp/src/main/java/org/apache/iceberg/gcp/GCPProperties.java: ## @@ -90,6 +95,10 @@ public GCPProperties(Map properties) { gcsOAuth2TokenExpiresAt =

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392613903 ## gcp/src/main/java/org/apache/iceberg/gcp/GCPProperties.java: ## @@ -90,6 +95,10 @@ public GCPProperties(Map properties) { gcsOAuth2TokenExpiresAt =

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392605260 ## gcp/src/main/java/org/apache/iceberg/gcp/GCPProperties.java: ## @@ -18,13 +18,15 @@ */ package org.apache.iceberg.gcp; +import com.google.api.client.util.Precond

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392604814 ## gcp/src/main/java/org/apache/iceberg/gcp/GCPProperties.java: ## @@ -18,13 +18,15 @@ */ package org.apache.iceberg.gcp; +import com.google.api.client.util.Precond

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392604572 ## gcp/src/main/java/org/apache/iceberg/gcp/gcs/GCSFileIO.java: ## @@ -55,6 +56,7 @@ * Overview */ public class GCSFileIO implements DelegateFileIO { + Review Comm

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1392294340 ## spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() { filter

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392599014 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -540,4 +585,64 @@ public void close() { api.close(); } } + + private vo

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392591040 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -540,4 +585,64 @@ public void close() { api.close(); } } + + private vo

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392574413 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -540,4 +585,64 @@ public void close() { api.close(); } } + + private vo

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392557462 ## nessie/src/main/java/org/apache/iceberg/nessie/NessieIcebergClient.java: ## @@ -540,4 +585,64 @@ public void close() { api.close(); } } + + private vo

Re: [PR] Nessie: reimplement namespace operations [iceberg]

2023-11-14 Thread via GitHub
adutra commented on code in PR #8857: URL: https://github.com/apache/iceberg/pull/8857#discussion_r1392554187 ## nessie/src/test/java/org/apache/iceberg/nessie/TestNessieIcebergClient.java: ## @@ -91,6 +111,412 @@ public void testWithReferenceAfterRecreatingBranch() Asserti

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
nastra commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1392534466 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java: ## @@ -2944,4 +2945,77 @@ private RowLevelOperationMode mode(Table table

Re: [I] NoSuchMethodError: 'scala.Option org.apache.spark.sql.connector.expressions.BucketTransform [iceberg]

2023-11-14 Thread via GitHub
nastra commented on issue #9023: URL: https://github.com/apache/iceberg/issues/9023#issuecomment-1810133146 I'll close this one because it's caused by mixed Iceberg + Spark versions being used. @DeelFeel can you please confirm this is fixed? If not, please re-open the issue.
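
As the comment notes, this `NoSuchMethodError` comes from mixing Iceberg and Spark versions. Iceberg publishes one Spark runtime artifact per Spark minor version, named like `iceberg-spark-runtime-3.5_2.12-1.4.2.jar` (Spark 3.5, Scala 2.12, Iceberg 1.4.2). The sketch below is a hypothetical pre-flight check that parses such a file name and compares it with the running Spark version; the regex assumes exactly that naming pattern, and both helper names are illustrative.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: iceberg-spark-runtime-<spark major.minor>_<scala>-<iceberg>.jar
RUNTIME_JAR = re.compile(
    r"^iceberg-spark-runtime-(?P<spark>\d+\.\d+)_(?P<scala>\d+\.\d+)-(?P<iceberg>[\w.\-]+)\.jar$"
)


def parse_runtime_jar(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (spark, scala, iceberg) versions encoded in the jar name, or None."""
    match = RUNTIME_JAR.match(name)
    if match is None:
        return None
    return match.group("spark"), match.group("scala"), match.group("iceberg")


def runtime_matches_spark(jar_name: str, spark_version: str) -> bool:
    """True if the jar targets the same Spark major.minor as spark_version."""
    parsed = parse_runtime_jar(jar_name)
    if parsed is None:
        return False
    major_minor = ".".join(spark_version.split(".")[:2])
    return parsed[0] == major_minor


print(parse_runtime_jar("iceberg-spark-runtime-3.5_2.12-1.4.2.jar"))  # ('3.5', '2.12', '1.4.2')
print(runtime_matches_spark("iceberg-spark-runtime-3.4_2.12-1.4.2.jar", "3.5.0"))  # False
print(runtime_matches_spark("iceberg-spark-runtime-3.5_2.12-1.4.2.jar", "3.5.0"))  # True
```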

Re: [I] NoSuchMethodError: 'scala.Option org.apache.spark.sql.connector.expressions.BucketTransform [iceberg]

2023-11-14 Thread via GitHub
nastra closed issue #9023: NoSuchMethodError: 'scala.Option org.apache.spark.sql.connector.expressions.BucketTransform URL: https://github.com/apache/iceberg/issues/9023

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
ajantha-bhat commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1392528464 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java: ## @@ -2944,4 +2945,77 @@ private RowLevelOperationMode mode(Table

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
ajantha-bhat commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1392502584 ## spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java: ## @@ -2944,4 +2945,77 @@ private RowLevelOperationMode mode(Table

Re: [PR] Spark: Add serialzable isolation test for concurrent MERGE INTOs [iceberg]

2023-11-14 Thread via GitHub
ajantha-bhat commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1392489081 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java: ## @@ -401,7 +401,7 @@ private Expression conflictDetectionFilter() { f

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on PR #8803: URL: https://github.com/apache/iceberg/pull/8803#issuecomment-1809974009 @rdblue: I have made the changes you requested. If you have any further comments, please leave a review. @aokolnychyi did another thorough review and applied most of his suggest

Re: [I] Iceberg: Partition-Level Tagging Support [iceberg]

2023-11-14 Thread via GitHub
nastra commented on issue #9060: URL: https://github.com/apache/iceberg/issues/9060#issuecomment-1809972002 @Am1rr3zA I'm not sure I fully understand the problem you're trying to solve. Can't this be solved using a Tag at the table-level as described in https://iceberg.apache.org/docs/lates

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392385312 ## core/src/main/java/org/apache/iceberg/GenericDataFile.java: ## @@ -66,23 +67,31 @@ class GenericDataFile extends BaseFile implements DataFile { * Copy constructor

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392376668 ## core/src/main/java/org/apache/iceberg/ManifestGroup.java: ## @@ -154,6 +156,12 @@ ManifestGroup caseSensitive(boolean newCaseSensitive) { return this; } + M

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392375838 ## core/src/main/java/org/apache/iceberg/IncrementalDataTableScan.java: ## @@ -102,7 +102,8 @@ public CloseableIterable planFiles() { snapshotIds.con

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392372882 ## core/src/main/java/org/apache/iceberg/DataTableScan.java: ## @@ -76,7 +76,8 @@ public CloseableIterable doPlanFiles() { .filterData(filter())

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392371790 ## core/src/main/java/org/apache/iceberg/BaseFile.java: ## @@ -185,13 +188,30 @@ public PartitionData copy() { this.partitionType = toCopy.partitionType; this.r

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392372415 ## core/src/main/java/org/apache/iceberg/BaseIncrementalChangelogScan.java: ## @@ -78,7 +78,8 @@ protected CloseableIterable doPlanFiles( .select(scanColumns

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392370573 ## core/src/main/java/org/apache/iceberg/BaseDistributedDataScan.java: ## @@ -368,7 +369,9 @@ private CloseableIterable toFileTasks( ScanMetricsUtil.fileTask(s

Re: [PR] Core: Enable column statistics filtering after planning [iceberg]

2023-11-14 Thread via GitHub
pvary commented on code in PR #8803: URL: https://github.com/apache/iceberg/pull/8803#discussion_r1392370245 ## .palantir/revapi.yml: ## @@ -866,6 +866,11 @@ acceptedBreaks: old: "method void org.apache.iceberg.encryption.Ciphers::()" new: "method void org.apache.i

Re: [PR] Nessie: Support views for NessieCatalog [iceberg]

2023-11-14 Thread via GitHub
ajantha-bhat commented on PR #8909: URL: https://github.com/apache/iceberg/pull/8909#issuecomment-1809949624 Only these two comments still need a conclusion. The rest of the comments are addressed. - https://github.com/apache/iceberg/pull/8909#discussion_r1374101581 - https://github.com/apac

Re: [PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
XN137 commented on code in PR #9061: URL: https://github.com/apache/iceberg/pull/9061#discussion_r1392354434 ## gcp/src/main/java/org/apache/iceberg/gcp/GCPProperties.java: ## @@ -18,13 +18,15 @@ */ package org.apache.iceberg.gcp; +import com.google.api.client.util.Precondi

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis closed issue #9059: Memory overflow when Iceberg compact files URL: https://github.com/apache/iceberg/issues/9059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

[PR] Open-API: Bump dependencies [iceberg]

2023-11-14 Thread via GitHub
Fokko opened a new pull request, #9062: URL: https://github.com/apache/iceberg/pull/9062 The other dependencies can go, since they are pulled in by `datamodel-code-generator`.

[PR] GCS: Allow no-auth for testing purposes [iceberg]

2023-11-14 Thread via GitHub
snazy opened a new pull request, #9061: URL: https://github.com/apache/iceberg/pull/9061 Although there is no "official" Google Cloud Storage emulator available yet, there is [one available](https://github.com/oittaa/gcp-storage-emulator) that allows at least some basic testing. To use an e

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809860908 > Have you tried increasing the heap size? Additionally, you can limit which data compaction runs on by using a `where` filter, as described in https://iceberg.apache.org/do

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809860243 I'll give it a try, thank you.

Re: [PR] Build: Bump software.amazon.awssdk:bom from 2.21.21 to 2.21.22 [iceberg]

2023-11-14 Thread via GitHub
nastra merged PR #9053: URL: https://github.com/apache/iceberg/pull/9053

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
nastra commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809855119 Have you tried increasing the heap size? Additionally, you can limit which data compaction runs on by using a `where` filter, as described in https://iceberg.apache.org/docs/la

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809788718 > This doesn't look like a memory issue. It rather seems that the channel was closed as indicated by the `ClosedChannelException` Thank you. This issue seems to be due to a m

Re: [PR] Build: Bump mkdocs-material from 9.1.21 to 9.4.8 [iceberg]

2023-11-14 Thread via GitHub
nastra merged PR #9055: URL: https://github.com/apache/iceberg/pull/9055

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
nastra commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809769709 This doesn't look like a memory issue. It rather seems that the channel was closed, as indicated by the `ClosedChannelException`.

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809760170 > @leo-livis can you provide a stack trace that shows the exact problem? Also how much memory does the JVM have that runs compaction? The application has 4GB of memory. The d

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
leo-livis closed issue #9059: Memory overflow when Iceberg compact files URL: https://github.com/apache/iceberg/issues/9059

[I] Iceberg: Partition-Level Tagging Support [iceberg]

2023-11-14 Thread via GitHub
Am1rr3zA opened a new issue, #9060: URL: https://github.com/apache/iceberg/issues/9060 ### Feature Request / Improvement I want to be able to support tagging at the partition level. Let's consider a straightforward fact table that requires restating at specific times: one day after

Re: [I] Memory overflow when Iceberg compact files [iceberg]

2023-11-14 Thread via GitHub
nastra commented on issue #9059: URL: https://github.com/apache/iceberg/issues/9059#issuecomment-1809736998 @leo-livis can you provide a stack trace that shows the exact problem? Also, how much memory does the JVM that runs compaction have?