[GitHub] [iceberg] jfz commented on pull request #5812: Use Java collections in AwsProperties to fix Kryo serialization.

2022-11-04 Thread GitBox
jfz commented on PR #5812: URL: https://github.com/apache/iceberg/pull/5812#issuecomment-1304407445 > Thanks @jfz , looks good to me! Thank you for reviewing @szehon-ho ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [iceberg] manuzhang commented on pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-04 Thread GitBox
manuzhang commented on PR #5392: URL: https://github.com/apache/iceberg/pull/5392#issuecomment-1304384550 @RussellSpitzer please check again whether it's fixed now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [iceberg] github-actions[bot] commented on issue #3968: Is there a full example for Iceberg+Flink+Minio

2022-11-04 Thread GitBox
github-actions[bot] commented on issue #3968: URL: https://github.com/apache/iceberg/issues/3968#issuecomment-1304351704 This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale' -- This is an automated message from the Apache Gi

[GitHub] [iceberg] github-actions[bot] closed issue #3968: Is there a full example for Iceberg+Flink+Minio

2022-11-04 Thread GitBox
github-actions[bot] closed issue #3968: Is there a full example for Iceberg+Flink+Minio URL: https://github.com/apache/iceberg/issues/3968 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [iceberg] szehon-ho commented on issue #4127: Delete files not eventually removed if RewriteDataFile run right after delete (when using 'use-starting-sequence-number' default)

2022-11-04 Thread GitBox
szehon-ho commented on issue #4127: URL: https://github.com/apache/iceberg/issues/4127#issuecomment-1304351032 Design doc here: https://github.com/apache/iceberg/issues/6126 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [iceberg] rdblue commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
rdblue commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1014528415 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combine

[GitHub] [iceberg] djouallah commented on pull request #6076: Python: Replace mmh3 with mmhash3

2022-11-04 Thread GitBox
djouallah commented on PR #6076: URL: https://github.com/apache/iceberg/pull/6076#issuecomment-1304345523 is there an ETA when it will merged ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [iceberg] szehon-ho opened a new issue, #6126: RemoveDanglingDeleteFiles

2022-11-04 Thread GitBox
szehon-ho opened a new issue, #6126: URL: https://github.com/apache/iceberg/issues/6126 ### Feature Request / Improvement With DeleteFiles introduced to Iceberg Spec V2, we need a mechanism to remove them from the current snapshot after they become invalid. Proposal is here:

[GitHub] [iceberg] jzhuge commented on a diff in pull request #4925: API: Add view interfaces

2022-11-04 Thread GitBox
jzhuge commented on code in PR #4925: URL: https://github.com/apache/iceberg/pull/4925#discussion_r1014520008 ## api/src/main/java/org/apache/iceberg/view/ViewBuilder.java: ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contri

[GitHub] [iceberg] ddrinka commented on issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

2022-11-04 Thread GitBox
ddrinka commented on issue #6120: URL: https://github.com/apache/iceberg/issues/6120#issuecomment-1304291834 Ok @Fokko, check out the PR and let me know what you think. I think there's still room for improvement on aligning the handling of partitions between `PartitionSummary` and `Da

[GitHub] [iceberg] can-sun opened a new issue, #6125: Encountered throttling when writting to S3 without repartitioning

2022-11-04 Thread GitBox
can-sun opened a new issue, #6125: URL: https://github.com/apache/iceberg/issues/6125 ### Apache Iceberg version 0.14.1 ### Query engine Spark ### Please describe the bug 🐞 I am using the following code snippet to batch write data to my S3 bucket and encoun

[GitHub] [iceberg] Fokko opened a new issue, #6124: Support Python 3.11

2022-11-04 Thread GitBox
Fokko opened a new issue, #6124: URL: https://github.com/apache/iceberg/issues/6124 ### Feature Request / Improvement Python 3.11 has been released. Also, PyArrow has recently added support: https://github.com/apache/arrow/pull/14499 Once there is a new PyArrow release, I think

[GitHub] [iceberg] rdblue merged pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-04 Thread GitBox
rdblue merged PR #6010: URL: https://github.com/apache/iceberg/pull/6010 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apac

[GitHub] [iceberg] rdblue commented on pull request #6010: Python: Fix PyArrowFileIO caching

2022-11-04 Thread GitBox
rdblue commented on PR #6010: URL: https://github.com/apache/iceberg/pull/6010#issuecomment-1304137072 Looks great. Thanks, @Fokko! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [iceberg] Fokko commented on a diff in pull request #6069: Python: TableScan Plan files API implementation without residual evaluation

2022-11-04 Thread GitBox
Fokko commented on code in PR #6069: URL: https://github.com/apache/iceberg/pull/6069#discussion_r1014404025 ## python/pyiceberg/cli/output.py: ## @@ -49,6 +52,10 @@ def describe_table(self, table: Table) -> None: def files(self, table: Table, io: FileIO, history: bool) ->

[GitHub] [iceberg] singhpk234 commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

2022-11-04 Thread GitBox
singhpk234 commented on issue #5867: URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1304081971 ideally
```shell
--conf spark.sql.catalog.{catalog_name}.glue.skip-name-validation=false
```
should have worked, can you please add the complete spark conf's you are

[GitHub] [iceberg] shahsmit14 commented on issue #3619: Iceberg support ranger to make access data more safety

2022-11-04 Thread GitBox
shahsmit14 commented on issue #3619: URL: https://github.com/apache/iceberg/issues/3619#issuecomment-1304038444 What is the best way to get some traction on this ask? At-least understanding if this ask is something even the community thinks to consider. -- This is an automated message fr

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014323668 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id", In

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
aokolnychyi commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1014324189 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.co

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
aokolnychyi commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1014323401 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.co

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014322081 ## core/src/main/java/org/apache/iceberg/BaseFilesTable.java: ## @@ -140,42 +142,72 @@ protected CloseableIterable doPlanFiles() { } static class ManifestRead

[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
szehon-ho commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014312886 ## core/src/main/java/org/apache/iceberg/BaseFilesTable.java: ## @@ -140,42 +142,72 @@ protected CloseableIterable doPlanFiles() { } static class ManifestRead

[GitHub] [iceberg] sunchao commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
sunchao commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1014311120 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.combin

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1014286079 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] jfz commented on a diff in pull request #5812: Use Java collections in AwsProperties to fix Kryo serialization.

2022-11-04 Thread GitBox
jfz commented on code in PR #5812: URL: https://github.com/apache/iceberg/pull/5812#discussion_r1014285082 ## aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java: ## @@ -493,53 +492,7 @@ public class AwsProperties implements Serializable { private String dynamoDbEndpo

[GitHub] [iceberg] ddrinka opened a new pull request, #6123: Python: Support creating a DateLiteral from a date (#6120)

2022-11-04 Thread GitBox
ddrinka opened a new pull request, #6123: URL: https://github.com/apache/iceberg/pull/6123 * Renamed date_to_days to date_str_to_days * Created new date_to_days to convert a Python date object to int days * Contributes to workaround for #6120 -- This is an automated message f

[GitHub] [iceberg] findepi commented on pull request #4381: Core: Make DeleteFilter's constructor parameters more specific

2022-11-04 Thread GitBox
findepi commented on PR #4381: URL: https://github.com/apache/iceberg/pull/4381#issuecomment-1303889930 > DeleteFilter interface is defined in Trino, I think DeleteFilter in Iceberg is never used in Trino, right? @findepi I guess you mean these changes https://github.com/trinodb/trino

[GitHub] [iceberg] ahshahid commented on issue #6039: Spark : Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition

2022-11-04 Thread GitBox
ahshahid commented on issue #6039: URL: https://github.com/apache/iceberg/issues/6039#issuecomment-1303883448 Some update: For tpcds query with limited data and enabling stats at manifest level for non partition cols, still does not improve perf.. the cost of dpp query is pretty high, es

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
aokolnychyi commented on code in PR #2276: URL: https://github.com/apache/iceberg/pull/2276#discussion_r1014217373 ## core/src/main/java/org/apache/iceberg/util/TableScanUtil.java: ## @@ -71,6 +78,57 @@ public static CloseableIterable splitFiles( return CloseableIterable.co

[GitHub] [iceberg] sririshindra commented on a diff in pull request #6025: [Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs.

2022-11-04 Thread GitBox
sririshindra commented on code in PR #6025: URL: https://github.com/apache/iceberg/pull/6025#discussion_r1014240147 ## docs/spark-procedures.md: ## @@ -421,12 +421,17 @@ Existing data files are added to the Iceberg table's metadata and can be read us To leave the original ta

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014239621 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Soft

[GitHub] [iceberg] sririshindra commented on a diff in pull request #6025: [Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs.

2022-11-04 Thread GitBox
sririshindra commented on code in PR #6025: URL: https://github.com/apache/iceberg/pull/6025#discussion_r1014239306 ## docs/spark-procedures.md: ## @@ -421,12 +421,17 @@ Existing data files are added to the Iceberg table's metadata and can be read us To leave the original ta

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014235434 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Soft

[GitHub] [iceberg] rdblue commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-04 Thread GitBox
rdblue commented on code in PR #6113: URL: https://github.com/apache/iceberg/pull/6113#discussion_r1014236424 ## core/src/main/java/org/apache/iceberg/util/JsonUtil.java: ## @@ -374,4 +380,40 @@ void validate(JsonNode element) { element); } } + + public stati

[GitHub] [iceberg] rdblue commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-04 Thread GitBox
rdblue commented on code in PR #6113: URL: https://github.com/apache/iceberg/pull/6113#discussion_r1014233883 ## core/src/main/java/org/apache/iceberg/util/JsonUtil.java: ## @@ -251,6 +252,11 @@ public static Set getIntegerSet(String property, JsonNode node) { .build()

[GitHub] [iceberg] danielcweeks commented on a diff in pull request #6045: [iceberg-hive-metastore] Support setting individual and group ownership for Namespace

2022-11-04 Thread GitBox
danielcweeks commented on code in PR #6045: URL: https://github.com/apache/iceberg/pull/6045#discussion_r1014232442 ## core/src/main/java/org/apache/iceberg/TableProperties.java: ## @@ -360,5 +360,7 @@ private TableProperties() {} public static final String UPSERT_ENABLED = "

[GitHub] [iceberg] rdblue commented on a diff in pull request #5392: Spark: Fix a separate table cache being created for each rewriteFiles

2022-11-04 Thread GitBox
rdblue commented on code in PR #5392: URL: https://github.com/apache/iceberg/pull/5392#discussion_r1014229760 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java: ## @@ -94,10 +95,14 @@ private boolean useStartingSequenceNumber;

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014211767 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Soft

[GitHub] [iceberg] aokolnychyi commented on pull request #2276: Core: Add option to combine tasks by partition

2022-11-04 Thread GitBox
aokolnychyi commented on PR #2276: URL: https://github.com/apache/iceberg/pull/2276#issuecomment-1303798458 Let me take another look in a bit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014201558 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014199642 ## spark/v3.3/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java: ## @@ -0,0 +1,498 @@ +/* + * Licensed to the Apache Soft

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014193865 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] nastra commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
nastra commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1014180932 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScansWithPartitionEvolution.java: ## @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014114843 ## core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java: ## @@ -107,14 +117,20 @@ public static ScanReport fromJson(JsonNode json) { List projectedFi

[GitHub] [iceberg] martindurant commented on issue #5800: Integrate pyiceberg with Dask

2022-11-04 Thread GitBox
martindurant commented on issue #5800: URL: https://github.com/apache/iceberg/issues/5800#issuecomment-1303714276 cc https://github.com/martindurant/daskberg/issues/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014136769 ## core/src/main/java/org/apache/iceberg/EnvironmentContext.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014117969 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -140,6 +140,8 @@ private CatalogProperties() {} public static final String APP_ID = "app-id";

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014117067 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014113140 ## core/src/main/java/org/apache/iceberg/EnvironmentContext.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more cont

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014109381 ## core/src/main/java/org/apache/iceberg/CatalogProperties.java: ## @@ -140,6 +140,8 @@ private CatalogProperties() {} public static final String APP_ID = "app-id";

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014108559 ## core/src/main/java/org/apache/iceberg/metrics/ScanReportParser.java: ## @@ -107,14 +117,20 @@ public static ScanReport fromJson(JsonNode json) { List projectedFi

[GitHub] [iceberg] rdblue commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
rdblue commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014106936 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] code-magician323 commented on issue #4977: Support Kafka Connect within Iceberg

2022-11-04 Thread GitBox
code-magician323 commented on issue #4977: URL: https://github.com/apache/iceberg/issues/4977#issuecomment-1303656262 @kbendick Do you think there will be progress at this area soon? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014095953 ## core/src/main/java/org/apache/iceberg/BaseFilesTable.java: ## @@ -140,42 +142,72 @@ protected CloseableIterable doPlanFiles() { } static class Manifes

[GitHub] [iceberg] nastra commented on a diff in pull request #6113: Core: Reduce code duplication around writing JSON collections

2022-11-04 Thread GitBox
nastra commented on code in PR #6113: URL: https://github.com/apache/iceberg/pull/6113#discussion_r1014062148 ## core/src/main/java/org/apache/iceberg/util/JsonUtil.java: ## @@ -251,6 +252,11 @@ public static Set getIntegerSet(String property, JsonNode node) { .build()

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1014059674 ## spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/SparkCatalog.java: ## @@ -532,6 +537,23 @@ public final void initialize(String name, CaseInsensitiveStringMap

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014048088 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -142,9 +142,21 @@ public static Schema selectNot(Schema schema, Set fieldIds) { } pub

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5376: Core: Add readable metrics columns to files metadata tables

2022-11-04 Thread GitBox
RussellSpitzer commented on code in PR #5376: URL: https://github.com/apache/iceberg/pull/5376#discussion_r1014038296 ## api/src/main/java/org/apache/iceberg/DataFile.java: ## @@ -99,10 +99,24 @@ public interface DataFile extends ContentFile { optional(140, "sort_order_id

[GitHub] [iceberg] ConeyLiu commented on pull request #5632: Core: Avoid reading ManifestFile when create ManifestReader

2022-11-04 Thread GitBox
ConeyLiu commented on PR #5632: URL: https://github.com/apache/iceberg/pull/5632#issuecomment-1303486785 Thanks @szehon-ho @rdblue @nastra @zinking -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [iceberg] nastra commented on pull request #6108: SparkBatchQueryScan logs too much - #6106

2022-11-04 Thread GitBox
nastra commented on PR #6108: URL: https://github.com/apache/iceberg/pull/6108#issuecomment-1303478725 @Omega359 could you please fix the missing import so that the code compiles? Would be great to get this merged -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1013983522 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -135,11 +143,14 @@ public CloseableIterable planFiles() { planningDuration.stop();

[GitHub] [iceberg] nastra commented on a diff in pull request #6058: Core,Spark: Add metadata to Scan Report

2022-11-04 Thread GitBox
nastra commented on code in PR #6058: URL: https://github.com/apache/iceberg/pull/6058#discussion_r1013976800 ## core/src/main/java/org/apache/iceberg/BaseTableScan.java: ## @@ -141,6 +142,8 @@ public CloseableIterable planFiles() { doPlanFiles(), () -> {

[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
ajantha-bhat commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013964398 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] findepi commented on pull request #5129: Add source snapshot info to Puffin Blob metadata

2022-11-04 Thread GitBox
findepi commented on PR #5129: URL: https://github.com/apache/iceberg/pull/5129#issuecomment-1303363469 @rdblue Now that we have this on the blob metadata level, do we still need to have `org.apache.iceberg.StatisticsFile#snapshotId` field? cc @ajantha-bhat -- This is an automate

[GitHub] [iceberg] findepi commented on a diff in pull request #6091: Spark-3.3: Handle statistics file clean up from expireSnapshots action/procedure

2022-11-04 Thread GitBox
findepi commented on code in PR #6091: URL: https://github.com/apache/iceberg/pull/6091#discussion_r1013950242 ## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ## @@ -1234,6 +1245,40 @@ public void testMultipleRefsAndCleanExpiredFilesFailsForIncrementalCleanup

[GitHub] [iceberg] harini-venkataraman commented on issue #6089: Issue with Creation of Database using Spark

2022-11-04 Thread GitBox
harini-venkataraman commented on issue #6089: URL: https://github.com/apache/iceberg/issues/6089#issuecomment-1303312171 **RCA :** Spark had introduced a new configuration - `spark.sql.warehouse.dir` https://issues.apache.org/jira/browse/SPARK-15034 Tried changing this in the config

[GitHub] [iceberg] harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark

2022-11-04 Thread GitBox
harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark URL: https://github.com/apache/iceberg/issues/6089 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [iceberg] findepi commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
findepi commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013924386 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] findepi commented on a diff in pull request #6090: Core: Handle statistics file clean up from expireSnapshots

2022-11-04 Thread GitBox
findepi commented on code in PR #6090: URL: https://github.com/apache/iceberg/pull/6090#discussion_r1013923155 ## core/src/main/java/org/apache/iceberg/FileCleanupStrategy.java: ## @@ -79,4 +80,15 @@ protected void deleteFiles(Set pathsToDelete, String fileType) {

[GitHub] [iceberg] RussellSpitzer commented on issue #6122: IcebergGenerics.read(table) doesn't work as expected

2022-11-04 Thread GitBox
RussellSpitzer commented on issue #6122: URL: https://github.com/apache/iceberg/issues/6122#issuecomment-1303235332 Are you adding a default name mapping for your table? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [iceberg] kamaljit-1991 opened a new issue, #6122: IcebergGenerics.read(table) doesn't work as expected

2022-11-04 Thread GitBox
kamaljit-1991 opened a new issue, #6122: URL: https://github.com/apache/iceberg/issues/6122 ### Apache Iceberg version 0.13.1 ### Query engine _No response_ ### Please describe the bug 🐞 It is little bit of related https://github.com/apache/iceberg/issues/45

[GitHub] [iceberg] Fokko merged pull request #6119: Remove Fokko from the list of collaborators

2022-11-04 Thread GitBox
Fokko merged PR #6119: URL: https://github.com/apache/iceberg/pull/6119 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apach

[GitHub] [iceberg] lvyanquan commented on pull request #6111: Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog

2022-11-04 Thread GitBox
lvyanquan commented on PR #6111: URL: https://github.com/apache/iceberg/pull/6111#issuecomment-1303144296 Resubmitted to port the change to 1.16 module. "cache-enabled" is reserved now, since users who set "cache-enabled" to "false" before would need to add property "cache.expiration-int

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013697964 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -107,13 +107,15 @@ public Schema record(Schema record, List names, Iterable s

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013696351 ## core/src/test/java/org/apache/iceberg/TestMetadataTableScans.java: ## @@ -978,6 +1091,32 @@ private Set expectedManifestListPaths(Iterable snapshots, Long

[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #4577: Fixes read metadata table failed due to illegal character

2022-11-04 Thread GitBox
ConeyLiu commented on code in PR #4577: URL: https://github.com/apache/iceberg/pull/4577#discussion_r1013695871 ## core/src/main/java/org/apache/iceberg/avro/BuildAvroProjection.java: ## @@ -107,13 +107,15 @@ public Schema record(Schema record, List names, Iterable s

[GitHub] [iceberg] Fokko commented on issue #6120: [Python] The structure of a partition definition and partition instance should be consistent

2022-11-04 Thread GitBox
Fokko commented on issue #6120: URL: https://github.com/apache/iceberg/issues/6120#issuecomment-1303064851 Hey @ddrinka Thanks for opening the PR. The date is a so-called logical type that most storage formats store internally as the days since 1970-01-01. After reading, this should be conv
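
The days-since-epoch representation described in the comment above can be sketched roughly as follows. This is only an illustration using the Python standard library; the helper names `days_since_epoch` and `date_from_days` are hypothetical and are not taken from pyiceberg or from the PR under discussion.

```python
from datetime import date, timedelta

# The reference day that a stored value of 0 corresponds to.
EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Convert a calendar date to the integer number of days since 1970-01-01.

    Hypothetical helper for illustration; dates before the epoch yield
    negative values.
    """
    return (d - EPOCH).days


def date_from_days(days: int) -> date:
    """Convert a stored day count back to a calendar date (inverse of the above)."""
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    stored = days_since_epoch(date(2022, 11, 4))
    print(stored)                  # integer as it might be stored on disk
    print(date_from_days(stored))  # round-trips to the original date
```

Keeping both directions as small pure functions makes the round trip easy to check, which is the property the discussion about consistent partition values depends on.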

[GitHub] [iceberg] Fokko commented on pull request #6119: Remove Fokko from the list of collaborators

2022-11-04 Thread GitBox
Fokko commented on PR #6119: URL: https://github.com/apache/iceberg/pull/6119#issuecomment-1303053285 @singhpk234 Good call, just updated the PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [iceberg] Fokko commented on pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-04 Thread GitBox
Fokko commented on PR #6117: URL: https://github.com/apache/iceberg/pull/6117#issuecomment-1303050019 Thanks for spotting this one @ddrinka It looks like we also need a not-`None` check. I also noticed that this is being fixed in https://github.com/apache/iceberg/pull/6069. So I'll close th

[GitHub] [iceberg] Fokko closed pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal`

2022-11-04 Thread GitBox
Fokko closed pull request #6117: Fix typo in `_ManifestEvalVisitor.visit_equal` URL: https://github.com/apache/iceberg/pull/6117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.