Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1894289015 Thanks for reviewing, @szehon-ho @RussellSpitzer! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi merged PR #8755: URL: https://github.com/apache/iceberg/pull/8755

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-16 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1894272954 I gave this PR a round of testing on the cluster and it seems to work as expected.

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-15 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1452867560 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

[PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-15 Thread via GitHub
aokolnychyi opened a new pull request, #8755: URL: https://github.com/apache/iceberg/pull/8755 This PR has code to parallelize reading of deletes and enable caching them on executors. I also have a follow-up change to assign tasks for one partition to the same executor, similar to

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-15 Thread via GitHub
aokolnychyi closed pull request #8755: API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors URL: https://github.com/apache/iceberg/pull/8755

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-12 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1450968650 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-11 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1448696780 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1446667326 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1446660072 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445991044 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445975170 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445974562 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445973562 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445953610 ## data/src/main/java/org/apache/iceberg/data/DeleteFilter.java: ## @@ -224,14 +223,10 @@ public Predicate eqDeletedRowFilter() { } public

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445928357 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445927458 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-09 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1445832888 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-04 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1442126743 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441180949 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441178910 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441177965 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1441176818 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440937961 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440935178 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440934324 ## core/src/main/java/org/apache/iceberg/deletes/Deletes.java: ## @@ -125,6 +126,25 @@ public static StructLikeSet toEqualitySet( } } + public static

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1875952717 @singhpk234 @RussellSpitzer @szehon-ho, I rebased this. I addressed most comments, I am working on tests and docs. There are a few open questions too. I'll take a look at them

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440931514 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440930608 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440929838 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public static

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440921673 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440382646 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkSQLProperties.java: ## @@ -70,4 +72,18 @@ private SparkSQLProperties() {} // Controls whether to

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440372097 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440368146 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440367424 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440366952 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440362963 ## data/src/main/java/org/apache/iceberg/data/DeleteFilter.java: ## @@ -224,14 +223,10 @@ public Predicate eqDeletedRowFilter() { } public

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440336894 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440335582 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440331799 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440325074 ## core/src/main/java/org/apache/iceberg/deletes/PositionDeleteIndex.java: ## @@ -44,4 +44,14 @@ public interface PositionDeleteIndex { /** Returns true if

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440322057 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440319918 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-03 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1440319498 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-02 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1434014371 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-02 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1433435467 ## core/src/main/java/org/apache/iceberg/deletes/PositionDeleteIndex.java: ## @@ -44,4 +44,14 @@ public interface PositionDeleteIndex { /** Returns true if this

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1439702880 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2024-01-02 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1439698943 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/SparkExecutorCache.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-21 Thread via GitHub
szehon-ho commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1433431468 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,68 @@ private static void checkSchemaCompatibility( } } + /** + *

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-20 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1864056256 @RussellSpitzer @szehon-ho, could you take a look? I am adding tests and benchmarks but the main code is ready for another round.

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-20 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1432399087 ## core/src/main/java/org/apache/iceberg/util/ThreadPools.java: ## @@ -68,8 +68,9 @@ public static ExecutorService getWorkerPool() { /** * Return an {@link

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-19 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1431972974 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -43,14 +43,14 @@ private SystemConfigs() {} Integer::parseUnsignedInt); /** -

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-14 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1427103384 ## core/src/main/java/org/apache/iceberg/util/ThreadPools.java: ## @@ -68,8 +68,9 @@ public static ExecutorService getWorkerPool() { /** * Return an {@link

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-14 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1427064092 ## core/src/main/java/org/apache/iceberg/util/ThreadPools.java: ## @@ -68,8 +68,9 @@ public static ExecutorService getWorkerPool() { /** * Return an

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-05 Thread via GitHub
aokolnychyi commented on PR #8755: URL: https://github.com/apache/iceberg/pull/8755#issuecomment-1841668881 The Flink test failure does not seem related.

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-05 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1416089345 ## core/src/main/java/org/apache/iceberg/deletes/BitmapPositionDeleteIndex.java: ## @@ -27,6 +27,15 @@ class BitmapPositionDeleteIndex implements

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-05 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1416085857 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -43,14 +43,14 @@ private SystemConfigs() {} Integer::parseUnsignedInt); /** -

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-05 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1416084451 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412674377 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -43,14 +43,14 @@ private SystemConfigs() {} Integer::parseUnsignedInt); /** -

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412667139 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -500,4 +501,21 @@ public static Snapshot latestSnapshot(TableMetadata metadata, String

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412657382 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -500,4 +501,21 @@ public static Snapshot latestSnapshot(TableMetadata metadata, String

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412657113 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -43,14 +43,14 @@ private SystemConfigs() {} Integer::parseUnsignedInt); /** -

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412656524 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public static

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412655796 ## core/src/main/java/org/apache/iceberg/deletes/BitmapPositionDeleteIndex.java: ## @@ -27,6 +27,15 @@ class BitmapPositionDeleteIndex implements

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412655501 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412642976 ## core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java: ## @@ -500,4 +501,21 @@ public static Snapshot latestSnapshot(TableMetadata metadata, String

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-12-01 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1412637270 ## core/src/main/java/org/apache/iceberg/deletes/EmptyPositionDeleteIndex.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-29 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1409670268 ## core/src/main/java/org/apache/iceberg/deletes/BitmapPositionDeleteIndex.java: ## @@ -27,6 +27,15 @@ class BitmapPositionDeleteIndex implements

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-29 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1409668358 ## core/src/main/java/org/apache/iceberg/SystemConfigs.java: ## @@ -43,14 +43,14 @@ private SystemConfigs() {} Integer::parseUnsignedInt); /** -

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-29 Thread via GitHub
RussellSpitzer commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1409617517 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-20 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1350603878 ## data/src/main/java/org/apache/iceberg/data/DeleteLoader.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-20 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1399806008 ## data/src/main/java/org/apache/iceberg/data/BaseDeleteLoader.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-20 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1399804623 ## core/src/main/java/org/apache/iceberg/deletes/PositionDeleteIndex.java: ## @@ -44,4 +44,14 @@ public interface PositionDeleteIndex { /** Returns true if

Re: [PR] API, Core, Spark 3.5: Parallelize reading of deletes and cache them on executors [iceberg]

2023-11-20 Thread via GitHub
aokolnychyi commented on code in PR #8755: URL: https://github.com/apache/iceberg/pull/8755#discussion_r1399800368 ## api/src/main/java/org/apache/iceberg/types/TypeUtil.java: ## @@ -452,6 +454,59 @@ private static void checkSchemaCompatibility( } } + public static