Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-25 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1929534587 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +296,77 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-17 Thread via GitHub
danielcweeks commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920907162 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +294,49 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-17 Thread via GitHub
danielcweeks commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920903258 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +294,49 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-17 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920398798 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +294,49 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-17 Thread via GitHub
RussellSpitzer commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2598728800 > @ismailsimsek [my issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049) with this PR is the same as the previous PR. This isn't a scalable solution. The

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-15 Thread via GitHub
danielcweeks commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1917214900 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +294,49 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-15 Thread via GitHub
danielcweeks commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2593735172 @ismailsimsek [my issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049) with this PR is the same as the previous PR. This isn't a scalable solution. The f

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-14 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1914531213 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,19 +294,49 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-08 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1907274886 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -589,21 +620,42 @@ private FileURI toFileURI(I input)

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-08 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1907270693 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -854,12 +867,14 @@ public void testCompareToFileList()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-07 Thread via GitHub
RussellSpitzer commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1905894464 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -854,12 +867,14 @@ public void testCompareToFileList

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-07 Thread via GitHub
RussellSpitzer commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2575979132 The test here says it's failing because you are deleting ``` but the following elements were unexpected: ["file:/tmp/junit-14563533605645158466/data/_c2_tr

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-07 Thread via GitHub
ismailsimsek commented on PR #11906: URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2574907719 cc @flyrain @RussellSpitzer @rahil-c It's ready for review and a test has been added. I would also appreciate any suggestions on the failing test. -- This is an automated message from the Apach

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-06 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903140118 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -292,14 +293,37 @@ private Dataset validFileIdentDS()

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-05 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903264071 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -610,9 +613,12 @@ public void testHiddenPathsStarting

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-04 Thread via GitHub
ismailsimsek commented on code in PR #11906: URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903139413 ## spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java: ## @@ -610,9 +613,12 @@ public void testHiddenPathsStarting

[PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2025-01-04 Thread via GitHub
ismailsimsek opened a new pull request, #11906: URL: https://github.com/apache/iceberg/pull/11906 Continuing #7914 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-12-20 Thread via GitHub
danielcweeks commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049 @steveloughran This isn't about bulk deletes (which S3FileIO does support). The issue is how to properly scale the identification of orphaned files, which is a function of the procedu

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-12-20 Thread via GitHub
steveloughran commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2557667397 The S3A FS implements bulk delete too... maybe this and S3FileIO can do the right thing (*) (*) we added it to all filesystems, but the page size of the others is zero -- This i

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-12-11 Thread via GitHub
Samreay commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2537695273 Has anyone got a nice workaround for how to remove orphan files for an S3-located iceberg table? -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-24 Thread via GitHub
yunlou11 commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2434551384 @rahil-c please check whether the following is true: listWithPrefix may list many unexpected files. For example, with 2 tables, sample and sample_part, matchingFiles will contain: ```text ...
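
The concern raised above, that a prefix listing for table `sample` may also return files of `sample_part`, can be illustrated with a minimal sketch. It does not use the Iceberg API: `listWithPrefix` here is a hypothetical stand-in that filters an in-memory key list, the `s3://bucket/db/...` paths are invented for illustration, and the trailing-slash normalization is only one possible mitigation (an assumption, not something the thread confirms the PR adopted).

```java
import java.util.List;
import java.util.stream.Collectors;

public class PrefixListingSketch {
  // Hypothetical stand-in for an object-store prefix listing: returns every key
  // that starts with the prefix, with no notion of directory boundaries.
  public static List<String> listWithPrefix(List<String> keys, String prefix) {
    return keys.stream().filter(k -> k.startsWith(prefix)).collect(Collectors.toList());
  }

  // Appends a trailing slash (if missing) so the prefix ends on a path boundary.
  public static String asDirectoryPrefix(String location) {
    return location.endsWith("/") ? location : location + "/";
  }

  public static void main(String[] args) {
    // Invented example keys for two sibling tables: sample and sample_part.
    List<String> keys = List.of(
        "s3://bucket/db/sample/data/a.parquet",
        "s3://bucket/db/sample_part/data/b.parquet");
    String location = "s3://bucket/db/sample";

    // The raw prefix matches both tables.
    System.out.println(listWithPrefix(keys, location).size()); // prints 2
    // The slash-terminated prefix matches only the sample table.
    System.out.println(listWithPrefix(keys, asDirectoryPrefix(location)).size()); // prints 1
  }
}
```

If a listing like this fed an orphan-file comparison, the extra `sample_part` files would not appear among the valid files of `sample`, which is presumably why the comment flags it as a risk.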

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-21 Thread via GitHub
MonkeyCanCode commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2427982933 Confirming the issue is still there. After manually setting spark.hadoop.fs.s3.impl to S3A, it works if the client has S3 credentials with the needed access. However, if through creden

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-17 Thread via GitHub
yunlou11 commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2421517117 ```sql CALL nessie.system.remove_orphan_files(table => 'nessie.robot_dev.robot_data') ``` ```text Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileS

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-11 Thread via GitHub
github-actions[bot] commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2408258018 This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If y

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-10-11 Thread via GitHub
github-actions[bot] closed pull request #7914: Use SupportsPrefixOperations for Remove OrphanFile Procedure URL: https://github.com/apache/iceberg/pull/7914 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-29 Thread via GitHub
flyrain commented on code in PR #7914: URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736840318 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -330,11 +345,18 @@ private Dataset listedFileDS() { Bro

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-29 Thread via GitHub
flyrain commented on code in PR #7914: URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736836729 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -299,7 +300,21 @@ private Dataset actualFileIdentDS() {

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-14 Thread via GitHub
steveloughran commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2288374116 HadoopFileIO (and therefore the local fs) supports listPrefix. It'll need a CustomFileIO as with similar tests -- This is an automated message from the Apache Git Service. To

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-08-13 Thread via GitHub
steveloughran commented on code in PR #7914: URL: https://github.com/apache/iceberg/pull/7914#discussion_r1715740496 ## spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java: ## @@ -330,11 +345,18 @@ private Dataset listedFileDS() {

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-08 Thread via GitHub
rahil-c commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2216488472 > The basic issue is we want to make sure we have a test which uses both the supportsPrefix enabled FS and one where it is not enabled so we are sure that both implementations remain corr

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-08 Thread via GitHub
RussellSpitzer commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2214536758 The basic issue is we want to make sure we have a test which uses both the supportsPrefix enabled FS and one where it is not enabled so we are sure that both implementations remain
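
The testing requirement described above (exercise both the prefix-capable path and the fallback path and check that they agree) might look roughly like the following sketch. Everything in it is an assumption made for illustration: `FileStore`, `PrefixCapable`, `PlainStore`, and `PrefixStore` are hypothetical stand-ins rather than Iceberg classes, and the in-memory key lists replace a real file system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ListingPathsSketch {
  /** Hypothetical stand-in for a file IO without prefix listing. */
  public interface FileStore {
    List<String> allKeys();
  }

  /** Hypothetical stand-in for a file IO that can list by prefix. */
  public interface PrefixCapable extends FileStore {
    List<String> listPrefix(String prefix);
  }

  public static class PlainStore implements FileStore {
    private final List<String> keys;
    public PlainStore(List<String> keys) { this.keys = keys; }
    @Override public List<String> allKeys() { return keys; }
  }

  public static class PrefixStore extends PlainStore implements PrefixCapable {
    public PrefixStore(List<String> keys) { super(keys); }
    @Override public List<String> listPrefix(String prefix) { return filter(allKeys(), prefix); }
  }

  // Keeps the keys under the prefix and sorts them so results are comparable.
  static List<String> filter(List<String> keys, String prefix) {
    List<String> out = new ArrayList<>();
    for (String key : keys) {
      if (key.startsWith(prefix)) {
        out.add(key);
      }
    }
    Collections.sort(out);
    return out;
  }

  // Uses the prefix listing when the store offers it, otherwise scans all keys.
  public static List<String> listFiles(FileStore store, String prefix) {
    if (store instanceof PrefixCapable) {
      return ((PrefixCapable) store).listPrefix(prefix);
    }
    return filter(store.allKeys(), prefix);
  }

  public static void main(String[] args) {
    List<String> keys = List.of("t/data/b.parquet", "t/data/a.parquet", "other/c.parquet");
    List<String> viaPrefix = listFiles(new PrefixStore(keys), "t/");
    List<String> viaFallback = listFiles(new PlainStore(keys), "t/");
    // Both code paths should identify the same set of files.
    if (!viaPrefix.equals(viaFallback)) {
      throw new AssertionError("prefix and fallback listings differ");
    }
    System.out.println(viaPrefix);
  }
}
```

The design point is that the capability check happens in one place (`listFiles`), so a test can swap the store implementation and assert identical results without touching the caller.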

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-07 Thread via GitHub
rahil-c commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2212775830 @RussellSpitzer @amogh-jahagirdar Wanted to understand what is the actual test needed for this change? I saw this comment ``` We also need a test which exercises this code path, (D

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-03 Thread via GitHub
rahil-c commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2208107487 Hi all sorry for the delay on this issue, been engaged in many internal things at work so did not get time to revisit this. Originally when I encountered this issue it was a very s

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-07-03 Thread via GitHub
schobe commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2207301578 Hi, I am also facing the same issue while running orphan file cleanup via Nessie REST. Auto-compaction and snapshot expiry work, but the orphan file cleanup procedure gives the same error.

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-04-04 Thread via GitHub
carlosescura commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2036615576 @rahil-c is there any possibility to continue working on this PR? Many of us would really appreciate it. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-19 Thread via GitHub
nastra commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006792641 @carlosescura the issue itself hasn't been solved yet. I'm not sure if @rahil-c is actively working on this issue. If not, maybe someone else from the community is interested in working on t

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-19 Thread via GitHub
carlosescura commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006763129 @lokeshrdy Still doesn't work using Spark `3.5.0` and Iceberg `1.5.0` and Glue as catalog with the following config: ``` SPARK_SETTINGS = [ ( "spark.jars",

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-03-06 Thread via GitHub
lokeshrdy commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1980642069 Same issue here. Let me know if anyone has solved this with the latest version. @carlosescura @domonkosbalogh-seon @rahil-c -- This is an automated message from the Apache Git Service. To r

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-02-12 Thread via GitHub
carlosescura commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1939197029 Same issue here. I can't run the `remove_orphan_files` procedure using Glue and S3 😢 -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2024-01-04 Thread via GitHub
domonkosbalogh-seon commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1877212931 Ran into a similar issue (same as in https://github.com/apache/iceberg/issues/8368) using the Glue Catalog. Is there maybe a workaround to this, or this PR would be the only f

Re: [PR] Use SupportsPrefixOperations for Remove OrphanFile Procedure [iceberg]

2023-11-22 Thread via GitHub
lyohar commented on PR #7914: URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1822629812 Got a similar issue in 1.4.2, Spark 3.5. My Iceberg catalog in Spark is configured via the org.apache.iceberg.aws.s3.S3FileIO filesystem. I store files using the s3 prefix. However, when