ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1929534587
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +296,77 @@ private Dataset validFileIdentDS()
danielcweeks commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920907162
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +294,49 @@ private Dataset validFileIdentDS()
danielcweeks commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920903258
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +294,49 @@ private Dataset validFileIdentDS()
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1920398798
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +294,49 @@ private Dataset validFileIdentDS()
RussellSpitzer commented on PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2598728800
> @ismailsimsek [my
issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049)
with this PR is the same as the previous PR. This isn't a scalable solution.
The
danielcweeks commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1917214900
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +294,49 @@ private Dataset validFileIdentDS()
danielcweeks commented on PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2593735172
@ismailsimsek [my
issue](https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049)
with this PR is the same as the previous PR. This isn't a scalable solution.
The f
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1914531213
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,19 +294,49 @@ private Dataset validFileIdentDS()
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1907274886
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -589,21 +620,42 @@ private FileURI toFileURI(I input)
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1907270693
##
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java:
##
@@ -854,12 +867,14 @@ public void testCompareToFileList()
RussellSpitzer commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1905894464
##
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java:
##
@@ -854,12 +867,14 @@ public void testCompareToFileList
RussellSpitzer commented on PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2575979132
The test here says it's failing because you are deleting
```
but the following elements were unexpected:
["file:/tmp/junit-14563533605645158466/data/_c2_tr
```
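The unexpected path above has a component starting with `_c2_tr`. Orphan-file listing skips "hidden" entries (names starting with `_` or `.`), but a partition directory whose column name itself starts with `_` must not be mistaken for one. A minimal sketch of that rule, assuming partition directories are named `field=value`; the class, method, and the `_region` partition name are hypothetical and this is not Iceberg's actual filter.

```java
import java.util.Set;

/** Hypothetical sketch: decide whether a directory name should be skipped as "hidden". */
final class HiddenPathSketch {

  private HiddenPathSketch() {}

  /**
   * Returns true if a path component looks hidden (starts with '_' or '.'), unless it is a
   * partition directory "field=value" whose field is one of the table's partition names.
   */
  static boolean isHidden(String name, Set<String> partitionNames) {
    boolean startsHidden = name.startsWith("_") || name.startsWith(".");
    if (!startsHidden) {
      return false;
    }
    int eq = name.indexOf('=');
    if (eq > 0 && partitionNames.contains(name.substring(0, eq))) {
      return false; // e.g. "_region=eu" for a partition field named "_region"
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> partitions = Set.of("_region");
    System.out.println(isHidden("_region=eu", partitions)); // false: partition directory
    System.out.println(isHidden("_temporary", partitions)); // true: hidden
    System.out.println(isHidden(".staging", partitions)); // true: hidden
    System.out.println(isHidden("data", partitions)); // false: ordinary directory
  }
}
```

Listing that uses a check like this keeps partition data under `_`-prefixed columns visible, so those files are compared against table metadata instead of being silently skipped.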
ismailsimsek commented on PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#issuecomment-2574907719
cc @flyrain @RussellSpitzer @rahil-c it's ready for review and a test has been added.
I would also appreciate any suggestions on the failing test.
--
This is an automated message from the Apach
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903140118
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -292,14 +293,37 @@ private Dataset validFileIdentDS()
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903264071
##
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java:
##
@@ -610,9 +613,12 @@ public void testHiddenPathsStarting
ismailsimsek commented on code in PR #11906:
URL: https://github.com/apache/iceberg/pull/11906#discussion_r1903139413
##
spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/actions/TestRemoveOrphanFilesAction.java:
##
@@ -610,9 +613,12 @@ public void testHiddenPathsStarting
ismailsimsek opened a new pull request, #11906:
URL: https://github.com/apache/iceberg/pull/11906
Continuing #7914
danielcweeks commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2557715049
@steveloughran This isn't about bulk deletes (which S3FileIO does support).
The issue is how to properly scale the identification of orphaned files, which
is a function of the procedu
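At its core, identifying orphans is a set difference: every file found by listing the table location, minus every file referenced by table metadata, restricted to files older than a safety cutoff. The Spark action does this with distributed Dataset joins, which is where the cost of listing matters; the sketch below is only an in-memory illustration with hypothetical names and paths, assuming paths are already normalized.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory sketch of orphan-file identification. */
final class OrphanSketch {

  private OrphanSketch() {}

  /**
   * @param listed map of listed file path to last-modified time in epoch millis
   * @param referenced paths reachable from table metadata
   * @param olderThanMillis only files modified strictly before this cutoff are candidates
   * @return orphan candidates, sorted
   */
  static Set<String> findOrphans(
      Map<String, Long> listed, Set<String> referenced, long olderThanMillis) {
    Set<String> orphans = new TreeSet<>();
    for (Map.Entry<String, Long> e : listed.entrySet()) {
      boolean oldEnough = e.getValue() < olderThanMillis;
      if (oldEnough && !referenced.contains(e.getKey())) {
        orphans.add(e.getKey());
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    Map<String, Long> listed =
        Map.of(
            "s3://bucket/tbl/data/a.parquet", 100L,
            "s3://bucket/tbl/data/b.parquet", 100L,
            "s3://bucket/tbl/data/c.parquet", 900L);
    Set<String> referenced = Set.of("s3://bucket/tbl/data/a.parquet");
    // b is unreferenced and old enough; c is unreferenced but newer than the cutoff
    System.out.println(findOrphans(listed, referenced, 500L)); // [s3://bucket/tbl/data/b.parquet]
  }
}
```

The difference itself is cheap; producing the `listed` side for a large table location is the expensive part under discussion.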
steveloughran commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2557667397
The S3A fs implements bulk delete too... maybe this and S3FileIO can do the right
thing (*)
(*) we added it to all filesystems, but the page size of the others is zero
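Bulk-delete APIs generally accept a bounded number of paths per call (a "page"), so a caller splits its delete list into pages of at most that size. A minimal, library-agnostic sketch; the `DeletePages` helper is hypothetical and does not call the Hadoop or S3FileIO bulk-delete APIs, and rejecting a page size below one is a choice made here, not documented behavior of either.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split paths into pages for a bulk-delete call. */
final class DeletePages {

  private DeletePages() {}

  static List<List<String>> paginate(List<String> paths, int pageSize) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("pageSize must be >= 1, got " + pageSize);
    }
    List<List<String>> pages = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      // copy so each page is independent of the input list
      pages.add(new ArrayList<>(paths.subList(start, end)));
    }
    return pages;
  }

  public static void main(String[] args) {
    List<String> paths = List.of("a", "b", "c", "d", "e");
    System.out.println(paginate(paths, 2)); // [[a, b], [c, d], [e]]
  }
}
```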
Samreay commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2537695273
Has anyone got a nice workaround for how to remove orphan files for an
S3-located iceberg table?
yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2434551384
@rahil-c please check whether the following is true:
listWithPrefix may list too many unexpected files. For example, with 2 tables,
sample and sample_part,
matchingFiles will contain:
```text
...
```
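The concern is that a plain string-prefix listing does not respect directory boundaries: using the location of table `sample` as the prefix also matches files under `sample_part`. A minimal sketch of the problem and one possible mitigation (forcing a trailing `/` on the location before matching); the bucket paths and both methods are hypothetical stand-ins, not Iceberg's implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of prefix matching over object-store keys. */
final class PrefixListingSketch {

  private PrefixListingSketch() {}

  /** Naive prefix listing: pure string comparison, like an object-store LIST with a prefix. */
  static List<String> listWithPrefix(List<String> keys, String prefix) {
    return keys.stream().filter(k -> k.startsWith(prefix)).collect(Collectors.toList());
  }

  /** Treats the table location as a directory by forcing a trailing '/'. */
  static List<String> listUnderLocation(List<String> keys, String location) {
    String dirPrefix = location.endsWith("/") ? location : location + "/";
    return listWithPrefix(keys, dirPrefix);
  }

  public static void main(String[] args) {
    List<String> keys =
        List.of(
            "s3://bucket/db/sample/data/1.parquet",
            "s3://bucket/db/sample_part/data/2.parquet");
    // Naive: also matches the sample_part table
    System.out.println(listWithPrefix(keys, "s3://bucket/db/sample"));
    // With a trailing slash: only the sample table's files
    System.out.println(listUnderLocation(keys, "s3://bucket/db/sample"));
  }
}
```

Without some boundary like this, files belonging to `sample_part` would be compared against `sample`'s metadata and reported as orphans.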
MonkeyCanCode commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2427982933
Confirmed the issue is still there. After manually setting
spark.hadoop.fs.s3.impl to S3A, it works if the client has S3 credentials with the needed
access. However, if through creden
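Hadoop picks a `FileSystem` implementation per URI scheme, and one way to bind a scheme is the `fs.<scheme>.impl` configuration key. Spark copies `spark.hadoop.*` properties into the Hadoop configuration with that prefix stripped, which is why setting `spark.hadoop.fs.s3.impl` lets `s3://` paths resolve. The sketch below models only that key lookup (real Hadoop also discovers implementations through service loading and caches instances); the class and methods are hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of resolving a FileSystem class name from a URI scheme. */
final class SchemeResolutionSketch {

  private SchemeResolutionSketch() {}

  /** Copies spark.hadoop.* entries into a Hadoop-style map with the prefix removed. */
  static Map<String, String> toHadoopConf(Map<String, String> sparkConf) {
    String prefix = "spark.hadoop.";
    Map<String, String> hadoop = new HashMap<>();
    sparkConf.forEach(
        (k, v) -> {
          if (k.startsWith(prefix)) {
            hadoop.put(k.substring(prefix.length()), v);
          }
        });
    return hadoop;
  }

  /** Looks up fs.<scheme>.impl; fails (analogous to UnsupportedFileSystemException) if absent. */
  static String resolve(Map<String, String> hadoopConf, String path) {
    String scheme = URI.create(path).getScheme();
    String impl = hadoopConf.get("fs." + scheme + ".impl");
    if (impl == null) {
      throw new IllegalStateException("no FileSystem configured for scheme: " + scheme);
    }
    return impl;
  }

  public static void main(String[] args) {
    Map<String, String> sparkConf =
        Map.of("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    // prints org.apache.hadoop.fs.s3a.S3AFileSystem
    System.out.println(resolve(toHadoopConf(sparkConf), "s3://bucket/db/tbl/data/f.parquet"));
  }
}
```

This also shows the limitation raised in the thread: the mapping only helps if the Hadoop filesystem itself has usable credentials, independent of how the catalog's FileIO is configured.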
yunlou11 commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2421517117
```sql
CALL nessie.system.remove_orphan_files(table =>
'nessie.robot_dev.robot_data')
```
```text
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No
FileS
```
github-actions[bot] commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2408258018
This pull request has been closed due to lack of activity. This is not a
judgement on the merit of the PR in any way. It is just a way of keeping the PR
queue manageable. If y
github-actions[bot] closed pull request #7914: Use SupportsPrefixOperations for
Remove OrphanFile Procedure
URL: https://github.com/apache/iceberg/pull/7914
flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736840318
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -330,11 +345,18 @@ private Dataset listedFileDS() {
Bro
flyrain commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1736836729
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -299,7 +300,21 @@ private Dataset actualFileIdentDS() {
steveloughran commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2288374116
HadoopFileIO (and therefore the local fs) supports listPrefix. It'll need a
CustomFileIO as with similar tests
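Because HadoopFileIO already supports prefix listing, exercising the other code path in tests needs an IO that does not advertise that capability (in Iceberg, the capability is the `SupportsPrefixOperations` interface). A minimal sketch of the idea with hypothetical stand-in interfaces `SimpleIO` and `PrefixCapable`, not Iceberg's real types: a delegating wrapper that implements only the base interface, so an `instanceof` check falls back to a directory walk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a basic file IO (not Iceberg's FileIO). */
interface SimpleIO {
  boolean exists(String path);
}

/** Hypothetical stand-in for an IO that can list by prefix. */
interface PrefixCapable extends SimpleIO {
  List<String> listPrefix(String prefix);
}

/** In-memory IO that supports prefix listing. */
final class InMemoryPrefixIO implements PrefixCapable {
  private final List<String> files;

  InMemoryPrefixIO(List<String> files) {
    this.files = new ArrayList<>(files);
  }

  @Override
  public boolean exists(String path) {
    return files.contains(path);
  }

  @Override
  public List<String> listPrefix(String prefix) {
    List<String> out = new ArrayList<>();
    for (String f : files) {
      if (f.startsWith(prefix)) {
        out.add(f);
      }
    }
    return out;
  }
}

/** Test-only wrapper: delegates basic calls but deliberately hides prefix support. */
final class NoPrefixIO implements SimpleIO {
  private final SimpleIO delegate;

  NoPrefixIO(SimpleIO delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean exists(String path) {
    return delegate.exists(path);
  }
}

final class ListingDispatch {
  private ListingDispatch() {}

  /** Picks the code path an action would take based on the IO's capabilities. */
  static String strategyFor(SimpleIO io) {
    return io instanceof PrefixCapable ? "prefix-listing" : "directory-walk";
  }

  public static void main(String[] args) {
    SimpleIO withPrefix = new InMemoryPrefixIO(List.of("mem://t/data/a.parquet"));
    SimpleIO withoutPrefix = new NoPrefixIO(withPrefix);
    System.out.println(strategyFor(withPrefix)); // prefix-listing
    System.out.println(strategyFor(withoutPrefix)); // directory-walk
  }
}
```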
steveloughran commented on code in PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#discussion_r1715740496
##
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##
@@ -330,11 +345,18 @@ private Dataset listedFileDS() {
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2216488472
> The basic issue is we want to make sure we have a test which uses both the
supportsPrefix enabled FS and one where it is not enabled so we are sure that
both implementations remain corr
RussellSpitzer commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2214536758
The basic issue is we want to make sure we have a test which uses both the
supportsPrefix enabled FS and one where it is not enabled so we are sure that
both implementations remain
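One way to express "both implementations remain correct" is to run the same listing through both strategies and assert that they agree. A hypothetical, self-contained sketch (no Iceberg or JUnit types): a flat prefix scan and a hierarchical walk over a toy directory tree, checked for equal results.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical check that two listing strategies return the same files. */
final class ListingParity {

  private ListingParity() {}

  /** Strategy 1: flat prefix scan over every known file key. */
  static Set<String> viaPrefix(Set<String> allFiles, String dir) {
    Set<String> out = new TreeSet<>();
    for (String f : allFiles) {
      if (f.startsWith(dir)) {
        out.add(f);
      }
    }
    return out;
  }

  /**
   * Strategy 2: directory walk. The tree maps each directory (ending in '/') to its direct
   * children; children ending in '/' are subdirectories, everything else is a file.
   */
  static Set<String> viaWalk(Map<String, List<String>> tree, String dir) {
    Set<String> out = new TreeSet<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.push(dir);
    while (!pending.isEmpty()) {
      String current = pending.pop();
      for (String child : tree.getOrDefault(current, List.of())) {
        if (child.endsWith("/")) {
          pending.push(child);
        } else {
          out.add(child);
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, List<String>> tree =
        Map.of(
            "t/", List.of("t/data/", "t/metadata/"),
            "t/data/", List.of("t/data/a.parquet", "t/data/b.parquet"),
            "t/metadata/", List.of("t/metadata/v1.metadata.json"));
    Set<String> allFiles =
        Set.of("t/data/a.parquet", "t/data/b.parquet", "t/metadata/v1.metadata.json");
    boolean same = viaPrefix(allFiles, "t/").equals(viaWalk(tree, "t/"));
    System.out.println(same ? "both strategies agree" : "MISMATCH");
  }
}
```

In a real test suite the same assertion would be parameterized over an IO with prefix support and one without, as discussed above.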
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2212775830
@RussellSpitzer @amogh-jahagirdar I wanted to understand: what is the actual
test needed for this change? I saw this comment
```
We also need a test which exercises this code path, (D
```
rahil-c commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2208107487
Hi all, sorry for the delay on this issue; I've been engaged in many internal
things at work, so I did not get time to revisit this.
Originally when I encountered this issue it was a very s
schobe commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2207301578
Hi, I am also facing the same issue while running orphan file cleanup via
Nessie REST. Auto-compaction and snapshot expiry work, but the orphan file
cleanup procedure gives the same error.
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2036615576
@rahil-c is there any possibility to continue working on this PR? Many of us
would really appreciate it.
nastra commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006792641
@carlosescura the issue itself hasn't been solved yet. I'm not sure if
@rahil-c is actively working on this issue. If not, maybe someone else from the
community is interested in working on t
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-2006763129
@lokeshrdy Still doesn't work using Spark `3.5.0` and Iceberg `1.5.0` and
Glue as catalog with the following config:
```
SPARK_SETTINGS = [
(
"spark.jars",
```
lokeshrdy commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1980642069
Same issue here. Let me know if anyone has solved this with the latest version.
@carlosescura @domonkosbalogh-seon @rahil-c
carlosescura commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1939197029
Same issue here. I can't run the `remove_orphan_files` procedure using Glue
and S3 😢
domonkosbalogh-seon commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1877212931
Ran into a similar issue (same as in
https://github.com/apache/iceberg/issues/8368) using the Glue Catalog. Is there
maybe a workaround for this, or would this PR be the only f
lyohar commented on PR #7914:
URL: https://github.com/apache/iceberg/pull/7914#issuecomment-1822629812
Got a similar issue in 1.4.2, Spark 3.5.
My Iceberg catalog in Spark is configured with the
org.apache.iceberg.aws.s3.S3FileIO FileIO. I store files using the s3 prefix;
however, when