Riaz Arbi created ARROW-17444:
---------------------------------

             Summary: Windows Only: Cannot delete file previously accessed with open_dataset
                 Key: ARROW-17444
                 URL: https://issues.apache.org/jira/browse/ARROW-17444
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 8.0.1, 9.0.0, 8.0.0
         Environment: Windows 10
                      R 4.2.1
                      RStudio 22.07.1
                      Arrow 9.0 (fails on arrow 8 as well)
            Reporter: Riaz Arbi

Hello,

I encountered this issue because it breaks my tests when I run {code:java}rhub::check_for_cran(){code}

Because of this, I know it only affects Windows; all other OS checks pass.

If you write files to a directory using arrow's {code:java}write_*{code} functions and then call {code:java}collect(open_dataset(directory)){code} you can no longer delete a file in that directory; the delete fails with an error.

This is best demonstrated in a reprex:

{code:java}
library(arrow)
library(dplyr)  # provides %>%

# setup ------------------------------------------------------------------------
local_prefix <- tempfile()
df <- data.frame(a = 1:5, b = letters[1:5])

# works ------------------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
#open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)

# doesn't work -----------------------------------------------------------------
fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))
open_dataset(local_prefix) %>% collect()
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
{code}

Here is the error I keep getting:

{code:java}
Error: IOError: Cannot delete file 'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'. Detail: [Windows error 32] The process cannot access the file because it is being used by another process.
{code}

Note that

* I **do not create an object from the `open_dataset` function**. I simply call it.
* I also call `collect` to pull the data into memory, so I cannot see why a connection to the file should still exist after `collect` returns.
* My environment pane looks identical in both cases.
* I do not need to restart R to delete the file. I can simply clear all objects from the workspace (`rm(list = ls())`) and then the delete works fine.
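
The last observation, that clearing the workspace makes the delete succeed, suggests one further check. The sketch below is a variation of the failing case that runs the garbage collector explicitly before deleting. It rests on an unconfirmed assumption: that the file handle is held by an R object awaiting garbage collection. The names `path`, `result`, and `deleted` are introduced here for illustration, and base R's `file.remove` stands in for `fsdir$DeleteFile`. I have not verified that this releases the handle on Windows.

```r
library(arrow)
library(dplyr)  # provides %>%

local_prefix <- tempfile()
dir.create(local_prefix)
df <- data.frame(a = 1:5, b = letters[1:5])

path <- file.path(local_prefix, "1.parquet")
write_parquet(df, path)

# Same read as in the failing case; the collected data is kept for inspection.
result <- open_dataset(local_prefix) %>% collect()

# Assumption (not confirmed): the handle belongs to an object that is
# unreachable but not yet collected, so forcing a collection may release it.
invisible(gc())

# file.remove() returns TRUE on success, FALSE (with a warning) on failure.
deleted <- file.remove(path)
print(deleted)

unlink(local_prefix, recursive = TRUE)
```

If `deleted` is still `FALSE` on Windows after the explicit `gc()`, the handle is presumably held somewhere other than a collectable R object.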