I'm guessing you mean write_table?  Assuming you are passing a
filename / string (and not an open output stream) to write_table I
would expect that any files opened during the call have been closed
before the call returns.

Pedantically, this is not quite the same thing as "finished writing on
disk"; more accurately, it is "finished writing to the OS".  A power
outage shortly after a call to write_table completes could still lead
to partial loss of the file.

However, this should not matter for your case if I am understanding
your problem statement in that reddit post.  As long as you open that
file handle to read after you have finished the call to write_table
you should see all of the contents immediately.

There is always the opportunity for bugs, but many of our unit tests
write files and then immediately read them back, and we don't
typically see trouble there.  I'm assuming your reader and writer are
on the same thread and process?  If you open the reader before the
write has finished, your read task could run concurrently with the
write task, and then no guarantees are made.

On Thu, Jan 6, 2022 at 12:47 PM Brandon Chinn <[email protected]> wrote:
>
> When `pyarrow.parquet.write_file()` returns, is the parquet file finished 
> writing on disk, or is it still writing?
>
> Context: 
> https://www.reddit.com/r/learnpython/comments/rxmq43/help_with_python_file_flakily_not_returning_full/hrj99tq/?context=3
>
> Thanks!
> Brandon Chinn