Hi Bryce,

This clarifies a lot. I was indeed confusing the two formats, and reference 
[5] was really helpful in clearing that up. 

Let me ask one more question about the Julia interfaces before closing the 
thread. Does this mean there is no function that loads a Parquet file 
directly into the Julia implementation of the Arrow in-memory format? It 
looks like the only way is to convert the file to the Arrow IPC format using 
Parquet.jl and Arrow.jl, and then reload it. Am I correct? 

Like:
using Parquet, Arrow

# convert a Parquet file into the Arrow IPC format
tab = Parquet.read_parquet("blah.parquet")
Arrow.write("blah.arrow", tab)

# reload it as in-memory Arrow data
tab2 = Arrow.Table("blah.arrow")
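
As a side note on the format mix-up discussed in this thread: a quick way to 
tell which format a file is actually in, regardless of its extension, is to 
look at its magic bytes. Parquet files begin and end with PAR1, while Arrow 
IPC files begin and end with ARROW1. Below is a minimal Python sketch using 
only the standard library. The name sniff_format is a hypothetical helper for 
illustration, not part of pyarrow, Arrow.jl, or Parquet.jl, and an Arrow IPC 
stream (which carries no magic bytes) would be reported as "unknown".

```python
import os

PARQUET_MAGIC = b"PAR1"   # Parquet files start and end with these 4 bytes
ARROW_MAGIC = b"ARROW1"   # Arrow IPC files start and end with these 6 bytes


def sniff_format(path):
    """Guess the on-disk format of `path` from its first and last bytes.

    Returns "parquet", "arrow-ipc-file", or "unknown".
    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(6)
        # Read the last 6 bytes (or the whole file if it is shorter than that).
        f.seek(max(size - 6, 0))
        tail = f.read(6)
    if head.startswith(PARQUET_MAGIC) and tail.endswith(PARQUET_MAGIC):
        return "parquet"
    if head.startswith(ARROW_MAGIC) and tail.endswith(ARROW_MAGIC):
        return "arrow-ipc-file"
    # Could be an Arrow IPC stream, an encrypted Parquet file, or something else.
    return "unknown"
```

Under these assumptions, calling sniff_format("juliadf.parquet") on a file 
produced by Arrow.write should report "arrow-ipc-file" despite the .parquet 
extension, which would be consistent with the "Parquet magic bytes not found 
in footer" error quoted below.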

- Kazu

> On Mar 21, 2023, at 6:40 PM, Bryce Mecum <[email protected]> wrote:
> 
> Hi Kazu, from the description of what behavior you're seeing and the code 
> you've provided, it looks like you may be mixing up the two file formats 
> (Arrow IPC and Parquet) in your code. Your Julia code looks like it's using 
> the Arrow IPC file format whereas your Python code looks like it's using the 
> Parquet file format.
> 
> If you want to use Parquet to share data:
> 
> - In Julia: Use the Parquet package and its read_parquet and write_parquet 
> functions [1]
> - In Python: Use the pyarrow.parquet module and its read_table and 
> write_table functions [2]
> 
> If you want to use Arrow IPC to share data:
> 
> - In Julia: Use the Arrow package and its Arrow.Table and Arrow.write methods 
> [3] 
> - In Python: Use the pyarrow package and the IPC readers and writers [4] 
> 
> Additionally, there is a FAQ [5] on the Apache Arrow website about formats 
> that you may find relevant.
> 
> [1] https://github.com/JuliaIO/Parquet.jl
> [2] https://arrow.apache.org/docs/python/parquet.html
> [3] https://arrow.juliadata.org/dev/manual/#User-Manual
> [4] https://arrow.apache.org/docs/python/ipc.html
> [5] https://arrow.apache.org/faq/#what-about-arrow-files-then
> 
> On Tue, Mar 21, 2023 at 12:00 PM Kazunori Akiyama <[email protected]> wrote:
> Hello,
> 
> I’m a radio astronomer working for the Event Horizon Telescope 
> <https://eventhorizontelescope.org/> project. We are interested in Apache 
> Arrow for our next-generation data format, as other radio astronomy groups 
> have started to develop a new Arrow-based data format 
> <https://github.com/ratt-ru/casa-arrow>. We are currently developing major 
> software ecosystems in Julia and Python, and would like to test data I/O 
> interfaces with Arrow.jl and pyarrow.
> 
> I’m writing this e-mail because I ran into some issues loading Arrow table 
> data created in a different language. We did a very simple check: creating 
> Arrow tables in Python and Julia, and loading each in the other language 
> (i.e. Julia and Python respectively). While we confirmed that pyarrow and 
> Arrow.jl can each read the parquet files they generate themselves, neither 
> can load the parquet files generated by the other. For instance, we found:
> 
> - pyarrow can’t read a table written by the Arrow.write method of Julia’s 
>   Arrow.jl. It returns `ArrowInvalid: Could not open Parquet input source 
>   'FILENAME': Parquet magic bytes not found in footer. Either the file is 
>   corrupted or this is not a parquet file.`
> - Arrow.jl can’t read a table from pyarrow. It doesn’t show any errors, but 
>   the loaded table is completely empty and doesn’t have any rows or columns.
> 
> I have attached Julia and Python scripts that create parquet files of a very 
> simple single-column table (juliadf.parquet from Julia, pandasdf.parquet from 
> Python). pyarrow.parquet.read_table doesn’t work for juliadf.parquet, and 
> Arrow.Table doesn’t work for pandasdf.parquet. I also attached Python’s pip 
> freeze file and Julia’s toml files in case you want to see my Python and 
> Julia environments.
> 
> As this is a very primitive test, I’m pretty sure I made some simple mistake 
> here. What am I missing? Please let me know how I should handle parquet 
> files across interfaces in different languages.
> 
> Thanks,
> Kazu
> 
> 
