Hi Bryce,

This clarifies a lot; I was indeed confused about the formats. Reference [5] was really helpful in clearing that up.
Let me ask one more question about the Julia interfaces before closing the thread. Does this mean there is no function that loads Parquet files directly into the Julia implementation of the Arrow in-memory format? It looks like the only way is to convert the file to the IPC format using Parquet.jl and Arrow.jl and then reload it. Am I correct? Like:

    using Parquet, Arrow

    # convert a Parquet file into the Arrow IPC format
    tab = read_parquet("blah.parquet")
    Arrow.write("blah.arrow", tab)

    # reload it as in-memory Arrow data
    tab2 = Arrow.Table("blah.arrow")

- Kazu

> On Mar 21, 2023, at 6:40 PM, Bryce Mecum <[email protected]> wrote:
>
> Hi Kazu, from the description of what behavior you're seeing and the code
> you've provided, it looks like you may be mixing up the two file formats
> (Arrow IPC and Parquet) in your code. Your Julia code looks like it's using
> the Arrow IPC file format whereas your Python code looks like it's using the
> Parquet file format.
>
> If you want to use Parquet to share data:
>
> - In Julia: Use the Parquet package and its read_parquet and write_parquet
>   functions [1]
> - In Python: Use the pyarrow.parquet module and its read_table and
>   write_table methods [2]
>
> If you want to use Arrow IPC to share data:
>
> - In Julia: Use the Arrow package and its Arrow.Table and Arrow.write
>   methods [3]
> - In Python: Use the pyarrow package and the IPC readers and writers [4]
>
> Additionally, there is a FAQ [5] on the Apache Arrow website about formats
> that you may find relevant.
>
> [1] https://github.com/JuliaIO/Parquet.jl
> [2] https://arrow.apache.org/docs/python/parquet.html
> [3] https://arrow.juliadata.org/dev/manual/#User-Manual
> [4] https://arrow.apache.org/docs/python/ipc.html
> [5] https://arrow.apache.org/faq/#what-about-arrow-files-then
>
> On Tue, Mar 21, 2023 at 12:00 PM Kazunori Akiyama <[email protected]> wrote:
>
> Hello,
>
> I'm a radio astronomer working for the Event Horizon Telescope
> <https://eventhorizontelescope.org/> project. We are interested in Apache
> Arrow for our next-generation data format, as other radio astronomy groups
> have started to develop a new Arrow-based data format
> <https://github.com/ratt-ru/casa-arrow>. We are currently developing major
> software ecosystems in Julia and Python, and would like to test data IO
> interfaces with Arrow.jl and pyarrow.
>
> I'm writing this e-mail because I ran into some issues loading Arrow table
> data created in a different language. We did a very simple check: creating
> Arrow tables in Python and Julia, and loading them in the other language
> (i.e. Julia and Python respectively). While we confirmed that pyarrow and
> Arrow.jl can each read the parquet files they generate themselves, neither
> can load the parquet files from the other language. For instance, we found:
>
> - pyarrow can't read a table written by the Arrow.write method of Julia's
>   Arrow.jl. It returns `ArrowInvalid: Could not open Parquet input source
>   'FILENAME': Parquet magic bytes not found in footer. Either the file is
>   corrupted or this is not a parquet file.`
> - Arrow.jl can't read a table from pyarrow. It doesn't show any errors, but
>   the loaded table is completely empty and doesn't have any rows or columns.
>
> I have attached Julia and Python scripts that create parquet files of a very
> simple single-column table (juliadf.parquet from Julia, pandasdf.parquet from
> Python). pyarrow.parquet.read_table doesn't work for juliadf.parquet, and
> Arrow.Table doesn't work for pandasdf.parquet. I also attached Python's pip
> freeze file and Julia's toml files in case you want to see my Python and
> Julia environments.
>
> As this is a very primitive test, I'm pretty sure I made some simple mistake
> here. What am I missing? Let me know how I should handle parquet files from
> interfaces in different languages.
>
> Thanks,
> Kazu
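
For anyone hitting the same "Parquet magic bytes not found in footer" error: one way to check which format a file really is, regardless of its extension, is to look at its magic bytes. The following is only a minimal sketch using the Python standard library. It assumes the documented magic strings (PAR1 at both ends of an unencrypted Parquet file, ARROW1 at both ends of an Arrow IPC file); the helper name sniff_format is hypothetical and not part of pyarrow or Arrow.jl.

```python
import os

PARQUET_MAGIC = b"PAR1"   # first and last 4 bytes of an unencrypted Parquet file
ARROW_MAGIC = b"ARROW1"   # first and last 6 bytes of an Arrow IPC *file*


def sniff_format(path):
    """Guess a file's format from its magic bytes (hypothetical helper)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(6)
        # Read the trailing 6 bytes, or the whole file if it is shorter.
        f.seek(max(size - 6, 0))
        tail = f.read(6)
    if head.startswith(PARQUET_MAGIC) and tail.endswith(PARQUET_MAGIC):
        return "parquet"
    if head == ARROW_MAGIC and tail == ARROW_MAGIC:
        return "arrow-ipc-file"
    # The Arrow IPC *streaming* format has no magic string, so it lands here too.
    return "unknown"
```

If Bryce's diagnosis is right, calling sniff_format on juliadf.parquet (written with Arrow.write) would presumably report "arrow-ipc-file" despite the .parquet extension, while pandasdf.parquet would report "parquet".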
