Hello,

I’m a radio astronomer working for the Event Horizon Telescope project. We are interested in Apache Arrow for our next-generation data format, as other radio astronomy groups have started to develop a new Arrow-based data format. We are currently developing major software ecosystems in Julia and Python, and would like to test data IO interfaces with Arrow.jl and pyarrow.

I’m writing this e-mail because I ran into issues loading Arrow table data created in a different language. We did a very simple check: creating Arrow tables in Python and Julia, and loading them in the other language (i.e. Julia and Python, respectively). While we confirmed that pyarrow and Arrow.jl can each read the parquet files they generate themselves, neither can load the parquet files generated by the other. For instance, we found

I have attached Julia and Python scripts that create parquet files of a very simple single-column table (juliadf.parquet from Julia, pandasdf.parquet from Python). pyarrow.parquet.read_table doesn’t work for juliadf.parquet, and the Arrow.Table methods don’t work for pandasdf.parquet. I have also attached Python’s pip freeze file and Julia’s toml files in case you want to see my Python and Julia environments.

As this is a very primitive test, I’m pretty sure I made some simple mistake here. What am I missing? Please let me know how I should handle parquet files across interfaces in different languages.

Thanks,
Kazu
create_parquet.jl
Description: Binary data
import pandas as pd
import numpy as np
import pyarrow.parquet as pq

# Create a simple dataframe
df = pd.DataFrame()
df["col1"] = np.zeros(10)
df
# save to a parquet file
df.to_parquet("pandasdf.parquet")
# load with pyarrow
atab = pq.read_table("pandasdf.parquet")
atab
pip_freeze
Description: Binary data
Manifest.toml
Description: Binary data
Project.toml
Description: Binary data
