You can specify an explicit Arrow schema when converting a pandas.DataFrame to a pyarrow.Table or RecordBatch. So it might be better to write out the schema you want (much as you would declare it in SQL with CREATE TABLE ...) and then ensure that the pandas objects are coerced to that schema.

On Mon, Jun 1, 2020 at 10:45 AM Sandy Ryza <[email protected]> wrote:
>
> Ah - I hadn't thought about how the object dtype complicates things.
>
> What I'm trying to do at a higher level is maybe wacky:
>
> I want a set of parquet files to be read/written by PySpark and Pandas
> interchangeably.
> For each file, I want to specify, in code, the column types expected in
> the file.
> Before writing out a Pandas DataFrame to a file, I want to check whether it
> matches the expected column types for the file. I don't need to provably
> catch every violation, but the more I can catch, the better.
> I'm considering using pyarrow types for expressing the expected column types
> for each file.
>
> Does that make sense? Is there a different way you'd advise accomplishing
> this?
>
> On 2020/05/30 15:07:05, Wes McKinney <[email protected]> wrote:
> > I don't think there is specifically (one could be added in theory). Is
> > the goal to determine whether `pyarrow.array(pandas_object)` will
> > succeed or not, or something else? Since a lot of pandas data is
> > opaquely represented with object dtype it can be tricky unless you
> > want to go to the expense of using `pandas.api.types.infer_dtype` to
> > determine the effective logical type of the values.
> >
> > On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > If I have a pandas dtype and an arrow type, is there a pyarrow API that
> > > allows me to check whether the pandas dtype is convertible to the arrow
> > > type?
> > >
> > > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work
> > > in most cases, because pandas dtypes tend to be at least as wide as
> > > equivalent arrow types, but I'm wondering whether there's something more
> > > principled.
> > >
> > > Any help much appreciated,
> > > Sandy
