Joris Van den Bossche created ARROW-5286:
--------------------------------------------

             Summary: [Python] support Structs in Table.from_pandas given a known schema
                 Key: ARROW-5286
                 URL: https://issues.apache.org/jira/browse/ARROW-5286
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Joris Van den Bossche


ARROW-2073 added support for creating a StructArray from an array of tuples 
(in addition to an array of dicts). 
This works in {{pyarrow.array}} when the struct type is specified explicitly:

{code}
In [2]: df = pd.DataFrame({'tuples': [(1, 2), (3, 4)]})

In [3]: struct_type = pa.struct([('a', pa.int64()), ('b', pa.int64())])

In [4]: pa.array(df['tuples'], type=struct_type)
Out[4]: 
<pyarrow.lib.StructArray object at 0x7f1b02ff6818>
-- is_valid: all not null
-- child 0 type: int64
  [
    1,
    3
  ]
-- child 1 type: int64
  [
    2,
    4
  ]
{code}

However, this does not yet work when converting a DataFrame to a Table with the 
struct type specified in a schema:

{code}
In [5]: pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
     68     try:
---> 69         return logical_type_map[arrow_type.id]
     70     except KeyError:

KeyError: 24

During handling of the above exception, another exception occurred:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-5-c18748f9b954> in <module>
----> 1 pa.Table.from_pandas(df, schema=pa.schema([('tuples', struct_type)]))

~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table.from_pandas()

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in dataframe_to_arrays(df, schema, preserve_index, nthreads, columns, safe)
    483     metadata = construct_metadata(df, column_names, index_columns,
    484                                   index_descriptors, preserve_index,
--> 485                                   types)
    486     return all_names, arrays, metadata
    487 

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in construct_metadata(df, column_names, index_levels, index_descriptors, preserve_index, types)
    207         metadata = get_column_metadata(df[col_name], name=sanitized_name,
    208                                        arrow_type=arrow_type,
--> 209                                        field_name=sanitized_name)
    210         column_metadata.append(metadata)
    211 

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type, field_name)
    149     dict
    150     """
--> 151     logical_type = get_logical_type(arrow_type)
    152 
    153     string_dtype, extra_metadata = get_extension_dtype_info(column)

~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in get_logical_type(arrow_type)
     77         elif isinstance(arrow_type, pa.lib.Decimal128Type):
     78             return 'decimal'
---> 79         raise NotImplementedError(str(arrow_type))
     80 
     81 

NotImplementedError: struct<a: int64, b: int64>

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
