atherkevin opened a new issue, #15133:
URL: https://github.com/apache/arrow/issues/15133

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello,
   
   I'm running into an issue using the pandas to_feather() call with pyarrow. 
Pyarrow seems to ignore the data types in the dataframe and instead tries to 
convert object columns to its own data types, then fails on the conversion if 
a column has mixed data types. I'm running Python 3.11.1, pandas 1.5.2, numpy 
1.24.1, and pyarrow 10.0.1, and have reproduced this in a Docker image of 
python 3.11.1-slim and on a Mac M1. 
   
   ```python
   import pandas as pd

   df = pd.DataFrame({
       'a': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
       'b': ['c', 'c', None, 'd', 'd', 'd', 'e'],
       'c': [1.5, 2.0, 3.5, 5.0, 8.0, 10.0, 'a string'],  # floats mixed with a string
       'd': [1, 2, 3, 1, 2, 3, 1],
       'a1': [1, 1, 1, 1, 1, 1, 1],  # junk columns to test return df
       'a2': [2, 2, 2, 2, 2, 2, 2],  # junk columns to test return df
   })
   df.to_feather('file')  # raises the ArrowInvalid error shown in the traceback
   ```
   
   Pandas handles this fine, and will say the dtype of column 'c' is either 
object or string. When attempting to save as a feather file, the following 
stack trace happens:
   
   > Error
   > Traceback (most recent call last):
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/tests/test_pyarrow_file_failure.py", line 16, in test_error_conversion
   >     df.to_feather('file')
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/util/_decorators.py", line 211, in wrapper
   >     return func(*args, **kwargs)
   >            ^^^^^^^^^^^^^^^^^^^^^
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/core/frame.py", line 2794, in to_feather
   >     to_feather(self, path, **kwargs)
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pandas/io/feather_format.py", line 93, in to_feather
   >     feather.write_feather(df, handles.handle, **kwargs)
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/feather.py", line 164, in write_feather
   >     table = Table.from_pandas(df, preserve_index=preserve_index)
   >             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   >   File "pyarrow/table.pxi", line 3475, in pyarrow.lib.Table.from_pandas
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in dataframe_to_arrays
   >     arrays = [convert_column(c, f)
   >              ^^^^^^^^^^^^^^^^^^^^^
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 611, in <listcomp>
   >     arrays = [convert_column(c, f)
   >               ^^^^^^^^^^^^^^^^^^^^
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 598, in convert_column
   >     raise e
   >   File "/Users/kevin/PycharmProjects/analyzethatv1/venv/lib/python3.11/site-packages/pyarrow/pandas_compat.py", line 592, in convert_column
   >     result = pa.array(col, type=type_, from_pandas=True, safe=safe)
   >              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   >   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
   >   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
   >   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   > pyarrow.lib.ArrowInvalid: ("Could not convert 'a string' with type str: tried to convert to double", 'Conversion failed for column c with type object')
   
   Questions: why does this not honor the pandas dtype? I can appreciate 
attempting the conversion, but if the conversion fails, why not keep the 
pandas dtype when saving? Is there some option I'm missing?
   
   ### Component(s)
   
   Python

