Michael Wheeler created ARROW-6968:
--------------------------------------

Summary: [Python] 0.14.1 to 0.15.0 upgrade produces AttributeError
Key: ARROW-6968
URL: https://issues.apache.org/jira/browse/ARROW-6968
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.15.0
Environment: Python 3.7.4 on macOS Mojave 10.14.6
             Python 3.6.7 on Ubuntu 16.04.6 LTS
Reporter: Michael Wheeler
Fix For: 0.15.0
Attachments: attribute_error_pyarrow_0_15_0.py
The code in question:
{code:python}
"""
Reproduce AttributeError with PyArrow == 0.15.0
"""
import io
import logging
import pandas
import pyarrow
import sys
import textwrap

logging.basicConfig(level=logging.DEBUG)
logging.debug(f'Python v{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}')
logging.debug(f'PyArrow v{pyarrow.__version__}' + '\n')

CSV_TEXT = textwrap.dedent("""\
    id,gender,some_date,age
    001,M,01/01/2019,75
    002,F,02/02/2018,32
    003,M,03/03/2017,27
    004,F,04/04/2016,19
    005,M,05/05/2015,55
    006,F,06/06/2014,42
    """)

# Initialize pyarrow table via pandas
mock_file = io.StringIO(CSV_TEXT)
df = pandas.read_csv(mock_file).sort_values(['age', 'gender'])
table = pyarrow.Table.from_pandas(df=df)

# This comprehension generates a map between the name of the column and its index
map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}

logging.debug('The column indices are:')
for name, index in map_col_names_to_incides.items():
    logging.debug(f'Col {name} -> #{index}')
{code}

Expected result (generated with 0.14.1):
{code:java}
DEBUG:root:Python v3.7.4
DEBUG:root:PyArrow v0.14.1

DEBUG:root:The column indices are:
DEBUG:root:Col id -> #0
DEBUG:root:Col gender -> #1
DEBUG:root:Col some_date -> #2
DEBUG:root:Col age -> #3
DEBUG:root:Col __index_level_0__ -> #4
{code}

Actual result (generated with 0.15.0):
{code:java}
DEBUG:root:Python v3.7.4
DEBUG:root:PyArrow v0.15.0

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1758, in <module>
    main()
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1752, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1147, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <module>
    map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
  File "/Users/mwheeler/Library/Preferences/PyCharm2019.1/scratches/scratch.py", line 31, in <dictcomp>
    map_col_names_to_incides = {item.name: table.columns.index(item) for item in table.columns}
AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'
{code}

This error occurs in both of the environments specified above.
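As the 0.15.0 traceback shows, the items in {{table.columns}} are {{ChunkedArray}} objects, which carry no {{name}} attribute, so the column names have to come from somewhere other than the column objects. A minimal workaround sketch, assuming the names in {{table.schema.names}} are listed in the same positional order as the columns: enumerate the schema names instead of calling {{.name}} on each column. The helper name {{column_index_map}} is hypothetical and not part of PyArrow.

{code:python}
def column_index_map(table):
    """Map each column name of a pyarrow.Table to its position.

    Hypothetical helper: reads the names from ``table.schema.names``
    instead of from the column objects, because in PyArrow 0.15.0 the
    items in ``table.columns`` are ChunkedArray objects with no ``name``.
    """
    # enumerate() yields each position directly, so no list.index() lookup
    # over the columns is needed
    return {name: index for index, name in enumerate(table.schema.names)}
{code}

Called on the {{table}} from the reproduction script, this should yield the same name-to-index mapping as the 0.14.1 output above. For a single lookup, {{table.schema.get_field_index('age')}} returns the position of one field directly.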