[ https://issues.apache.org/jira/browse/ARROW-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-11095:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Access pyarrow.RecordBatch column by name
> --------------------------------------------------
>
>                 Key: ARROW-11095
>                 URL: https://issues.apache.org/jira/browse/ARROW-11095
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Will Jones
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I propose adding support for selecting a column out of a pyarrow.RecordBatch
> using both __getitem__() and .field(), like we have in pyarrow.Table.
> pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have
> filter and take methods and a schema), but I got tripped up on this
> difference. pyarrow.Table supports accessing columns by name using both
> __getitem__() and .field():
> {code:python}
> import pyarrow as pa
>
> my_array = pa.array(range(10))
> table = pa.Table.from_arrays([my_array], names=['my_column'])
> # Both of these work on table:
> table['my_column']
> table.field('my_column')
> {code}
> Meanwhile, pyarrow.RecordBatch doesn't support either of those. In fact, I had
> a hard time finding a way to grab a column by name from a RecordBatch without
> first looking up the integer index.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)