[jira] [Updated] (ARROW-11095) [Python] Access pyarrow.RecordBatch column by name

ASF GitHub Bot (Jira) Fri, 01 Jan 2021 14:47:05 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated ARROW-11095:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Access pyarrow.RecordBatch column by name
> --------------------------------------------------
>
>                 Key: ARROW-11095
>                 URL: https://issues.apache.org/jira/browse/ARROW-11095
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Will Jones
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I propose adding support for selecting a column out of a pyarrow.RecordBatch 
> using both __getitem__() and .field(), like we have in pyarrow.Table.
> pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have 
> filter and take methods and a schema), but I got tripped up on this 
> difference. pyarrow.Table supports accessing columns by name using both 
> __getitem__ and .field():
> {code:python}
> import pyarrow as pa
>
> my_array = pa.array(range(10))
> table = pa.Table.from_arrays([my_array], names=['my_column'])
> # Both of these work on table:
> table['my_column']
> table.field('my_column')
> {code}
> Meanwhile, pyarrow.RecordBatch doesn't support either of those. In fact, I had 
> a hard time finding a way to grab a column by name from a RecordBatch without 
> first looking up the integer index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
