Hey Tim,

We're still wiring up the work needed for nested field refs in general (see ARROW-14658 [1]), and we haven't yet listed out which kinds of references we want to support. I would say we want to support what Substrait supports [2], and the behavior you describe here appears to correspond to the "masked complex expression" references there. That said, the way it ultimately gets implemented/exposed may be different.

For now, you will have to read the column and then postprocess it yourself (this will require you to manually decompose the ListArray/StructArray and reconstruct the ListArray - I can work out an example if that would help).

By the way, thank you for the example here - it reminds me that we should likely also support pushing down the projection, so that we only load the necessary leaf nodes from Parquet.

[1]: https://issues.apache.org/jira/browse/ARROW-14658
[2]: https://substrait.io/expressions/field_references/#masked-complex-expression

Best,
David

On Tue, Nov 9, 2021, at 15:45, Tim Nicolson wrote:
> Hi,
>
> I have a parquet dataset containing "order" structs each of which has a list
> of "item" structs. I would like to read a subset of the item structs. e.g.
>
> order_id: int64
> ...other fields...
> items: list<item: struct<item_id: int64, price: int64, ...other fields...>>
>
> # is this/will this be possible?
> dataset.to_table(columns=["order_id", "items.item_id", "items.price"])
>
> I guess they'd be lists of scalars rather than a list of structs with fewer
> fields?
>
> I couldn't see any reference to *lists* in
> https://github.com/apache/arrow/pull/11466.
>
> Is this possible or planned? Is there another way to achieve this?
>
> Thanks in advance,
>
> Tim
