David,

Thanks for the info - glad that this feature is in the pipeline!

I'd really appreciate some pointers on how to efficiently decompose the
ListArray/StructArray - happy to flesh it out and come back with an example
for posterity...

Thanks again,

Tim

On Wed, Nov 10, 2021 at 5:20 PM David Li <[email protected]> wrote:

> Hey Tim,
>
> We're still wiring up all the work needed for nested field refs in general
> (see ARROW-14658 [1]). And we haven't listed out what kinds of references
> we want to support. I would say we want to support things that Substrait
> supports [2], and the behavior you describe here appears to correspond to
> "masked complex expression" references there. That said, the way it
> ultimately gets implemented/exposed may be different.
>
> For now, you will have to read the column and then postprocess it yourself
> (this will require you to manually decompose the ListArray/StructArray and
> reconstruct the ListArray - I can work out an example if that would help).
>
> By the way, thank you for the example here - it reminds me that we also
> likely should support pushing down the projection so that we only load the
> necessary leaf nodes in Parquet as well.
>
> [1]: https://issues.apache.org/jira/browse/ARROW-14658
> [2]:
> https://substrait.io/expressions/field_references/#masked-complex-expression
>
> Best,
> David
>
> On Tue, Nov 9, 2021, at 15:45, Tim Nicolson wrote:
>
> Hi,
>
> I have a Parquet dataset containing "order" structs, each of which has a
> list of "item" structs.  I would like to read only a subset of the fields
> of the item structs, e.g.
>
> order_id: int64
>
> ...other fields...
>
> items: list<item: struct<item_id: int64, price: int64, ...other fields...>>
>
>
> # is this/will this be possible?
>
> dataset.to_table(columns=["order_id", "items.item_id", "items.price"])
>
>
> I guess they'd be lists of scalars rather than a list of structs with
> fewer fields?
>
> I couldn't see any reference to *lists* in
> https://github.com/apache/arrow/pull/11466.
>
> Is this possible or planned?  Is there another way to achieve this?
>
> Thanks in advance,
>
> Tim
>
>
>
