Re: Substrait consumer for custom data sources

2022-09-27 Thread Li Jin
Thanks both. I think NamedTableProvider is close to what I want, and like Weston said, the tricky bit is how to use a custom NamedTableProvider when calling the pyarrow substrait API. It's a little hacky but I *think* I can override the value "kDefaultNamedTableProvider" here and pass "table_provi

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-27 Thread David Li
Hi all, Just a reminder that this vote is still outstanding - if anyone (PMC or not) is interested in taking a look. Thanks! -David On Thu, Sep 22, 2022, at 05:40, Antoine Pitrou wrote: > Hello, > > I would urge people to review the proposed ADBC APIs, especially the Go > and Java APIs which p

Re: [Discuss] Deprecating Plasma

2022-09-27 Thread Antoine Pitrou
Ok, I've filed https://issues.apache.org/jira/browse/ARROW-17860 for this. Regards Antoine. On 22/09/2022 at 17:38, Antoine Pitrou wrote: Hello, The Plasma object store (*) hasn't received significant maintenance since at least 2020. The original authors have stopped contributing to the

Re: Substrait consumer for custom data sources

2022-09-27 Thread Benjamin Kietzman
It seems to me that your use case could be handled by defining a custom NamedTableProvider and assigning this to ConversionOptions::named_table_provider. This was added in https://github.com/apache/arrow/pull/13613 to provide user configurable dispatching for named tables; if it doesn't address you

Re: Substrait consumer for custom data sources

2022-09-27 Thread Weston Pace
In pyarrow it is "string(s) -> arrow Table". However, in the actual C++ (e.g. relation_internal.cc) it is already "string(s) -> compute::Declaration" which should be sufficiently general for your needs. A "compute::Declaration" is a combination of node factory name and node options so you should
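To illustrate the difference between "string(s) -> arrow Table" and "string(s) -> compute::Declaration", here is a minimal Python sketch. It does not use the real pyarrow or Arrow C++ API: `Declaration`, `custom_provider`, the `mysource.` prefix and the `my_custom_source` factory name are all hypothetical stand-ins, assuming only what the message says, namely that a Declaration pairs a node factory name with node options.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for arrow::compute::Declaration: the name of a
# node factory plus the options that factory needs. Not the real API.
@dataclass
class Declaration:
    factory_name: str
    options: Dict[str, Any] = field(default_factory=dict)


# A named-table provider maps the Substrait table name (a list of strings)
# to a Declaration instead of to an already materialized table.
NamedTableProvider = Callable[[List[str]], Declaration]


def custom_provider(names: List[str]) -> Declaration:
    # Hypothetical convention: "mysource.<table>" is served by a custom
    # source node; any other name is rejected.
    if len(names) == 1 and names[0].startswith("mysource."):
        table = names[0].split(".", 1)[1]
        return Declaration("my_custom_source", {"table": table})
    raise KeyError(f"unknown named table: {names}")


decl = custom_provider(["mysource.trades"])
print(decl.factory_name, decl.options)
```

Because the provider returns a description of a source node rather than data, the same hook can dispatch to any registered node factory, which is why the Declaration-based signature is the more general one.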

Re: Substrait consumer for custom data sources

2022-09-27 Thread Li Jin
I did some more digging into this and have some ideas. Currently, the logic for deserializing a named table is: https://github.com/apache/arrow/blob/master/cpp/src/arrow/engine/substrait/relation_internal.cc#L129 and it will look up named tables from a user provided dictionary from string -> arro
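A dictionary from name to table can be seen as a special case of a name-to-Declaration provider. The sketch below, in plain Python and not using the real pyarrow API, wraps such a dictionary as a provider. `Declaration` and `provider_from_tables` are hypothetical names; joining multi-part names with "." is an assumption; the "table_source" factory name is modelled on the Arrow C++ source node for in-memory tables, with its options represented here as a plain dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical stand-in for compute::Declaration (factory name + options).
@dataclass
class Declaration:
    factory_name: str
    options: Dict[str, Any] = field(default_factory=dict)


def provider_from_tables(tables: Dict[str, Any]) -> Callable[[List[str]], Declaration]:
    """Wrap a name -> table dictionary as a name -> Declaration provider."""

    def provide(names: List[str]) -> Declaration:
        key = ".".join(names)  # assumption: multi-part names joined with "."
        if key not in tables:
            raise KeyError(f"no table registered under {key!r}")
        # Modelled on Arrow C++'s "table_source" node for in-memory tables.
        return Declaration("table_source", {"table": tables[key]})

    return provide


provide = provider_from_tables({"orders": "placeholder for a pyarrow.Table"})
print(provide(["orders"]).factory_name)  # table_source
```

Under this framing, keeping the dictionary lookup for existing users and letting advanced users supply their own provider are not in conflict: the dictionary simply becomes one provider among many.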