Hi,
You can give pycylon a try [1]. It exposes a similar join API through its
pycylon.dataframe interface [2].

Best

[1] https://github.com/cylondata/cylon
[2] https://github.com/cylondata/cylon/blob/main/python/pycylon/examples/dataframe/join.py


On Thu, Sep 15, 2022 at 10:04 AM 1057445597 <[email protected]> wrote:

> Is there an equivalent interface in C++?
>
> ------------------------------
> 1057445597
> [email protected]
>
> ------------------ Original Message ------------------
> *From:* "user" <[email protected]>;
> *Sent:* Thursday, September 15, 2022, 9:47 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: [c++][compute]Is there any other way to use Join besides Acero?
>
> Hi!
>
> Why don't you use the Arrow Table join directly?
>
>
> https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.join
>
> You do need to be careful with the join order, though, as speed may differ
> depending on the order of the joined tables.
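
One common reason table order matters, assuming the engine hashes one side
and streams the other, can be sketched in plain Python. This is a generic
hash join for illustration only, not Arrow's actual implementation; the
names hash_join, small, and large are hypothetical.

```python
def hash_join(build_rows, probe_rows, key):
    """Inner-join two lists of dicts on `key`, hashing `build_rows`."""
    # Build phase: index one side by key. Memory grows with this side.
    index = {}
    for row in build_rows:
        index.setdefault(row[key], []).append(row)
    # Probe phase: stream the other side and look up matches.
    out = []
    for row in probe_rows:
        for match in index.get(row[key], []):
            out.append({**match, **row})
    return out


# Hypothetical data: a small table and a larger one sharing a "uuid" key.
small = [{"uuid": i, "a": i * 2} for i in range(3)]
large = [{"uuid": i % 5, "b": i} for i in range(20)]

# Both orders produce the same joined rows, but hashing `small`
# keeps the in-memory index at 3 entries instead of 20.
rows_small_build = hash_join(small, large, "uuid")
rows_large_build = hash_join(large, small, "uuid")
```

The result is the same either way; only the size of the index built in
memory changes, which is where the speed difference tends to come from.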
>
> BR,
>
> Jacek
>
>
> On Thu, Sep 15, 2022 at 06:15 Weston Pace <[email protected]> wrote:
>
>> Within Arrow C++, that is the only way I am aware of.  You might be able
>> to use DuckDB instead; it should be able to scan Parquet files.
>>
>> Is this the same program that you shared before?  Were you able to figure
>> out threading?  Can you create a JIRA with some sample input files and a
>> reproducible example?
>>
>> On Wed, Sep 14, 2022 at 5:14 PM 1057445597 <[email protected]> wrote:
>>
>>> Acero performs poorly, and core dumps occur frequently!
>>>
>>> In my scenario, I read one Parquet file and then several other Parquet
>>> files. All of these files share a column with the same name (UUID). I
>>> need to join (by UUID), project (remove UUID), and filter (some custom
>>> filtering) the results of the two reads. As far as I can tell, Acero is
>>> the only way to do the join, but when I tested it, its performance was
>>> very poor and unstable, with frequent core dumps. Is there another way
>>> to do this, or at least another way to do the join?
>>>
>>>
>>> ------------------------------
>>> 1057445597
>>> [email protected]
>>>
>>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
