Hi,

You can give pycylon a try [1]. Its pycylon.dataframe interface exposes a similar join API [2].
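
To make the pipeline from the original question concrete, here is a minimal plain-Python sketch of what "join by UUID, project (remove UUID), filter" computes. It uses only the standard library and is not the pycylon, Arrow, or DuckDB API. The function name `join_project_filter`, the sample rows, and the predicate are all made up for illustration.

```python
from collections import defaultdict


def join_project_filter(left_rows, right_rows, key="uuid", keep=lambda row: True):
    """Inner-join two lists of dicts on `key`, drop `key`, keep rows passing `keep`."""
    # Build phase: index the right-hand rows by their join key.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    result = []
    # Probe phase: look up each left-hand row's key in the index.
    for left in left_rows:
        for right in index.get(left[key], []):
            merged = {**left, **right}
            merged.pop(key)      # project: remove the join column
            if keep(merged):     # filter: caller-supplied predicate
                result.append(merged)
    return result


# Hypothetical sample data standing in for the contents of two Parquet files.
features = [{"uuid": "a", "x": 1}, {"uuid": "b", "x": 2}, {"uuid": "c", "x": 3}]
labels = [{"uuid": "a", "y": 10}, {"uuid": "b", "y": 20}, {"uuid": "d", "y": 40}]

rows = join_project_filter(features, labels, keep=lambda r: r["y"] > 10)
print(rows)  # [{'x': 2, 'y': 20}]
```

A real engine does the same build/probe work on columnar batches rather than Python dicts, so this is only meant to pin down the expected result, not the performance.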

Best

[1] https://github.com/cylondata/cylon
[2] https://github.com/cylondata/cylon/blob/main/python/pycylon/examples/dataframe/join.py

On Thu, Sep 15, 2022 at 10:04 AM 1057445597 <[email protected]> wrote:

> Is there an equivalent interface in C++?
>
> ------------------------------
> 1057445597
> [email protected]
>
>
> ------------------ Original Message ------------------
> *From:* "user" <[email protected]>;
> *Sent:* Thursday, September 15, 2022, 9:47 PM
> *To:* "user" <[email protected]>;
> *Subject:* Re: [c++][compute]Is there any other way to use Join besides Acero?
>
> Hi!
>
> Why don't you use the Arrow Table join directly?
>
> https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.join
>
> Though you need to be careful with join order, as speed may differ
> depending on the order of the joined tables.
>
> BR,
>
> Jacek
>
>
> Thu, 15 Sep 2022 at 06:15 Weston Pace <[email protected]> wrote:
>
>> Within Arrow-C++ that is the only way I am aware of. You might be able
>> to use DuckDB. It should be able to scan Parquet files.
>>
>> Is this the same program that you shared before? Were you able to figure
>> out threading? Can you create a JIRA with some sample input files and a
>> reproducible example?
>>
>> On Wed, Sep 14, 2022 at 5:14 PM 1057445597 <[email protected]> wrote:
>>
>>> Acero performs poorly, and core dumps occur frequently!
>>>
>>> In the scenario I'm working on, I'll read one Parquet file and then
>>> several other Parquet files. These files will have the same column name
>>> (UUID). I need to join (by UUID), project (remove UUID), and filter (some
>>> custom filtering) the results of the two reads. I found that Acero could
>>> only be used to do the join, but when I tested it, Acero's performance was
>>> very poor and very unstable, and core dumps often happened. Is there
>>> another way? Or just another way to do a join?
>>>
>>>
>>> ------------------------------
>>> 1057445597
>>> [email protected]
>>>
>>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
