Re: [c++][compute]Is there any other way to use Join besides Acero？

Weston Pace Tue, 20 Sep 2022 18:01:46 -0700

Thanks for the detailed reproducer.  I've added a few notes on the JIRA
that I hope will help.


On Tue, Sep 20, 2022, 5:10 AM 1057445597 <[email protected]> wrote:

> I re-uploaded the code as join_test.zip. This copy compiles and runs, and
> it includes cmakelists.txt, the test data files, and the Python code that
> generated the test files. There is also Python code to view the data
> files. You will need to compile Arrow 9.0 yourself.
>
> ------------------------------
> 1057445597
> [email protected]
>
>
> ------------------ Original Message ------------------
> *From:* "user" <[email protected]>;
> *Sent:* Thursday, September 15, 2022, 10:27 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: [c++][compute]Is there any other way to use Join besides Acero？
>
> This is the JIRA:
>
> https://issues.apache.org/jira/browse/ARROW-17740
> ------------------------------
> 1057445597
> [email protected]
>
>
> ------------------ Original Message ------------------
> *From:* "user" <[email protected]>;
> *Sent:* Thursday, September 15, 2022, 12:15 PM
> *To:* "user"<[email protected]>;
> *Subject:* Re: [c++][compute]Is there any other way to use Join besides Acero？
>
> Within Arrow-C++ that is the only way I am aware of.  You might be able to
> use DuckDB.  It should be able to scan Parquet files.
>
> Is this the same program that you shared before?  Were you able to figure
> out threading?  Can you create a JIRA with some sample input files and a
> reproducible example?
>
> On Wed, Sep 14, 2022 at 5:14 PM 1057445597 <[email protected]> wrote:
>
>> Acero performs poorly, and core dumps occur frequently!
>>
>> In the scenario I'm working on, I read one Parquet file and then
>> several other Parquet files. All of these files share a column named
>> UUID. I need to join the results of the two reads (by UUID), project
>> (remove UUID), and filter (some custom filtering). As far as I can
>> tell, Acero is the only way to do the join, but when I tested it the
>> performance was very poor and it was very unstable, with frequent core
>> dumps. Is there another way to do all of this, or at least another way
>> to do the join?
>>
>>
>> ------------------------------
>> 1057445597
>> [email protected]
>>
>>
>
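
For reference, the pipeline described in the original question (join by UUID, filter, drop the UUID column) can be sketched without Acero as a plain in-memory hash join. This is only an illustration in standard C++, not Arrow API. The row types LeftRow, RightRow and OutRow, the columns feature and score, and the function JoinProjectFilter are all hypothetical, and the sketch assumes both inputs have already been read into memory.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row types; the real column layout is not given in the thread.
struct LeftRow  { std::string uuid; int feature; };
struct RightRow { std::string uuid; double score; };
struct OutRow   { int feature; double score; };  // uuid is projected away

// Inner hash join on uuid, followed by a filter on the joined row.
// The output type omits uuid, which is the projection step.
std::vector<OutRow> JoinProjectFilter(
    const std::vector<LeftRow>& left,
    const std::vector<RightRow>& right,
    const std::function<bool(const OutRow&)>& keep) {
  // Build phase: index the left rows by uuid (duplicate keys are allowed).
  std::unordered_multimap<std::string, const LeftRow*> index;
  index.reserve(left.size());
  for (const LeftRow& row : left) {
    index.emplace(row.uuid, &row);
  }

  // Probe phase: look up each right row and emit the matches that pass
  // the caller-supplied filter.
  std::vector<OutRow> out;
  for (const RightRow& r : right) {
    auto range = index.equal_range(r.uuid);
    for (auto it = range.first; it != range.second; ++it) {
      OutRow joined{it->second->feature, r.score};
      if (keep(joined)) {
        out.push_back(joined);
      }
    }
  }
  return out;
}
```

Building the index on the smaller input keeps memory use down. A sketch like this holds everything in memory and runs on one thread, so it is a baseline for comparison rather than a replacement for a query engine.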
