Hi Shawn,

In March of 2021, when major work on the C++ query execution machinery
in Arrow was beginning, Wes sent a message [1] to the dev list and
linked to a doc [2] with some details about the planned design. A few
months later Neal sent an update [3] about this work. However, those
documents are now somewhat out of date. More recently, Wes shared
another update [4] and linked to a doc [5] regarding task execution /
control flow / scheduling. Even so, I think the best source of
information is the doc you linked to. The query execution work has
proceeded organically with many contributors, and efforts to document
the overall design in sufficient detail have not kept pace.

Regarding benchmarks: There has been extensive work done using
Conbench [6] as part of the Arrow CI infrastructure to benchmark
commits, for purposes of avoiding / identifying performance
regressions and measuring efforts to improve performance. However, I am
not aware of any efforts to produce and publicly share benchmarks for
the purpose of comparing performance vs. other query engines.

There is a proposal [7] to give the name "Acero" to the Arrow C++
compute engine, so in the future you will likely see it referred to by
that name. I think that having a clearer name for it will motivate
more people to write about it and share what they know.

Ian

[1] https://lists.apache.org/thread/n632pmjnb85o49lyxy45f7sgh4cshoc0
[2] https://docs.google.com/document/d/1AyTdLU-RxA-Gsb9EsYnrQrmqPMOYMfPlWwxRi1Is1tQ/
[3] https://lists.apache.org/thread/3pmb592zmonz86nmmbjcw08j5tcrfzm1
[4] https://lists.apache.org/thread/ltllzpt1r2ch06mv1ngfgdl7wv2tm8xc
[5] https://docs.google.com/document/d/1216CUQZ7u4acZvC2jX7juqqQCXtdXMellk3lRrgP_WY/
[6] https://conbench.ursa.dev
[7] https://lists.apache.org/thread/7v7vkc005v9343n49b3shvrdn19wdpj1

On Mon, May 23, 2022 at 10:58 AM Shawn Yang wrote:
>
> Hi, I'm considering using Arrow compute as an execution kernel for our
> distributed dataframe framework. I have already read the great doc
> https://arrow.apache.org/docs/cpp/compute.html, but it is a usage doc. Is
> there any design doc, introduction to the internals, or benchmarks for Arrow
> compute so I can quickly understand how Arrow compute works, what can be done
> and what should be done by it? Or should I just read the code in
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute?
>
>
