Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Manoj Kumar Tue, 20 Feb 2024 03:35:25 -0800

Dear @Chao Sun,

I trust you're doing well. Having worked extensively with Spark on Nvidia
Rapids, Velox, and Gluten, I'm now weighing Comet's potential advantages
over Velox in terms of performance and unique features.


While Rapids leverages GPUs effectively, Gazelle, which relied on Intel
AVX512 intrinsics, is now EOL. Now all eyes are on Velox as a universal C++
acceleration library, used by Presto, Spark, PyTorch, XStream (stream
processing), F3 (feature engineering), FBETL (data ingestion), XSQL
(distributed transaction processing), Scribe (message bus infrastructure),
Saber (high-QPS external serving), and others.

In this context, I'm keen to understand Comet's distinctive features and
how its performance compares to Velox. What makes Comet stand out, and how
does its efficiency stack up against Velox across different tasks and
frameworks?

Your insights into Comet's capabilities would be invaluable; they will help
me evaluate whether I should invest my time in this plugin.

Thank you for your time and expertise.

Warm regards,
Manoj Kumar


On Tue, 20 Feb 2024 at 01:51, Mich Talebzadeh <[email protected]>
wrote:

> OK, thanks for your clarifications.
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On Mon, 19 Feb 2024 at 17:24, Chao Sun <[email protected]> wrote:
>
>> Hi Mich,
>>
>> > Also have you got some benchmark results from your tests that you can
>> possibly share?
>>
>> We only have some partial benchmark results internally so far. Once
>> shuffle and better memory management have been introduced, we plan to
>> publish the benchmark results (at least TPC-H) in the repo.
>>
>> > Compared to standard Spark, what kind of performance gains can be
>> expected with Comet?
>>
>> Currently, users can benefit from Comet in a few areas:
>> - Parquet reads: a few improvements have been made for reading from S3
>> in particular, so users can expect better scan performance in this scenario
>> - Hash aggregation
>> - Columnar shuffle
>> - Decimals (Java's BigDecimal is pretty slow)
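To illustrate the decimal point above, here is a minimal sketch (hypothetical, not Comet's code) of why native columnar engines tend to avoid `java.math.BigDecimal`. Arrow can represent a decimal(p, s) value as a fixed-width unscaled integer (e.g. Decimal128), so summing a column reduces to primitive integer additions, whereas each `BigDecimal.add` returns a new immutable object. The sketch assumes every value shares scale 2 and fits in a 64-bit `long`; the class and method names are illustrative only.

```java
import java.math.BigDecimal;

// Hypothetical illustration of unscaled-integer decimals vs BigDecimal; not Comet's implementation.
class DecimalSumSketch {
    static final int SCALE = 2; // assumed common scale: decimal(p, 2)

    // Sum values stored as unscaled longs (value * 10^SCALE), the way a
    // columnar engine can keep decimals in a primitive array.
    static long sumUnscaled(long[] unscaled) {
        long total = 0L;
        for (long v : unscaled) {
            total = Math.addExact(total, v); // throws ArithmeticException on overflow
        }
        return total;
    }

    // Same sum with java.math.BigDecimal; each add() returns a new immutable object.
    static BigDecimal sumBigDecimal(BigDecimal[] values) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal v : values) {
            total = total.add(v);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] unscaled = {1999L, 250L, 100001L}; // 19.99, 2.50, 1000.01
        BigDecimal[] boxed = new BigDecimal[unscaled.length];
        for (int i = 0; i < unscaled.length; i++) {
            boxed[i] = BigDecimal.valueOf(unscaled[i], SCALE);
        }
        BigDecimal fast = BigDecimal.valueOf(sumUnscaled(unscaled), SCALE);
        BigDecimal slow = sumBigDecimal(boxed);
        System.out.println(fast + " " + slow); // prints "1022.50 1022.50"
    }
}
```

Both paths give the same result; the unscaled version only touches primitive `long`s. `Math.addExact` stands in for the overflow checks a real engine must perform, and precisions beyond 18 digits would need 128-bit arithmetic rather than a single `long`.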
>>
>> > Can one use Comet on k8s in conjunction with something like a Volcano
>> addon?
>>
>> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>>
>> Chao
>>
>> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <
>> [email protected]> wrote:
>>
>>> Hi Chao,
>>>
>>> This looks like a cool feature. A couple of questions:
>>>
>>>    - Compared to standard Spark, what kind of performance gains can be
>>>    expected with Comet?
>>>    - Can one use Comet on k8s in conjunction with something like a
>>>    Volcano add-on?
>>>
>>>
>>> HTH
>>>
>>>
>>> On Tue, 13 Feb 2024 at 20:42, Chao Sun <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are very happy to announce that Project Comet, a plugin to
>>>> accelerate Spark query execution by leveraging DataFusion and Arrow,
>>>> has now been open-sourced under the Apache Arrow umbrella. Please
>>>> check the project repo
>>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>>> you are interested. We'd love to collaborate with people from the open
>>>> source community who share similar goals.
>>>>
>>>> Thanks,
>>>> Chao
>>>>
