Thanks, Gourav, for the info. Actually, I am looking for concrete experiences and detailed best practices from people who have built their own GPU-powered environment instead of relying on the big cloud providers, who are dominating and trying to monopolize the data science market.

-- ND

On 7/30/21 4:37 AM, Gourav Sengupta wrote:
Hi,

There are no cons to using Spark with GPUs; you just have to be careful about GPU memory and a few other details.
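As a rough illustration of the settings that usually matter for GPU memory and task sharing, here is a minimal sketch that assembles spark-submit flags for the RAPIDS plugin. The configuration keys are the ones I understand Spark 3.x and spark-rapids to use; the helper names (rapids_conf, to_submit_args) and every default value are assumptions for illustration, not tuned recommendations.

```python
from typing import Dict, List


def rapids_conf(
    gpus_per_executor: int = 1,
    executor_cores: int = 4,
    concurrent_gpu_tasks: int = 2,
    pinned_pool: str = "2g",
) -> Dict[str, str]:
    """Build a Spark conf dict for a GPU executor (all values are illustrative)."""
    return {
        # Loads the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.executor.cores": str(executor_cores),
        # GPUs reserved per executor by Spark's resource scheduling.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Fraction of a GPU per task, so that all cores can share the GPU.
        "spark.task.resource.gpu.amount": str(gpus_per_executor / executor_cores),
        # How many tasks may use the GPU at once; lower it if GPU memory runs out.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
        # Pinned host memory used for transfers between host and GPU.
        "spark.rapids.memory.pinnedPool.size": pinned_pool,
    }


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a conf dict into a flat list of `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_conf())))
```

The printed flags can be pasted after spark-submit. Reducing concurrent_gpu_tasks is the first thing I would try when tasks fail with GPU out-of-memory errors, though the right value depends on the card and the queries.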

I have sometimes seen a 10x improvement over general Spark 3.x performance, and sometimes around 30x.

Not all queries will be performant on GPUs, and it is up to you to test the scenarios specific to your workload. I use EMR for this, and it is really impressive what the NVIDIA folks have done.
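One way to test a specific scenario is to time the same query with and without GPU acceleration and compare medians. The sketch below is a generic timing harness only; the function names (median_seconds, speedup) are hypothetical, and the two callables stand in for whatever runs your query on each configuration.

```python
import statistics
import time
from typing import Callable


def median_seconds(
    run: Callable[[], object],
    repeats: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run `run` several times and return the median wall-clock duration."""
    durations = []
    for _ in range(repeats):
        start = clock()
        run()
        durations.append(clock() - start)
    return statistics.median(durations)


def speedup(
    cpu_run: Callable[[], object],
    gpu_run: Callable[[], object],
    repeats: int = 3,
) -> float:
    """Ratio of median CPU time to median GPU time (values above 1 favour the GPU)."""
    return median_seconds(cpu_run, repeats) / median_seconds(gpu_run, repeats)
```

In practice, cpu_run and gpu_run would execute the same query and force it to completion (for example by writing the result out), with spark.rapids.sql.enabled set to false and true respectively. Setting spark.rapids.sql.explain to NOT_ON_GPU should also log which operators fell back to the CPU, which helps explain queries that show little or no gain.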

I think there was an initial promise with the Spark 3.x release that Spark DataFrames could be transferred directly, through native integration, to TensorFlow and other frameworks. That would be a brilliant way forward for Spark, but I think the Spark project leaders have yet to prioritise it.

Also, Ray, another project from Berkeley, is trying to make it possible to transfer Spark DataFrames to TensorFlow. Clearly, if Spark users use Ray to transfer Spark DataFrames to TensorFlow and other frameworks, Ray will see massive adoption.

Personally, I think the Spark community could have just built the integration with other frameworks natively, given the fantastic contributions by NVIDIA to Spark and such a large, active development community. But surely Ray has to win as well, and there is nothing better than riding on the success of Spark. I may be wrong, though, and the Spark community may still be developing those integrations.


Regards,
Gourav Sengupta


On Fri, Jul 30, 2021 at 2:46 AM Artemis User <arte...@dtechspace.com> wrote:

    Has anyone had any experience with running Spark-Rapids on a
    GPU-powered cluster (https://github.com/NVIDIA/spark-rapids)? I am
    very interested in knowing:

     1. What is the hardware/software platform and the type of Spark
        cluster you are using to run Spark-Rapids?
     2. How easy was the installation process?
     3. Are you running Scala or PySpark or both with Spark-Rapids?
     4. How does the performance you've seen compare with running a
        CPU-only cluster?
     5. Any pros/cons of using Spark-Rapids?

    Thanks a lot in advance!

    -- ND

