Thanks Gourav for the info. Actually I am looking for concrete
experiences and detailed best practices from people who have built their
own GPU-powered environment instead of relying on big cloud providers
who are dominating and trying to monopolize the data science market....
-- ND
On 7/30/21 4:37 AM, Gourav Sengupta wrote:
Hi,
There are no cons to using SPARK with GPUs; you just have to be
careful about the GPU memory and a few other details.
I have sometimes seen a 10x improvement over general SPARK 3.x
performance, and sometimes around 30x.
Not all queries will run faster on GPUs, and it is up to you to test
the scenarios specific to your workload. I use EMR for this, and it
is really impressive what the NVIDIA folks have done.
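To make the "GPU memory and a few other details" point concrete, here is a minimal sketch of the kind of settings involved when enabling the RAPIDS Accelerator. The helper name rapids_conf and all default values are assumptions chosen for illustration, not recommendations from this thread; the property names are the ones I understand the plugin and Spark 3.x resource scheduling to use, so check them against the documentation for your version.

```python
from typing import Dict


def rapids_conf(gpus_per_executor: int = 1,
                tasks_per_gpu: int = 4,
                concurrent_gpu_tasks: int = 2,
                pinned_pool: str = "2g") -> Dict[str, str]:
    """Build Spark settings that turn on the RAPIDS Accelerator plugin.

    All defaults are illustrative assumptions; tune them per workload.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the RAPIDS SQL plugin and enable GPU execution of SQL/DataFrame ops.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Spark 3.x resource scheduling: GPUs per executor and the GPU share per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(gpus_per_executor / tasks_per_gpu),
        # Limits how many tasks use the GPU at once, which bounds GPU memory pressure.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
        # Pinned host memory used to speed up host <-> GPU transfers.
        "spark.rapids.memory.pinnedPool.size": pinned_pool,
    }


def as_submit_flags(conf: Dict[str, str]) -> str:
    """Render the settings as --conf flags for spark-submit."""
    return " ".join(f"--conf {key}={value}" for key, value in conf.items())


if __name__ == "__main__":
    print(as_submit_flags(rapids_conf()))
```

The same dictionary can be applied pair by pair with SparkSession.builder.config(key, value). The RAPIDS Accelerator jar still has to be on the classpath. Running the same query with spark.rapids.sql.enabled set to "false" gives a CPU baseline for comparison.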
I think there was an initial promise with the SPARK 3.x release that
SPARK dataframes could be transferred directly, through native
integration, to TensorFlow and other frameworks, which is a brilliant
way forward for SPARK, but I think the SPARK project leaders are yet
to prioritise it.
Also Ray, another project from Berkeley, is trying to make SPARK
dataframes transferable to TensorFlow. If SPARK users use Ray to
transfer SPARK dataframes to TensorFlow and other frameworks, then
Ray will clearly see massive adoption.
Personally, I think the SPARK community could have just built the
integration with other frameworks natively, given the fantastic
contributions by NVIDIA to SPARK and such a large, active development
community. But surely Ray has to win as well, and there is nothing
better than riding on the success of SPARK. I may be wrong, though,
and the SPARK community may still be developing those integrations.
Regards,
Gourav Sengupta
On Fri, Jul 30, 2021 at 2:46 AM Artemis User <arte...@dtechspace.com> wrote:
Has anyone had any experience with running Spark-Rapids on a
GPU-powered cluster (https://github.com/NVIDIA/spark-rapids)? I am
very interested in knowing:
1. What is the hardware/software platform and the type of Spark
cluster you are using to run Spark-Rapids?
2. How easy was the installation process?
3. Are you running Scala or PySpark or both with Spark-Rapids?
4. How does the performance you've seen compare with running a
CPU-only cluster?
5. Any pros/cons of using Spark-Rapids?
Thanks a lot in advance!
-- ND