Hi everyone,
We published an article on the performance and correctness of Trino, Spark,
and Hive-MR3, and thought that it could be of interest to Spark users.
https://www.datamonad.com/post/2023-05-31-trino-spark-hive-performance-1.7/
Omitted in the article is the performance of Spark 2.3.1 vs
Hi Leszek,
For running YARN on Kubernetes and then running Spark on YARN, is there a
lot of overhead for maintaining YARN on Kubernetes? I thought people
usually want to move from YARN to Kubernetes because of the overhead of
maintaining Hadoop.
Thanks,
--- Sungwoo
On Fri, Sep 30, 2022 at
On Wed, Sep 7, 2022 at 5:49 PM Sungwoo Park wrote:
>
>> You are right -- Spark can't do this with its current architecture. My
>> question was: if there was a new implementation supporting pipelined
>> execution, what kind of Spark jobs would benefit (a lot) from it?
>>
>
com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
>
>
> On Wed, Sep 7, 2022 at 7:42 AM Sungwoo Park wrote:
>
>> Hello Spark users,
>>
>> I have a question on the architecture of Spark (which could lead to a
>>
Hello Spark users,
I have a question on the architecture of Spark (which could lead to a
research problem). In its current implementation, Spark finishes executing
all the tasks in a stage before proceeding to child stages. For example,
given a two-stage map-reduce DAG, Spark finishes executing
For 1), this is a recurring question on this mailing list, and the answer
is: no, Spark does not support coordination between multiple Spark
applications. Spark relies on an external resource manager, such as YARN
or Kubernetes, to allocate resources to multiple Spark applications. For
The problem you describe is the motivation for developing Spark on MR3.
From the blog article (https://www.datamonad.com/post/2021-08-18-spark-mr3/):
*The main motivation for developing Spark on MR3 is to allow multiple Spark
applications to share compute resources such as Yarn containers or
/comparison-llap/
Thanks,
-- SW
On Sat, Apr 2, 2022 at 9:58 PM Bitfox wrote:
> Nice reading. Can you give a comparison on Hive on MR3 and Hive on Tez?
>
> Thanks
>
> On Sat, Apr 2, 2022 at 7:17 PM Sungwoo Park wrote:
>
>> Hi Spark users,
>>
>> We have pu
Hi Spark users,
We have published an article where we evaluate the performance of Spark
2.3.8 and Spark 3.2.1 (along with Hive 3). If interested, please see:
https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/
--- SW
Hi Spark users,
We would like to announce the release of Spark on MR3, which is Apache
Spark using MR3 as the execution backend. MR3 is a general-purpose
execution engine for Hadoop and Kubernetes, and Hive on MR3 has been its
main application. Spark on MR3 is a new application of MR3.
The main