Re: Tuning level of Parallelism: Increase or decrease?

Jacek Laskowski Tue, 02 Aug 2016 15:55:03 -0700

On Mon, Aug 1, 2016 at 5:56 PM, Jestin Ma <jestinwith.a...@gmail.com> wrote:
> Hi Nikolay, I'm looking at data locality improvements for Spark, and I have
> conflicting sources on using YARN for Spark.
>
> Reynold said that Spark workers automatically take care of data locality
> here:
> https://www.quora.com/Does-Apache-Spark-take-care-of-data-locality-when-Spark-workers-load-data-from-HDFS
>
> However, I've read elsewhere
> (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/yarn/)
> that Spark on YARN increases data locality because YARN tries to place tasks
> next to HDFS blocks.
>
> Can anyone verify/support one side or the other?


Hi Jestin,

I'm the author of the latter. I can't seem to find how Reynold
"conflicts" with what I wrote in the notes? Could you elaborate?

I certainly may be wrong.

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Tuning level of Parallelism: Increase or decrease?

Reply via email to