On Mon, Aug 1, 2016 at 5:56 PM, Jestin Ma <jestinwith.a...@gmail.com> wrote:
> Hi Nikolay, I'm looking at data locality improvements for Spark, and I have
> conflicting sources on using YARN for Spark.
>
> Reynold said that Spark workers automatically take care of data locality
> here:
> https://www.quora.com/Does-Apache-Spark-take-care-of-data-locality-when-Spark-workers-load-data-from-HDFS
>
> However, I've read elsewhere
> (https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/yarn/)
> that Spark on YARN increases data locality because YARN tries to place tasks
> next to HDFS blocks.
>
> Can anyone verify/support one side or the other?

Hi Jestin,

I'm the author of the latter. I can't see how Reynold's answer conflicts
with what I wrote in the notes. Could you elaborate? I may certainly be wrong.

Best regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski