Thanks Sandy, appreciate it.

On Thu, Apr 9, 2015 at 10:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi Deepak,
>
> I'm going to shamelessly plug my blog post on tuning Spark:
>
> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>
> It talks about tuning executor size as well as how the number of tasks for
> a stage is calculated.
>
> -Sandy
>
> On Thu, Apr 9, 2015 at 9:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> I have a Spark job that has multiple stages. For now I start it with 100
>> executors, each with 12G mem (max is 16G). I am using Spark 1.3 over YARN
>> 2.4.x.
>>
>> For now I start the Spark job with a very limited input (1 file of size
>> 2G); overall there are 200 files. My first run is yet to complete, as it
>> is taking too much time / throwing OOM exceptions / buffer exceptions
>> (keep that aside).
>>
>> How will I know how much resource is required to run this job (# of
>> cores, executors, mem, serialization buffers, and I do not yet know what
>> else)?
>>
>> In the M/R world, all I do is set the split size and the rest is taken
>> care of automatically (yes, I need to worry about mem in case of OOM).
>>
>> 1) Can someone explain how they do resource estimation before running
>> the job, or is there no way and one needs to just try it out?
>> 2) Even if I give 100 executors, the first stage uses only 5. How did
>> Spark decide this?
>>
>> Please point me to any resources that also talk about similar things, or
>> please explain here.
>>
>> --
>> Deepak
>>
>

-- 
Deepak
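
[Editor's note on question 2: the number of tasks in a stage is driven by the number of partitions of its input (for HDFS input, roughly one per block/split), not by the number of executors requested, which is why a 100-executor job can still run a first stage with only a handful of tasks. Below is a minimal sketch against the Spark 1.x Scala API; the input path, block-size arithmetic, and sizing values are made up for illustration, and executor sizing would normally be passed on the spark-submit command line rather than hard-coded.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ResourceSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("resource-estimation-sketch")
          // Usually set via spark-submit (--num-executors, --executor-memory,
          // --executor-cores); shown here only for illustration.
          .set("spark.executor.memory", "12g")
          .set("spark.executor.cores", "4")
        val sc = new SparkContext(conf)

        // A single ~2G file with a 128MB HDFS block size yields roughly
        // 2048 / 128 = 16 input partitions, so the first stage runs ~16 tasks
        // no matter how many executors YARN granted. A larger split size
        // can leave you with only a handful of tasks.
        val lines = sc.textFile("hdfs:///path/to/input") // hypothetical path
        println(s"Input partitions (= tasks in first stage): ${lines.partitions.length}")

        // To keep all 100 executors busy, widen to at least executors * cores:
        val widened = lines.repartition(100 * 4)
        println(s"Partitions after repartition: ${widened.partitions.length}")

        sc.stop()
      }
    }

[As a rough sanity check on cluster-side sizing: total memory asked of YARN is approximately num-executors * (executor-memory + memory overhead), and each container must fit within yarn.scheduler.maximum-allocation-mb; beyond that, per-stage memory needs are usually found empirically, as Sandy's post discusses.]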