Thanks Sandy, appreciate it.

On Thu, Apr 9, 2015 at 10:32 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi Deepak,
>
> I'm going to shamelessly plug my blog post on tuning Spark:
>
> http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
>
> It talks about tuning executor size as well as how the number of tasks for
> a stage is calculated.
>
> -Sandy
>
> On Thu, Apr 9, 2015 at 9:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>
>> I have a Spark job that has multiple stages. For now I start it with 100
>> executors, each with 12G mem (max is 16G). I am using Spark 1.3 over YARN
>> 2.4.x.
>>
>> For now I start the Spark job with a very limited input (1 file of size
>> 2G); overall there are 200 files. My first run is yet to complete, as it
>> is taking too much time / throwing OOM exceptions / buffer exceptions
>> (keep that aside).
>>
>> How will I know how much resource is required to run this job (# of
>> cores, executors, mem, serialization buffers, and I do not yet know what
>> else)?
>>
>> In the M/R world, all I do is set the split size and the rest is taken
>> care of automatically (yes, I need to worry about mem in case of OOM).
>>
>> 1) Can someone explain how they do resource estimation before running
>> the job, or is there no way and one needs to just try it out?
>> 2) Even if I give 100 executors, the first stage uses only 5. How did
>> Spark decide this?
>>
>> Please point me to any resources that also talk about similar things, or
>> please explain here.
>>
>> --
>> Deepak
>>
>

-- 
Deepak
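
[Editor's note on question 2: the number of tasks in a stage is driven by the number of partitions of its input (for HDFS input, roughly one per block/split), not by the number of executors requested, which is why a 100-executor job can still run a first stage with only a handful of tasks. Below is a minimal sketch against the Spark 1.x Scala API; the input path, block-size arithmetic, and sizing values are made up for illustration, and executor sizing would normally be passed on the spark-submit command line rather than hard-coded.]

    import org.apache.spark.{SparkConf, SparkContext}

    object ResourceSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("resource-estimation-sketch")
          // Usually set via spark-submit (--num-executors, --executor-memory,
          // --executor-cores); shown here only for illustration.
          .set("spark.executor.memory", "12g")
          .set("spark.executor.cores", "4")
        val sc = new SparkContext(conf)

        // A single ~2G file with a 128MB HDFS block size yields roughly
        // 2048 / 128 = 16 input partitions, so the first stage runs ~16 tasks
        // no matter how many executors YARN granted. A larger split size
        // can leave you with only a handful of tasks.
        val lines = sc.textFile("hdfs:///path/to/input") // hypothetical path
        println(s"Input partitions (= tasks in first stage): ${lines.partitions.length}")

        // To keep all 100 executors busy, widen to at least executors * cores:
        val widened = lines.repartition(100 * 4)
        println(s"Partitions after repartition: ${widened.partitions.length}")

        sc.stop()
      }
    }

[As a rough sanity check on cluster-side sizing: total memory asked of YARN is approximately num-executors * (executor-memory + memory overhead), and each container must fit within yarn.scheduler.maximum-allocation-mb; beyond that, per-stage memory needs are usually found empirically, as Sandy's post discusses.]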