There is documentation on running Spark on YARN here: http://spark.apache.org/docs/latest/running-on-yarn.html. Like I said before, you can use either the logs from the application or the Spark UI to understand how many executors are running at any given time. I don't think I can help much further without more information about the specific use case.
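For reference, a minimal sketch of checking the executor count from the command line, assuming log aggregation is enabled; the application id below is a placeholder (the real one is printed by spark-submit and shown in the YARN ResourceManager UI):

```shell
# Hypothetical application id; substitute the one for your job.
APP_ID=application_1234567890123_0001

# List running YARN applications to find the Spark job.
yarn application -list -appStates RUNNING

# Pull the aggregated logs and look for executor allocation messages.
yarn logs -applicationId "$APP_ID" | grep -i executor
```

The Executors tab of the Spark UI shows the same information interactively.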
On Tue, Oct 9, 2018 at 2:54 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi Dillon,
>
> I do think that there is a setting available where, once YARN sets up
> the containers, you do not deallocate them; I had used it previously in
> Hive, and it saves processing time in terms of allocating containers.
> That said, I am still trying to understand how we determine that one
> YARN container = one executor in Spark.
>
> Regards,
> Gourav
>
> On Tue, Oct 9, 2018 at 9:04 PM Dillon Dukek
> <dillon.du...@placed.com.invalid> wrote:
>
>> I'm still not sure exactly what you mean when you say that you have 6
>> YARN containers. YARN should just be aware of the total available
>> resources in your cluster and then be able to launch containers based
>> on the executor requirements you set when you submit your job. If you
>> can, I think it would be helpful to send me the command you're using
>> to launch your Spark process. You should also be able to use the logs
>> and/or the Spark UI to determine how many executors are running.
>>
>> On Tue, Oct 9, 2018 at 12:57 PM Gourav Sengupta
>> <gourav.sengu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Maybe I am not quite clear in my head on this one, but how do we know
>>> that 1 YARN container = 1 executor?
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Tue, Oct 9, 2018 at 8:53 PM Dillon Dukek
>>> <dillon.du...@placed.com.invalid> wrote:
>>>
>>>> Can you send how you are launching your streaming process? Also,
>>>> what environment is this cluster running in (EMR, GCP, self-managed,
>>>> etc.)?
>>>>
>>>> On Tue, Oct 9, 2018 at 10:21 AM kant kodali <kanth...@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am using Spark 2.3.1 with YARN as the cluster manager.
>>>>>
>>>>> I currently have:
>>>>>
>>>>> 1) 6 YARN containers (executors = 6) with 4 executor cores per
>>>>> container.
>>>>> 2) 6 Kafka partitions from one topic.
>>>>> 3) You can assume every other configuration is set to its default
>>>>> value.
>>>>>
>>>>> I spawned a simple streaming query and I see all the tasks get
>>>>> scheduled on one YARN container. Am I missing any config?
>>>>>
>>>>> Thanks!
>>>>
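For the setup described in the original message, a launch command might look roughly like the sketch below; the jar name, main class, and memory size are placeholders. With dynamic allocation disabled, Spark on YARN requests one container per executor (plus one container for the ApplicationMaster), so `--num-executors 6` is what makes "6 YARN containers" mean "6 executors":

```shell
# Hypothetical submit command; jar, class, and memory values are placeholders.
# One YARN container is granted per executor when dynamic allocation is off.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 4g \
  --conf spark.dynamicAllocation.enabled=false \
  --class com.example.StreamingApp \
  streaming-app.jar
```

With 6 Kafka partitions, the Structured Streaming Kafka source typically creates one task per topic partition per micro-batch, so with this layout one would normally expect the 6 tasks to spread across the 6 executors rather than land on a single container.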