Thank you for the suggestion! The problem was actually that the RM was
using a lot of memory (around 16 GB), but the node had only 12 GB, so
the OS was killing the RM.
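
For anyone who hits the same thing, here is a minimal sketch of the
sanity check that would have caught it: compare the memory the process
is allowed to use against the node's physical RAM. The function names
and the 25% headroom figure are my own assumptions for illustration,
not anything from Hadoop, and the parser only expects the usual
"MemTotal: <n> kB" line from a Linux /proc/meminfo.

```python
import re

GIB = 1024 ** 3


def mem_total_bytes(meminfo_text):
    """Return the MemTotal value from /proc/meminfo-style text, in bytes."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports the value in kB (units of 1024 bytes).
    return int(match.group(1)) * 1024


def heap_fits(heap_bytes, physical_bytes, headroom=0.25):
    """True if the heap leaves a `headroom` fraction of RAM for everything else.

    The 0.25 default is an arbitrary assumption, not a recommended value.
    """
    return heap_bytes <= physical_bytes * (1.0 - headroom)
```

On a Linux node you could pass open("/proc/meminfo").read() to
mem_total_bytes() and the configured heap size to heap_fits(). Note
that a JVM's resident memory can be larger than its heap, so this is
only a rough lower bound on what the process will need.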

Thanks
Muntasir.

On Mon, Mar 18, 2013 at 1:04 PM, Robert Evans <[email protected]> wrote:
> How big is your heap?  There is an uncaught exception handler on almost
> all of the threads that kills the process if it catches an OOM.  It
> tries to log that it caught the exception, but that does not always work
> if the process is truly out of resources.  So you may not see anything
> except the process exiting.
>
> You could turn on GC statistics and look there.
>
> --Bobby
>
> On 3/18/13 11:34 AM, "Ravi Prakash" <[email protected]> wrote:
>
>>Muntasir,
>>
>>We have been running the RM with millions of jobs without it crashing.
>>Could you please attach the tail of the RM logs? Perhaps a megabyte of it
>>or so?
>>
>>Thanks
>>Ravi
>>
>>
>>
>>
>>________________________________
>> From: Sandy Ryza <[email protected]>
>>To: [email protected]
>>Sent: Monday, March 18, 2013 2:03 AM
>>Subject: Re: RM Suddenly gets killed
>>
>>There's no special signal I'm aware of other than an exception showing up
>>somewhere, probably near the end of the logs.  If this is occurring
>>consistently for you, filing a JIRA with steps to reproduce would be much
>>appreciated.
>>
>>-Sandy
>>
>>On Sun, Mar 17, 2013 at 7:18 PM, Muntasir Raihan Rahman <
>>[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am using the capacity scheduler.
>>>
>>> The only KILL messages I see in the yarn logs are related to killing
>>> containers. Is there any special signal I should look for in the logs
>>> that would indicate RM problems?
>>>
>>> Thanks
>>> Muntasir.
>>>
>>> On Sun, Mar 17, 2013 at 9:09 PM, Sandy Ryza <[email protected]>
>>> wrote:
>>> > Hi Muntasir,
>>> >
>>> > Do you know which scheduler you're using?  Does anything show up in
>>> > your resourcemanager logs?
>>> >
>>> > -Sandy
>>> >
>>> > On Sun, Mar 17, 2013 at 7:03 PM, Muntasir Raihan Rahman <
>>> > [email protected]> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I am using YARN 0.23 for some experiments. I am noticing that the
>>> >> RM sometimes gets killed after a bunch of applications (around 70)
>>> >> are submitted.
>>> >>
>>> >> Are there any JIRAs related to this?
>>> >>
>>> >> Thanks
>>> >> Muntasir.
>>> >>
>>> >> --
>>> >> Best Regards
>>> >> Muntasir Raihan Rahman
>>> >> Email: [email protected]
>>> >> Phone: 1-217-979-9307
>>> >> Department of Computer Science,
>>> >> University of Illinois Urbana Champaign,
>>> >> 3111 Siebel Center,
>>> >> 201 N. Goodwin Avenue,
>>> >> Urbana, IL  61801
>>> >>
>>>
>>>
>>>
>



