Hey Chris and Brian,

I filed a JIRA issue for this:

https://issues.apache.org/jira/browse/OODT-439


So for the wiki page that I just created, should I just reference this
JIRA issue on the page so that users know that this is a workaround
(setting the queue size of the resource manager)? Or should I remove it
and document the workaround with the JIRA issue as Brian has suggested?
I'm okay with either solution.
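
In case it helps whoever picks up the JIRA issue, here is a rough sketch
of the sequence Brian describes in the quoted message below. It is not the
Resource Manager code: BoundedJobQueue, QueueFullException, the job names,
and the queue size of 2 are all made up for illustration, and the
scheduling failure is hard-coded rather than coming from a real batch
manager.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class QueueRaceSketch {

    /** Hypothetical stand-in for the exception raised when the queue is full. */
    static class QueueFullException extends Exception {
        QueueFullException(String msg) { super(msg); }
    }

    /** Hypothetical bounded queue; NOT the actual OODT job queue API. */
    static class BoundedJobQueue {
        private final Deque<String> jobs = new ArrayDeque<>();
        private final int maxSize;

        BoundedJobQueue(int maxSize) { this.maxSize = maxSize; }

        void add(String job) throws QueueFullException {
            if (jobs.size() >= maxSize) {
                throw new QueueFullException("queue full, cannot add " + job);
            }
            jobs.addLast(job);
        }

        String pop() { return jobs.pollFirst(); }
    }

    /** Replays the eight steps in order and returns the jobs that were lost. */
    public static List<String> simulate() {
        List<String> lost = new ArrayList<>();
        BoundedJobQueue queue = new BoundedJobQueue(2); // assumed max size
        try {
            queue.add("job-1");
            queue.add("job-2");            // 1) queue is full
            String popped = queue.pop();   // 2) scheduler pops a job; 3) one open slot
            queue.add("job-3");            // 4) new job arrives; 5) queue is full again
            boolean scheduled = false;     // 6) scheduling fails (hard-coded here)
            if (!scheduled) {
                try {
                    queue.add(popped);     // 7) scheduler pushes the job back
                } catch (QueueFullException e) {
                    lost.add(popped);      // 8) queue is full, so the job is lost
                }
            }
        } catch (QueueFullException e) {
            // Not expected for the setup steps above.
            throw new IllegalStateException(e);
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("lost jobs: " + simulate());
    }
}
```

Running it prints "lost jobs: [job-1]". With a larger assumed max size the
re-queue in step 7 succeeds, which is why raising the queue size works
around the problem but does not fix it.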

Thanks,
Mike

On 4/10/12 8:19 PM, "Mattmann, Chris A (388J)"
<[email protected]> wrote:

>Hey BFost,
>
>Totally agreed here, and with Mike on it. This is an issue that we need
>to fix. Thanks to Mike and others for taking the time to document this,
>and I am +1 with Brian that along with the documentation, we should
>probably think of a strategy to fix this and implement it in 0.5. Mike,
>I think you offered to file a JIRA issue -- that offer still stand? :)
>
>Thanks!
>
>Cheers,
>Chris
>
>On Apr 10, 2012, at 10:58 AM, Brian Foster wrote:
>
>> hey chris,
>> 
>> i believe mike is talking about the following case:
>> 
>> 1) queue is full
>> 2) scheduler pops job from queue and begins trying to find a node for job
>> 3) queue now has 1 open slot
>> 4) another job is given to the resource manager and is placed in the queue
>> 5) queue is now full again
>> 6) scheduler fails to schedule popped job
>> 7) scheduler pushes job back into the queue
>> 8) queue is full so exception is thrown and job is lost
>> 
>> -brian
>> 
>> On Apr 10, 2012, at 07:08 AM, "Mattmann, Chris A (388J)"
>><[email protected]> wrote:
>> 
>>> Hi Mike,
>>> 
>>> On Apr 9, 2012, at 9:12 AM, Cayanan, Michael D (388J) wrote:
>>> 
>>> > Hey Chris,
>>> > 
>>> > Comments are below.
>>> >> 
>>> >> "At the time of this writing, jobs that cannot be added to the queue
>>> >> disappear...."
>>> >> 
>>> >> I think we should be more clear than "disappear". They don't disappear.
>>> >> The Scheduler will try and send a Job to the BatchMgr, and if there is an
>>> >> exception, it tries to re-queue the Job back onto the JobStack. If it's
>>> >> unable to do that, then there is an issue, but it at the very least tries
>>> >> to re-queue the job if there was an issue.
>>> > 
>>> > The reason this blurb was put into the wiki was because when Gabe and I
>>> > were looking through the Resource Manager code, this is what looks to be
>>> > happening. Check out the piece of code that tries to add a job:
>>> 
>>> Reaching Max queue size is different than saying that jobs that cannot be
>>> added to the queue disappear. I think we should explicitly state:
>>> 
>>> "At the time of this writing, when the queue has reached the max queue
>>> size, a message is logged by the Scheduler saying there is a Job Queue
>>> Exception adding a job to the queue, and then the Job is dropped."
>>> 
>>> I think that's more accurate based on your code walk. I was thinking based
>>> on your above message that you were talking about Jobs that couldn't be
>>> Scheduled for whatever reason (e.g., the Batch Mgr being down, or a
>>> Batch Stub being down) in which case they are re-queued.
>>> 
>>> Cheers,
>>> Chris
>>> 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: [email protected]
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
