Hey Chris,

Comments are below.

On 4/6/12 9:01 PM, "Mattmann, Chris A (388J)"
<[email protected]> wrote:

>Hi Mike,
>
>Thanks, what a great page!
>
>I noticed this comment in the page:
>
>"At the time of this writing, jobs that cannot be added to the queue
>disappear...."
>
>I think we should be more clear than "disappear". They don't disappear.
>The Scheduler will try and send a Job to the BatchMgr, and if there is an
>exception, it tries to re-queue the Job back onto the JobStack. If it's
>unable to do that, then there is an issue, but it at the very least tries
>to re-queue the job if there was an issue.

Gabe and I added this blurb to the wiki because, when we read through the
Resource Manager code, this is what appeared to be happening. Here is the
code that tries to add a job, in JobStack.java:

public String addJob(JobSpec spec) throws JobQueueException {
  String jobId = safeAddJob(spec);
  if (queue.size() != maxQueueSize) {
    LOG.log(Level.INFO, "Added Job: [" + spec.getJob().getId() + "] to queue");
    queue.add(spec);
    spec.getJob().setStatus(JobStatus.QUEUED);
    safeUpdateJob(spec);
    return jobId;
  } else {
    // Queue is full: the job is not added and the caller gets an exception.
    throw new JobQueueException("Reached max queue size: [" + maxQueueSize
        + "]: Unable to add job: [" + spec.getJob().getId() + "]");
  }
}


When the JobQueueException is thrown, the Resource Manager catches it, logs
a warning, and rethrows it as a SchedulerException. In
XmlRpcResourceManager.java:

private String genericHandleJob(Hashtable jobHash, Object jobIn)
    throws SchedulerException {

...

  try {
    jobId = scheduler.getJobQueue().addJob(spec);
  } catch (JobQueueException e) {
    LOG.log(Level.WARNING, "JobQueue exception adding job: Message: "
        + e.getMessage());
    throw new SchedulerException(e.getMessage());
  }
  return jobId;
}


From here, I can't see where the job gets re-queued once the max queue size
is reached. If that's right, I can certainly file a JIRA issue.
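
To make the path I'm describing concrete, here is a minimal, self-contained
sketch. It is not the OODT code: BoundedJobQueue, QueueFullException, and
submit are hypothetical stand-ins for JobStack, JobQueueException, and
genericHandleJob, and submit returns false where the real method rethrows a
SchedulerException. It assumes, per my reading above, that nothing upstream
keeps the rejected job around for a retry.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class QueueFullSketch {

  /** Hypothetical stand-in for JobQueueException. */
  static class QueueFullException extends Exception {
    QueueFullException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for JobStack: a bounded FIFO that rejects jobs when full. */
  static class BoundedJobQueue {
    private final Queue<String> queue = new ArrayDeque<>();
    private final int maxQueueSize;

    BoundedJobQueue(int maxQueueSize) {
      this.maxQueueSize = maxQueueSize;
    }

    void addJob(String jobId) throws QueueFullException {
      if (queue.size() >= maxQueueSize) {
        throw new QueueFullException("Reached max queue size: [" + maxQueueSize
            + "]: Unable to add job: [" + jobId + "]");
      }
      queue.add(jobId);
    }

    int size() {
      return queue.size();
    }

    boolean contains(String jobId) {
      return queue.contains(jobId);
    }
  }

  /** Simplified stand-in for genericHandleJob: reports the failure, does not retry. */
  static boolean submit(BoundedJobQueue jobQueue, String jobId) {
    try {
      jobQueue.addJob(jobId);
      return true;
    } catch (QueueFullException e) {
      // Nothing here stores the job for a later attempt, so unless the
      // caller resubmits it, the job is gone once this method returns.
      return false;
    }
  }

  public static void main(String[] args) {
    BoundedJobQueue jobQueue = new BoundedJobQueue(2);
    for (String jobId : new String[] {"job-1", "job-2", "job-3"}) {
      System.out.println(jobId + " accepted: " + submit(jobQueue, jobId));
    }
    System.out.println("queued: " + jobQueue.size()
        + ", job-3 present: " + jobQueue.contains("job-3"));
  }
}
```

If the real code behaves like this sketch, whether a job "disappears" comes
down to whether the submitter handles the exception and resubmits.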

>
>Also, in general, you will have as many jobs queued in Resource Manager
>land as the size of that job stack. So we should probably note that.
>
>Great resource here, thanks for putting it together!
>
>Cheers,
>Chris

Okay, I've noted this in the wiki as well.

Cheers,
Mike

>
>On Apr 5, 2012, at 8:43 AM, Cayanan, Michael D (388J) wrote:
>
>> Hi all,
>> 
>> I recently added a page to the OODT wiki:
>> 
>> https://cwiki.apache.org/confluence/display/OODT/Workflow+Manager+Help
>> 
>> I ran into some issues with the Workflow hanging up and also Workflow
>> jobs being lost when trying to send them off to the Resource Manager and
>> just wanted to share what I learned and what to do if you run into these
>> issues as well.
>> 
>> Cheers,
>> Mike
>
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Senior Computer Scientist
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 171-266B, Mailstop: 171-246
>Email: [email protected]
>WWW:   http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Assistant Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
