Richard Chycoski <rskiad...@chycoski.com> writes:
> Narayan Desai wrote:
>> On Thu, 16 Jul 2009 12:16:14 -0400 Doug Hughes wrote:
>>
>>   Doug> Narayan Desai wrote:
>>   Doug> > On Thu, 16 Jul 2009 11:15:48 -0400 Edward Ned Harvey wrote:
>>   Doug> >
>>   Doug> >   Ned> > I am interested in soliciting experiences deploying, using and
>>   Doug> >   Ned> > maintaining the Condor batch processing system, especially
>>   Doug> >   Ned> > under Linux / Debian.
>>   Doug> >   Ned> >
>>   Doug> >   Ned> > Our use would predominantly be many small jobs, rather than a
>>   Doug> >   Ned> > few large jobs, with runtimes measured in a few hours.
>>   Doug> >   Ned> > Probably only a handful of nodes, on the order of half a dozen,
>>   Doug> >   Ned> > in total.[1]
>>   Doug> >
>>   Doug> >
>>   Doug> >   Ned> I don't know anything about condor, or torque.  The obvious
>>   Doug> >   Ned> choice to me would be SGE.  I wonder what advantage there is
>>   Doug> >   Ned> to using something other than SGE?
>>   Doug> >
>>   Doug> > Well, the area where condor is pretty much the undisputed king is
>>   Doug> > in the scavenger arena. The basic idea is that you could deploy
>>   Doug> > condor on top of your regular desktops and jobs would be deployed
>>   Doug> > to use wasted cycles (during idle periods or on a set schedule,
>>   Doug> > etc).  -nld
>>   Doug> >
>>   Doug> >
>>   Doug> Doesn't it also excel at the whole state/migration thing? E.G. you
>>   Doug> can take a node out for maintenance and migrate a running job off to
>>   Doug> another node by saving the memory state and performing the migration
>>   Doug> and then resuming the job. (May only work for some job
>>   Doug> configurations)
>>
>> So I hear. I don't have any direct experience with the
>> checkpointing/migration stuff. I gather they are starting to use VMs for
>> this sort of thing as well as library-based checkpointing.
>
> This depends on the purpose of the batch jobs. If you're looking for simple
> load sharing/cloud computing, we've used LSF in our engineering environment
> for a long time.

Thanks.  With the number of recommendations I will definitely take a closer
look at the facilities and cost of LSF — though I fear that our budget won't
go that far, so a "free" starting point will be the solution.

> It has the option of consuming unused desktop cycles, but we found this to
> be unreliable and problematic - not because LSF was bad, but because
> individuals had messed around with their desktops in such a way as to mangle
> any jobs distributed to them.

*nod*  Even with Condor I would be looking to deploy on semi-dedicated server
hardware, not end-user machines, so while they may also have other load it
would be fairly predictable.

[...]

> I work in a group whose main purpose is to provide automation, especially
> for the batch processing environment at $WORK. You're welcome to ping me -
> here on the list or privately - if you would like more help.

Thank you; I appreciate the offer.  At this stage it looks likely that Condor
will be the tool of choice, and I will be looking to deploy a small trial
cluster in the near future.

At least this new environment adds variety and spice to the job. ;)

Regards,
        Daniel
-- 
✣ Daniel Pittman            ✉ dan...@rimspace.net            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons

_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
 http://lopsa.org/