Richard Chycoski <rskiad...@chycoski.com> writes:
> Narayan Desai wrote:
>> On Thu, 16 Jul 2009 12:16:14 -0400 Doug Hughes wrote:
>>
>> Doug> Narayan Desai wrote:
>> Doug> > On Thu, 16 Jul 2009 11:15:48 -0400 Edward Ned Harvey wrote:
>> Doug> >
>> Doug> > Ned> > I am interested in soliciting experiences deploying, using and
>> Doug> > Ned> > maintaining the Condor batch processing system, especially under
>> Doug> > Ned> > Linux / Debian.
>> Doug> > Ned> >
>> Doug> > Ned> > Our use would predominantly be many small jobs, rather than a few
>> Doug> > Ned> > large jobs, with runtimes measured in a few hours. Probably only a
>> Doug> > Ned> > handful of nodes, on the order of half a dozen, in total.[1]
>> Doug> >
>> Doug> > Ned> I don't know anything about condor, or torque. The obvious
>> Doug> > Ned> choice to me would be SGE. I wonder what advantage there is to
>> Doug> > Ned> using something other than SGE?
>> Doug> >
>> Doug> > Well, the area where condor is pretty much the undisputed king is in the
>> Doug> > scavenger arena. The basic idea is that you could deploy condor on top
>> Doug> > of your regular desktops and jobs would be deployed to use wasted
>> Doug> > cycles (during idle periods or on a set schedule, etc). -nld
>> Doug> >
>> Doug> Doesn't it also excel at the whole state/migration thing? E.G. you can
>> Doug> take a node out for maintenance and migrate a running job off to
>> Doug> another node by saving the memory state and performing the migration
>> Doug> and then resuming the job. (May only work for some job configurations)
>>
>> So I hear. I don't have any direct experience with the
>> checkpointing/migration stuff. I gather they are starting to use VMs for
>> this sort of thing as well as library-based checkpointing.
>
> This depends on the purpose of the batch jobs. If you're looking for simple
> load sharing/cloud computing, we've used LSF in our engineering environment
> for a long time.

Thanks. With the number of recommendations I will definitely take a closer
look at the facilities and cost of LSF, though I fear that our budget won't
go that far, so a "free" starting point will be the solution.

> It has the option of consuming unused desktop cycles, but we found this to
> be unreliable and problematic - not because LSF was bad, but because
> individuals had messed around with their desktops in such a way as to mangle
> any jobs distributed to them.

*nod* Even with Condor I would be looking to deploy on semi-dedicated server
hardware, not end-user machines, so while they may also have other load it
would be fairly predictable.

[...]

> I work in a group whose main purpose is to provide automation, especially
> for the batch processing environment at $WORK. You're welcome to ping me -
> here on the list or privately - if you would like more help.

Thank you; I appreciate the offer. At this stage it looks likely that Condor
will be the tool of choice, and I will be looking to deploy a small trial
cluster in the near future.

At least this new environment adds variety and spice to the job. ;)

Regards,
        Daniel
-- 
✣ Daniel Pittman ✉ dan...@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
_______________________________________________
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/