On Fri, May 30, 2014 at 10:16:25AM -0400, Peter Dietz wrote:
[snip]
> Not to derail onto a tangent, but one thing I'd like to see DSpace support
> is some type of background-processing-queue.
I would distinguish different types of background activities.

> i.e. new content submitted should be queued to get: initial checksum, virus
> check, media-filters to generate thumbnail and fulltext extraction,
> Discovery needs to index the content

Good idea. These are relatively short-running, as-needed tasks whose demand is unrelated to the size of the repository. They should happen at submission time, but it may be annoying to wait for them to finish before the submission completes. They can all be done using Curation Tasks. (I think we ought to do a lot more by attaching tasks to workflow steps rather than hardcoding stuff or pushing it off to later batch processing.)

> And then there are maintenance jobs: Recompute the checksum, OAI harvest,
> index-maintenance, ...

Cron and Task Scheduler are your friends. These grovel over the entire repository, perhaps for hours, and can demand a lot of storage bandwidth. Wearing my sysadmin hat, I would very much prefer to schedule these myself, using the same facilities I use for other long-running, resource-hungry periodic maintenance tasks, so that I can get a complete picture of the expected load by looking in one place.

> New submissions add to the queue, some scheduler can add maintenance tasks
> to the queue. This way you don't run into the issue of 3+ concurrent cron
> jobs because they didn't complete in time. Maybe you can even tie this in
> to the curation task queue system too. In the past we had a GitHub
> Enterprise/Firewall, and being an admin of that shows you fancy admin bells
> and whistles, where you can even inspect the queue.
>
> Now what happens if queue growth exceeds its throughput, we'll cross that
> bridge when we get there.

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Machines should not be friendly.  Machines should be obedient.
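The worry about 3+ concurrent cron jobs piling up does not by itself require giving up cron for maintenance jobs. A minimal sketch, assuming a Unix host (it relies on `fcntl.flock`) and one lock file per job: each cron-launched run takes a non-blocking exclusive lock and simply skips its work if the previous run still holds it. The names `single_instance` and `run_maintenance`, the lock-file location, and the job body are hypothetical illustrations, not part of DSpace.

```python
import fcntl
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Yields True if this process now holds the lock, or False if another
    holder (e.g. the previous cron run of the same job) still has it.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    acquired = False
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            acquired = True
        except BlockingIOError:
            pass  # lock is held elsewhere; acquired stays False
        yield acquired
    finally:
        if acquired:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_maintenance(job_name, job, lock_dir=None):
    """Run job() unless an earlier run of the same job is still active.

    Returns True if the job ran, False if it was skipped.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, f"{job_name}.lock")
    with single_instance(lock_path) as acquired:
        if not acquired:
            print(f"{job_name}: previous run still active, skipping", file=sys.stderr)
            return False
        job()
        return True


if __name__ == "__main__":
    # Hypothetical job body; a real wrapper might launch a DSpace command-line task.
    run_maintenance("checksum-check", lambda: print("recomputing checksums..."))
```

Each crontab entry would call a wrapper like this instead of the job directly, so a run that fires while the previous one is still going exits at once rather than becoming another concurrent copy. Because the kernel drops `flock` locks when the descriptor is closed, a crashed run does not leave a stale lock behind. On Linux, util-linux's `flock -n LOCKFILE COMMAND` gives the same behavior directly in a crontab line.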
_______________________________________________ DSpace-tech mailing list DSpace-tech@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dspace-tech List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette