On Fri, May 30, 2014 at 10:16:25AM -0400, Peter Dietz wrote:
[snip]
> Not to derail onto a tangent, but one thing I'd like to see DSpace support
> is some type of background-processing-queue.

I would distinguish different types of background activities.

> i.e. new content submitted should be queued to get: initial checksum, virus
> check, media-filters to generate thumbnail and fulltext extraction,
> Discovery needs to index the content

Good idea.  These are relatively short-running, as-needed tasks whose
demand is unrelated to the size of the repository.  They should be
triggered at submission, but it may be annoying to wait for them before
submission completes.  They can all be done using Curation Tasks.  (I
think we ought to do a lot more by attaching tasks to workflow steps
rather than hardcoding stuff or pushing it off to later batch
processing.)

> And then there are maintenance jobs: Recompute the checksum, OAI harvest,
> index-maintenance, ...

Cron and Task Scheduler are your friends.  These jobs grovel over the
entire repository, perhaps for hours, and can demand a lot of storage
bandwidth.  Wearing my sysadmin hat, I would very much prefer to
schedule these myself using the same facilities that I use for other
long-running resource-hungry periodic maintenance tasks, so that I can
get a complete picture of the expected load by looking in one place.

> New submissions add to the queue, some scheduler can add maintenance tasks
> to the queue. This way you don't run into the issue of 3+ concurrent cron
> jobs because they didn't complete in time. Maybe you can even tie this in
> to the curation task queue system too. In the past we had a GitHub
> Enterprise/Firewall, and being an admin of that shows you fancy admin bells
> and whistles, where you can even inspect the queue.
> 
> Now what happens if queue growth exceeds its throughput, we'll cross that
> bridge when we get there.
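
For illustration only, the single-queue idea quoted above might be
sketched roughly as follows.  This is not DSpace code: the class name
BackgroundQueue, the job names, and the no-op lambdas are all
hypothetical stand-ins.  The point is just that one worker draining
one queue runs submission jobs and maintenance jobs strictly one at a
time, so overlapping runs cannot pile up, and that the backlog can be
inspected.

```python
import queue
import threading


class BackgroundQueue:
    """Single-worker FIFO queue: jobs run one at a time, in arrival order."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.completed = []  # names of finished jobs, in completion order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, name, func):
        """Enqueue a job; returns immediately so the submitter does not wait."""
        self._jobs.put((name, func))

    def pending(self):
        """Approximate number of jobs still waiting (for admin inspection)."""
        return self._jobs.qsize()

    def wait(self):
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _run(self):
        while True:
            name, func = self._jobs.get()
            try:
                func()
                self.completed.append(name)
            except Exception as exc:
                # Record the failure and keep the worker alive.
                self.completed.append(f"{name} (failed: {exc})")
            finally:
                self._jobs.task_done()


def demo():
    q = BackgroundQueue()
    # Hypothetical per-submission jobs; the lambdas stand in for real work.
    for step in ("checksum", "virus-check", "thumbnail", "index"):
        q.submit(f"item-1:{step}", lambda: None)
    # A scheduler (cron, for instance) could enqueue maintenance the same way.
    q.submit("maintenance:recompute-checksums", lambda: None)
    q.wait()
    return q.completed
```

Note that the sketch does nothing about the case where the queue grows
faster than it drains; pending() would only make that visible.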

-- 
Mark H. Wood, Lead System Programmer   mw...@iupui.edu
Machines should not be friendly.  Machines should be obedient.

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
