Hi Ben,
In response to your questions (and apologies if I'm teaching anyone on this
list to suck eggs):-
- What do you feel you might gain by placing 500Gb+ files into a repository,
compared with having them in an addressable filestore?
My understanding is that many funding bodies now require research data to be
made available along with the academic paper, so that people can
investigate/reproduce published research. The gain in having the research
data in a repository like DSpace would, I guess, be that it makes it easy for
people to find the data alongside the papers. That's not to say that the actual
data has to be in the traditional DSpace asset store. From what I've read so
far on DSpace, the data can also be "referenced", as long as the reference is
to a file store accessible to DSpace. Further responses to this thread have
talked about using other ideas like CKAN etc. - of which I know near zero, so I
will need to investigate further. At Exeter we are very much geared up for
using DSpace since we already have a number of DSpace repositories running,
so whatever solution we end up with, I think currently it will involve DSpace.
The other gain of course is the proper and managed curation of research data
through a workflow process, rather than the data ending up on a DVD on a
professor's shelf.
- Have people been able to download files of that size from DSpace, Fedora or
EPrints?
No idea!! But I take the point and it's one I've already alluded to - it's all
very well getting this stuff into or referenced from DSpace, but how will Joe
Researcher download it easily? I suspect there is a requirement for providing
other mechanisms for download from or via the DSpace repository rather than a
normal web interface - isn't this where SWORD comes in?
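To put a rough number on the download question, here is a back-of-envelope
sketch of how long a 500 GB transfer would take. The link speeds and the
function name transfer_hours are illustrative assumptions, not measurements
from any real network, and protocol overhead is ignored unless an efficiency
factor is supplied.

```python
def transfer_hours(size_gb: float, link_mbit_s: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_gb (decimal gigabytes) over a link.

    link_mbit_s is the nominal link speed in megabits per second.
    efficiency (0 < efficiency <= 1) is an assumed fraction of that speed
    actually achieved; 1.0 means an ideal, uncontended link.
    """
    size_mbit = size_gb * 1000 * 8            # GB -> MB -> megabits
    seconds = size_mbit / (link_mbit_s * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical link speeds, for illustration only.
    for link in (100, 1000):
        print(f"500 GB at {link} Mbit/s: {transfer_hours(500, link):.1f} hours")
```

Even on an ideal 100 Mbit/s link this comes to roughly 11 hours, which is why a
resumable or dedicated bulk-transfer route looks more realistic than a single
browser download.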
- Has the repository been allocated space on a suitable filesystem? XFS, EBS,
Thumper or similar?
Yes, I think so. We have an EMC Atmos currently providing 860TB of raw storage.
This is basically object cloud storage in a similar vein to Amazon S3, but it
does also provide NFS/CIFS access via what EMC call an IFS server. We are
currently running a DSpace asset store on our Atmos using an IFS server. Atmos
also has a REST-based interface, and an Amazon S3 proxy (i.e. making it work
with many Amazon S3 clients) is in development; we have been beta testing
this. We are also hoping to use the Atmos for backup of live research data.
Atmos is good for archiving and serving up objects to the web and the sort of
things people use S3 for, but it's not tier 1 storage - it's not designed to
be. The fit as a DSpace asset store seems to be a good one. Caveat - this
still remains to be proved in production DSpace use.
- Once the file is ingested into DSpace or Fedora for example, is there any
other route to retrieve this, aside from HTTP? (Coding your own servlet/addon
is not a real answer to this.) Is it easily accessible via Grid-FTP or HPN-SSH
for example?
Not yet for us, but I agree that another route such as HPN-SSH will be needed
for large data sets.
So, in short, weigh up the benefits against the downsides and not in
hypotheticals. Actually do it, and get real researchers to try and use it.
You'll soon have a metric to show what is useful and what isn't.
That's what we are aiming to do as part of the JISC-funded OpenExeter project -
we will be piloting with researchers and using this to develop procedures and
workflow etc.
Many thanks for all comments and emails so far on this thread - very useful and
interesting.
Best regards,
Pete
_______________________________________________
sword-app-tech mailing list
sword-app-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sword-app-tech