Hi Stuart,

You asked for more info. We are developing a Research Data Repository
based on DSpace for storing the research data associated with Exeter
University research publications.
For some research fields, such as Physics and Biology, this data can be very
large (terabytes, it seems!), hence the need to consider large ingests that
might run over several days.
The researcher has the data and would, I am guessing, create the metadata,
but maybe in collaboration with a data curator. Ideally the researcher
would perform the deposit with, for large data sets, an offline ingest of
the data itself. The data can be on the researcher's
server/workstation/laptop/DVD/USB hard drive etc.

There seem to be at least a couple of ways of approaching this, so what I
was after was some references to what other people have done and how,
to give me a better handle on the best way forward, having very little
DSpace or repository experience myself. But given the size of the larger data
sets, I do think the best solution will involve as little copying of the
data as possible, the ideal being just one copy process, of the
data from source into repository, with everything else being done by reference
if that is possible.
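
To make the "one copy" idea concrete, here is a minimal Python sketch. It is
an illustration only, not DSpace code: it streams a file from source to
destination in a single pass and computes a checksum on the way, so the data
does not have to be read a second time to verify it. The function name, the
chunk size and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def copy_once_with_checksum(src_path, dest_path, chunk_size=1024 * 1024):
    """Copy src_path to dest_path in one pass; return (bytes copied, SHA-256 hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Hash and write the same chunk, so the source is read only once.
            digest.update(chunk)
            dest.write(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()


if __name__ == "__main__":
    # Small stand-in file; a real data set would be far larger.
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "dataset.tar.gz")
        target = os.path.join(tmp, "ingested.tar.gz")
        with open(source, "wb") as handle:
            handle.write(b"example research data" * 1000)
        size, checksum = copy_once_with_checksum(source, target)
        print(size, checksum)
```

The checksum could travel with the metadata so that whatever process ingests
the file can confirm it arrived intact.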

Are you perhaps able to point me at some "code" examples for the SWORD
deposit you talk about, where a second process ingests the files? Would
this be coded in Java?
Does the ingest process have to be Java-based, or can it be a Perl script,
for example? Please forgive my DSpace ignorance!
  
Best regards,

Pete


On 28/11/2011 20:26, "Stuart Lewis" <s.le...@auckland.ac.nz> wrote:

>Hi Pete,
>
>'Deposit by reference' would probably be used to 'pull' data from a
>remote server.  If you already have the data on your DSpace server, as
>Mark points out there might be better ways to perform the import, such as
>registering the bitstreams, or just performing a local import.
>
>A SWORD deposit by reference might take place in two parts:
>
> - Deposit some metadata, that includes a description of the file(s) to
>be ingested
>
> - A second process (perhaps triggered by the SWORD deposit, or
>undertaken later, such as via a DSpace curation task) that ingests the
>file(s) into the DSpace object.
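
The two parts above can be sketched in a few lines of Python. This is a
rough, in-memory illustration under stated assumptions, not SWORD or DSpace
code: the `Deposit` and `DepositQueue` names, the status strings and the
`fetch` callback (standing in for whatever actually pulls the file, e.g. scp
or a local import) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Deposit:
    """Part one: metadata only, describing files that live elsewhere."""
    title: str
    file_locations: List[str]  # where the ingester should pull the files from
    status: str = "pending"
    ingested: List[str] = field(default_factory=list)


class DepositQueue:
    """Hypothetical holding area for deposits awaiting their files."""

    def __init__(self) -> None:
        self._deposits: Dict[int, Deposit] = {}
        self._next_id = 1

    def deposit_metadata(self, title: str, file_locations: List[str]) -> int:
        """Record the description of the item and return an identifier."""
        deposit_id = self._next_id
        self._next_id += 1
        self._deposits[deposit_id] = Deposit(title, list(file_locations))
        return deposit_id

    def get(self, deposit_id: int) -> Deposit:
        return self._deposits[deposit_id]

    def ingest_pending(self, fetch: Callable[[str], None]) -> None:
        """Part two: a later process pulls the files for each pending deposit."""
        for deposit in self._deposits.values():
            if deposit.status != "pending":
                continue
            deposit.status = "in-process"
            try:
                for location in deposit.file_locations:
                    fetch(location)
                    deposit.ingested.append(location)
                deposit.status = "complete"
            except Exception:
                # Broad catch keeps the sketch short; any fetch error marks failure.
                deposit.status = "failed"


if __name__ == "__main__":
    queue = DepositQueue()
    deposit_id = queue.deposit_metadata(
        "Example dataset", ["sftp://data.example.org/run1.tar.gz"]
    )
    queue.ingest_pending(lambda location: print("pulling", location))
    print(queue.get(deposit_id).status)
```

Passing `fetch` in as a callback keeps the transfer method separate from the
bookkeeping, so the same loop would work whether the files are pulled over
the network or are already on the server.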
>
>Could you tell us a bit more about the process you want to implement?
>Who has the data, the metadata, who performs the deposit etc?
>
>Thanks,
>
>
>Stuart Lewis
>Digital Development Manager
>Te Tumu Herenga The University of Auckland Library
>Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
>Ph: +64 (0)9 373 7599 x81928
>
>
>
>On 29/11/2011, at 7:19 AM, Leggett, Pete wrote:
>
>> Stuart,
>>  
>> Can you provide any links to examples of using 'deposit by reference'?
>>  
>> I am looking at the feasibility of depositing very large items (tar.gz or
>>zip'd data files), say up to 16TB, into DSpace 1.6.x, with the obvious
>>problems of doing this using a web interface.
>> Wondering if EasyDeposit can be adapted to do 'deposit by reference'
>>with either a utility of some kind on the DSpace server looking for
>>large items to ingest, or a client pushing the data onto a directory on
>>the DSpace server from where it can be ingested. Ideally want to
>>minimise any copies of the data.
>>  
>> Really want to avoid copying the item once it's on the DSpace server.
>>Could the item be uploaded directly into the asset store, maybe?
>> The other problem is how anyone could download the item once it's in
>>DSpace?
>>  
>> Anyone else doing this sort of very large item (i.e. TBs) ingest?
>>  
>> Thank you,
>> 
>> Pete
>>  
>>  
>> From: David FLANDERS [mailto:d.fland...@jisc.ac.uk]
>> Sent: 21 November 2011 18:10
>> To: Ben O'Steen; Stuart Lewis
>> Cc: <sword-app-tech@lists.sourceforge.net>
>> Subject: Re: [sword-app-tech] How to send large fiels
>>  
>> To second that, some amazing things being done down here in Australia
>>as they take *large* data off of scientific instruments. /dff
>>  
>> From: Ben O'Steen [mailto:bost...@gmail.com]
>> Sent: 14 November 2011 23:54
>> To: Stuart Lewis
>> Cc: <sword-app-tech@lists.sourceforge.net>
>> Subject: Re: [sword-app-tech] How to send large fiels
>>  
>> +1 for deposit by reference. It is almost like giving a metadata
>>receipt for a deposit not happening via the http route.
>> 
>> I would also highly recommend looking at High-Performance SSH or
>>HPN-SSH. On comparable hardware, I have been shown that it outpaces even
>>grid-ftp for file transfer speeds, but is a backward compatible patch
>>for the openssh library.
>> 
>> This means that if the server and client are both patched, the transfer
>>is multithreaded and otherwise highly optimized. It means that Unix
>>tools which use SSH benefit as well - rsync, ssh -X, and so on.
>> 
>> Ben
>> 
>> On Nov 15, 2011 10:37 AM, "Stuart Lewis" <s.le...@auckland.ac.nz> wrote:
>> Hi Jesús,
>> 
>> The method that seems to work in this setting is to use 'deposit by
>>reference'.  That is, you deposit a description of the item, including
>>details of where the item can be found.  It is then up to the ingesting
>>system to pull the data - perhaps via an offline queue process, using
>>some other method (ftp, scp, nfs, etc).
>> 
>> SWORD v2 might be useful here too, because the SWORD statement could be
>>requested to find out the status of the file upload (for example, using
>>a status such as pending, in-process, complete, failed, etc).  This
>>would allow the sender/depositor to be able to find out the status of
>>the item.
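
From the depositor's side, checking on the upload could look something like
the Python sketch below. It is only an illustration: `get_status` is a
hypothetical callable standing in for whatever requests the SWORD statement
and extracts a status string, and the polling interval and limit are
arbitrary assumptions.

```python
import time
from typing import Callable

# Statuses after which there is no point in asking again.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_ingest(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until a terminal status is seen or max_polls is reached."""
    status = get_status()
    polls = 1
    while status not in TERMINAL_STATUSES and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status


if __name__ == "__main__":
    # Canned sequence of statuses in place of a real repository.
    canned = iter(["pending", "in-process", "in-process", "complete"])
    final = wait_for_ingest(lambda: next(canned), poll_seconds=0.0)
    print(final)
```

For an ingest that may take days, the interval would be long and the limit
generous; the returned value tells the caller whether the wait ended in a
terminal status or simply ran out of polls.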
>> 
>> Let us know how you get on - it is interesting to see the protocol
>>being pushed to its limits with use cases such as yours.
>> 
>> Thanks,
>> 
>> 
>> Stuart Lewis
>> Digital Development Manager
>> Te Tumu Herenga The University of Auckland Library
>> Auckland Mail Centre, Private Bag 92019, Auckland 1142, New Zealand
>> Ph: +64 (0)9 373 7599 x81928
>> 
>> 
>> 
>> On 15/11/2011, at 11:27 AM, Jesús García Crespo wrote:
>> 
>> > Hi everyone,
>> >
>> > We are using SWORD to deposit DIPs in ICA-AtoM from Archivematica,
>>but we have encountered some problems with large files (8GB). HTTP
>>requests can be very resource intensive and unmanageable when the
>>contents are very big. Does anyone have any recommendations for
>>depositing large files via the SWORD protocol? Like maybe sending
>>related files using SFTP and then indicating the local file path? (That
>>is something we are considering.)
>> >
>> > Thank you in advance,
>> >
>> > --
>> > Jesús García Crespo
>> > 
>> > sword-app-tech mailing list
>> > sword-app-tech@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/sword-app-tech
>> 
>> 
>> 
>> 
>> 
>
>

