Dear DSpace Developers,

Due to a requirement from our data protection officer, we have to delete
EPersons (DSpace user accounts) when they leave our university. When an EPerson
is to be deleted, DSpace checks whether its ID is referenced by any other
table and refuses the deletion if there are any references. For our repository
I have to change this behavior. I can imagine that other repositories (at least
here in Germany) have or will have the same problem, so I would be glad to
develop a solution that could be merged into a future release of DSpace. As
this is a major change, I would like to discuss it before I start to write or
change any code.

The class EPerson contains a method getDeleteConstraints() that returns a List
of Strings containing the names of the tables that refer to the eperson_id. As
far as I can see, there are mainly three areas with references to an EPerson's
ID: submitted items, workflow items and workflow tasks. I was surprised that
the table versionitem is not queried, although it refers to the eperson_id just
as the other tables do. There are some more tables (ResourcePolicy,
EPersonGroup2EPerson and Subscription) that do not get checked, but I think it
is intended behavior that the entries in those tables are deleted together with
the EPerson. So what is the reason that an EPerson referenced in Item,
WorkflowItem, versionitem (I think adding it to the list of checked tables was
just forgotten) and the workflow tables cannot be deleted? Is there a way to
overcome this limitation?
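
To make the check described above concrete, here is a minimal sketch of the
idea. It is not the DSpace implementation: the class DeleteConstraintSketch,
its in-memory map and the table names used as keys are assumptions chosen
purely for illustration, and the real method queries the database instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the database; not DSpace code.
class DeleteConstraintSketch {

    // Maps a table name to the set of EPerson IDs referenced by its rows.
    private final Map<String, Set<Integer>> references = new LinkedHashMap<>();

    // Registers which EPerson IDs a (hypothetical) table refers to.
    void addReferences(String table, Set<Integer> epersonIds) {
        references.put(table, epersonIds);
    }

    // Returns the names of all registered tables that reference the given ID.
    List<String> getDeleteConstraints(int epersonId) {
        List<String> tables = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : references.entrySet()) {
            if (entry.getValue().contains(epersonId)) {
                tables.add(entry.getKey());
            }
        }
        return tables;
    }

    // Deletion is refused as long as any checked table references the ID.
    boolean canDelete(int epersonId) {
        return getDeleteConstraints(epersonId).isEmpty();
    }
}
```

In this model, a table that is never registered (as seems to be the case for
versionitem) can never block a deletion, which is the gap I mean.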

Items
I think the submitter_id in an Item serves two purposes. First, when an EPerson
logs in, he or she can see his or her submitted Items. If the EPerson's account
is deleted, he or she cannot log in anymore, so this shouldn't be a problem.
Second, I can imagine that it might be necessary to look up who submitted an
Item in case of any legal issues. But this information is stored in the
provenance metadata as well (see
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/workflow/WorkflowManager.java#L1140
 and line 160 where this method gets called). I would set the submitter_id to
null when the EPerson gets deleted.
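
A rough sketch of what I have in mind, under stated assumptions: ItemRecord,
its fields and clearSubmitter() are hypothetical names for illustration only
and do not exist in DSpace. The point is just that the submitter reference is
cleared while the provenance entries stay untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; none of these types are DSpace classes.
class SubmitterCleanupSketch {

    static class ItemRecord {
        Integer submitterId;                              // null = no account linked
        final List<String> provenance = new ArrayList<>(); // kept as-is on deletion

        ItemRecord(Integer submitterId, String provenanceNote) {
            this.submitterId = submitterId;
            this.provenance.add(provenanceNote);
        }
    }

    // Sets the submitter to null on every item submitted by the given EPerson
    // and returns how many items were changed. Provenance is not modified.
    static int clearSubmitter(List<ItemRecord> items, int epersonId) {
        int changed = 0;
        for (ItemRecord item : items) {
            if (item.submitterId != null && item.submitterId.intValue() == epersonId) {
                item.submitterId = null;
                changed++;
            }
        }
        return changed;
    }
}
```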

WorkflowItems
If a user account is to be deleted, are there any reasons to keep unsubmitted
Items? Why should we store workflow items of a user who cannot log in anymore?
I would suggest telling the admin that there are unsubmitted workflow items
and asking whether the account should really be deleted along with all of the
user's unsubmitted workflow items.

Workflow Tasks
I have not tried to understand the complete workflow task code, so please
don't hesitate to tell me if I'm on the wrong track. There are two
implementations of workflow tasks, the "original" one and the configurable XML
workflow. In either case, I would expect all tasks claimed by an EPerson to be
put back into the task pool when that EPerson gets deleted.
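
As a sketch of the behavior I would expect, independent of which workflow
implementation is in use: the Task class, its ownerId field and
returnTasksToPool() below are hypothetical and only model the idea; they are
not taken from either DSpace workflow implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of claimed and pooled tasks; not DSpace code.
class TaskPoolSketch {

    static class Task {
        final String name;
        Integer ownerId; // null = unclaimed, i.e. the task is in the pool

        Task(String name, Integer ownerId) {
            this.name = name;
            this.ownerId = ownerId;
        }

        boolean isInPool() {
            return ownerId == null;
        }
    }

    // Releases every task claimed by the given EPerson back into the pool
    // and returns the list of released tasks.
    static List<Task> returnTasksToPool(List<Task> tasks, int epersonId) {
        List<Task> released = new ArrayList<>();
        for (Task task : tasks) {
            if (task.ownerId != null && task.ownerId.intValue() == epersonId) {
                task.ownerId = null;
                released.add(task);
            }
        }
        return released;
    }
}
```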

Am I missing anything? Do you have any ideas on how to deal with the situations
that arise when an EPerson is deleted? Should it be configurable whether an
EPerson referenced by the tables named above can be deleted? Would such a code
contribution have a chance of being accepted into DSpace? Is anyone interested
in helping me?

I would appreciate any responses.
Regards,
  Pascal
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
