Dear DSpace Developers,
Due to a requirement from our data protection officer, we have to delete
EPersons (DSpace user accounts) when they leave our university. When an EPerson
is to be deleted, DSpace checks whether his or her ID is referenced by any other
table and refuses the deletion if there are any references. For our repository
I have to change this behavior. I can imagine that other repositories (at least
here in Germany) have or will have the same problem, so I would be glad to
develop a solution that could be merged into a future release of DSpace. As
this is a major change, I would like to discuss it before I start to write or
change any code.
The class EPerson contains a method getDeleteConstraints() that returns a List
of Strings containing the names of the tables that reference the eperson_id.
As far as I can tell, there are mainly three areas with references to an
EPerson's ID: submitted items, workflow items and workflow tasks. I was
surprised that the table versionitem is not queried, although it references the
eperson_id just as the other tables do. There are some more tables
(ResourcePolicy, EPersonGroup2EPerson and Subscription) that are not checked,
but I think it is intended that the entries in those tables are deleted
together with the EPerson. So what is the reason that an EPerson referenced in
Item, WorkflowItem, versionitem (I think it was simply forgotten in the list of
checked tables) and the workflow tables cannot be deleted? Is there a way to
overcome this limitation?
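To make the check more concrete, here is a rough sketch of how I picture it,
with versionitem added to the checked tables. It is not the actual DSpace code:
the class name, the in-memory map standing in for the database, and the table
name "workflow_tasks" are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the map stands in for the database:
// table name -> eperson IDs referenced in that table.
final class DeleteConstraintSketch {

    // Tables whose references block a deletion. "workflow_tasks" is a
    // placeholder name; "versionitem" is added as proposed in this mail.
    static final List<String> CHECKED_TABLES =
            List.of("item", "workflowitem", "workflow_tasks", "versionitem");

    // Returns the names of all checked tables that still reference the ID.
    static List<String> getDeleteConstraints(
            Map<String, List<Integer>> references, int epersonId) {
        List<String> constraints = new ArrayList<>();
        for (String table : CHECKED_TABLES) {
            if (references.getOrDefault(table, List.of()).contains(epersonId)) {
                constraints.add(table);
            }
        }
        return constraints;
    }
}
```

An empty result would mean that the EPerson can be deleted right away.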
Items
I think the submitter_id in an Item serves two purposes. First, when an EPerson
logs in, he or she can see his or her submitted Items. If the account of the
EPerson is deleted, he or she cannot log in anymore, so this shouldn't be a
problem. The second purpose I can imagine is that it might be necessary to look
up who submitted an Item in case of legal issues. But this information is
stored in the provenance metadata as well (see
https://github.com/DSpace/DSpace/blob/master/dspace-api/src/main/java/org/dspace/workflow/WorkflowManager.java#L1140
and line 160, where this method gets called). I would set the submitter_id to
null when the EPerson gets deleted.
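As a minimal sketch of what I mean by setting the submitter_id to null, assume
a simplified Item that holds only a nullable submitter ID and its provenance
notes. The class and method names below are hypothetical and do not exist in
DSpace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the DSpace Item class.
final class SubmitterCleanupSketch {

    // Minimal stand-in for an Item: nullable submitter ID plus provenance.
    static final class Item {
        Integer submitterId;
        final List<String> provenance = new ArrayList<>();

        Item(Integer submitterId, String provenanceNote) {
            this.submitterId = submitterId;
            this.provenance.add(provenanceNote);
        }
    }

    // Clears the submitter reference on every item submitted by the deleted
    // EPerson and returns how many items were changed. The provenance notes
    // are left untouched.
    static int clearSubmitter(List<Item> items, int deletedEpersonId) {
        int changed = 0;
        for (Item item : items) {
            if (item.submitterId != null && item.submitterId == deletedEpersonId) {
                item.submitterId = null;
                changed++;
            }
        }
        return changed;
    }
}
```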
WorkflowItems
If a user account is to be deleted, is there any reason to keep unsubmitted
Items? Why should we store workflow items of a user who cannot log in anymore?
I would suggest telling the admin that there are unsubmitted workflow items
and asking whether the account should really be deleted together with all of
the user's unsubmitted workflow items.
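The confirmation step could look roughly like the following sketch. It assumes
that the open workflow items are represented by a list of owner IDs (one entry
per item) and that the admin's answer arrives as a boolean flag; both are
simplifications of mine, and the names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of the confirmation step, not DSpace code.
final class WorkflowItemDeletionSketch {

    // ownerIds holds one entry per unsubmitted workflow item. Without
    // confirmation the deletion is refused and nothing is changed; with
    // confirmation the user's items are removed and their count is returned.
    static int deleteUnfinishedSubmissions(
            List<Integer> ownerIds, int epersonId, boolean confirmed) {
        long open = ownerIds.stream().filter(id -> id == epersonId).count();
        if (open > 0 && !confirmed) {
            throw new IllegalStateException(
                    open + " unsubmitted workflow item(s) would be deleted; "
                    + "confirmation required");
        }
        ownerIds.removeIf(id -> id == epersonId);
        return (int) open;
    }
}
```

The user interface would catch the exception, show the message to the admin
and call the method again with confirmed set to true if the admin agrees.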
Workflow Tasks
I did not try to understand the complete workflow task code, so please don't
hesitate to tell me if I'm on the wrong track. There are two implementations
of workflow tasks: the "original" one and the configurable XML workflow.
Either way, I would expect that all of an EPerson's tasks are put back into the
task pool when that EPerson gets deleted.
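Since I have not studied the real workflow code, the following is only a toy
model of "putting tasks back into the pool". It assumes that claimed tasks are
kept per owner ID and that the pool is a plain list of task names; neither
matches the actual DSpace data structures, and all names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not the DSpace workflow implementation.
final class TaskPoolSketch {

    // Moves every task claimed by the deleted EPerson back into the shared
    // pool and returns the number of tasks that were moved.
    static int returnClaimedTasks(
            Map<Integer, List<String>> claimedByOwner,
            List<String> pool,
            int deletedEpersonId) {
        List<String> claimed = claimedByOwner.remove(deletedEpersonId);
        if (claimed == null) {
            return 0; // the EPerson had not claimed any tasks
        }
        pool.addAll(claimed);
        return claimed.size();
    }
}
```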
Am I missing anything? Do you have any ideas on how to deal with the situations
that arise when an EPerson is deleted? Should it be configurable whether an
EPerson referenced by the tables named above can be deleted? Would such a code
contribution have a chance of being accepted into DSpace? Is anyone interested
in helping me?
I would appreciate any responses.
Regards,
Pascal
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel