Re-import of RDF files to the generic RDF indexer must replace existing data in
the Jena TDB store
--------------------------------------------------------------------------------------------------
Key: STANBOL-554
URL: https://issues.apache.org/jira/browse/STANBOL-554
Project: Stanbol
Issue Type: Improvement
Components: Entity Hub
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Priority: Minor
Problem:
---
The "indexing/resource/tdb" folder contains the Jena TDB triplestore
with the imported RDF data. This data is kept between indexing
runs mainly because importing the RDF data typically takes about
as long as the indexing process itself.
Because of that it makes a lot of sense to reuse already imported RDF
data if you index RDF dumps (e.g. DBpedia).
When the RDF data changes, however, this default is not optimal,
because the changed dataset is appended to the data already present in the
Jena TDB store. This means that if you change or remove things in your
thesaurus they will still be present within the triple store and
therefore also appear in the created index.
Workaround:
---
Users need to manually delete the folder
{indexing-root}indexing/resource/tdb
This causes a new, empty Jena TDB store to be created on the next run.
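
The manual step above could be scripted. The following is a minimal sketch, not part of Stanbol: the class name TdbStoreCleaner and its method are hypothetical, and it assumes the store lives at indexing/resource/tdb below the indexing root passed in by the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper, not part of Stanbol. */
final class TdbStoreCleaner {

    /**
     * Deletes the indexing/resource/tdb folder below the given indexing root.
     * Returns true if the folder existed and was removed, false otherwise.
     */
    static boolean clearTdbStore(Path indexingRoot) {
        Path tdbDir = indexingRoot.resolve("indexing").resolve("resource").resolve("tdb");
        if (!Files.exists(tdbDir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(tdbDir)) {
            // Reverse order so that files are deleted before their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the indexing root is given as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(clearTdbStore(root) ? "TDB store removed" : "No TDB store found");
    }
}
```

Only the tdb folder is removed; the RDF source files and the rest of the indexing configuration stay untouched.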
Solution:
---
Variant 1:
If named graphs are used to add RDF data to the Jena TDB store it would be
possible to delete all data of the previous version of an RDF file before
re-importing it. This would keep the advantages of preventing the re-import of
all data while solving this issue. In addition it would not require any
additional configuration.
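
The intended semantics of Variant 1 can be illustrated with a small store-agnostic sketch. Plain Java collections stand in for the Jena TDB dataset here; this is not the Jena API, and the class NamedGraphStore, the graph names and the triple strings are all made up for illustration. The assumption is that each RDF file is imported into a graph named after that file.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for a store with named graphs (graph name -> triples). Not the Jena API. */
final class NamedGraphStore {

    private final Map<String, Set<String>> graphs = new HashMap<>();

    /** Drops everything previously imported under graphName, then adds the new triples. */
    void reimport(String graphName, Collection<String> triples) {
        graphs.remove(graphName);
        graphs.put(graphName, new LinkedHashSet<>(triples));
    }

    /** Union of all graphs, i.e. the data the indexer would iterate over. */
    Set<String> allTriples() {
        Set<String> union = new LinkedHashSet<>();
        graphs.values().forEach(union::addAll);
        return union;
    }

    public static void main(String[] args) {
        NamedGraphStore store = new NamedGraphStore();
        store.reimport("file:thesaurus.rdf", Arrays.asList("ex:a a ex:Concept", "ex:b a ex:Concept"));
        store.reimport("file:other.rdf", Arrays.asList("ex:c a ex:Concept"));
        // Second version of the thesaurus no longer contains ex:b.
        store.reimport("file:thesaurus.rdf", Arrays.asList("ex:a a ex:Concept"));
        System.out.println("ex:b still present: " + store.allTriples().contains("ex:b a ex:Concept"));
    }
}
```

Data imported from other files is left alone, so only the changed file has to be imported again.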
Variant 2:
Add a property to the indexing.properties file that specifies whether existing
data within the Jena TDB store should be kept or deleted on every indexing run.
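
Reading such a switch could look roughly like the sketch below. The property key import.keepExistingData is hypothetical, since the issue does not name one, and defaulting to "keep" is an assumption chosen to match the current behaviour described above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical reader for the proposed switch; the key name is an assumption. */
final class ImportPolicy {

    /** Hypothetical key; the issue does not specify the property name. */
    static final String KEEP_EXISTING_KEY = "import.keepExistingData";

    /** Returns true if existing data should be kept. A missing key means "keep". */
    static boolean keepExistingData(Properties config) {
        return Boolean.parseBoolean(config.getProperty(KEEP_EXISTING_KEY, "true").trim());
    }

    /** Parses properties from text, e.g. the content of indexing.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties config = parse(KEEP_EXISTING_KEY + "=false\n");
        if (keepExistingData(config)) {
            System.out.println("Reusing existing TDB data");
        } else {
            System.out.println("TDB store would be cleared before the import");
        }
    }
}
```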