Re-import of RDF files to the generic RDF indexer must replace existing data in 
the Jena TDB store
--------------------------------------------------------------------------------------------------

                 Key: STANBOL-554
                 URL: https://issues.apache.org/jira/browse/STANBOL-554
             Project: Stanbol
          Issue Type: Improvement
          Components: Entity Hub
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Minor


Problem:
---


The "indexing/resource/tdb" folder contains the Jena TDB triplestore
with the imported RDF data. These data are kept between indexing
runs, mainly because importing the RDF data typically takes about as
long as the indexing itself. It therefore makes sense to reuse already
imported RDF data when indexing RDF dumps (e.g. DBpedia).

If the RDF data change, however, this default is not optimal,
because the changed dataset is appended to the data already present in
the Jena TDB store. This means that if you change or remove things in
your thesaurus, the old versions will still be present in the triple
store and will therefore also appear in the created index.

Workaround:
---

Users need to manually delete the folder

    {indexing-root}indexing/resource/tdb

This causes a new, empty Jena TDB store to be created on the next run.
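
The manual step could also be scripted. The following is a minimal
Java sketch, assuming that the indexing root is the directory that
contains the "indexing" folder. The class and method names
(TdbCleaner, deleteTdbStore) are hypothetical and not part of Stanbol.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Stanbol.
final class TdbCleaner {

    /**
     * Deletes {indexingRoot}/indexing/resource/tdb recursively.
     * Returns false if the folder does not exist.
     */
    static boolean deleteTdbStore(Path indexingRoot) throws IOException {
        // Assumption: the store lives directly below the indexing root.
        Path tdb = indexingRoot.resolve("indexing").resolve("resource").resolve("tdb");
        if (!Files.exists(tdb)) {
            return false;
        }
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(tdb)) {
            // Reverse order so that children come before their parent directories.
            toDelete = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(deleteTdbStore(root) ? "TDB store deleted" : "no TDB store found");
    }
}
```

Only the "tdb" folder is removed; the parent "resource" folder and any
RDF source files in it are left untouched.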

Solution:
---

Variant 1: 

If named graphs are used to add RDF data to the Jena TDB store it would be 
possible to delete all data of the previous version of an RDF file before 
re-importing it. This would keep the advantages of preventing the re-import of 
all data while solving this issue. In addition it would not require any 
additional configuration.
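
The idea behind this variant can be illustrated without Jena. The
sketch below is a stdlib-only model, not the Jena API: it assumes one
named graph per source file (keyed by file name) and represents
triples as plain strings. All names (NamedGraphStore, reimport,
allTriples) are hypothetical. With Jena, the equivalent would
presumably be to drop the named model of a file (current Jena versions
offer Dataset.removeNamedModel) before loading the new version.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stdlib-only model of a quad store: graph name -> triples. Not the Jena API.
final class NamedGraphStore {

    private final Map<String, Set<String>> graphs = new HashMap<>();

    /** Replaces the graph named after the source file with the given triples. */
    void reimport(String sourceFile, List<String> triples) {
        // put() discards everything the previous version of the file contributed,
        // so removed or changed triples do not survive the re-import.
        graphs.put(sourceFile, new HashSet<>(triples));
    }

    /** Union of all named graphs, i.e. the data the indexer would see. */
    Set<String> allTriples() {
        Set<String> union = new HashSet<>();
        graphs.values().forEach(union::addAll);
        return union;
    }

    public static void main(String[] args) {
        NamedGraphStore store = new NamedGraphStore();
        store.reimport("thesaurus.rdf", Arrays.asList(":a :label \"A\"", ":b :label \"B\""));
        store.reimport("dbpedia.nt", Arrays.asList(":x :label \"X\""));
        // The new version of the thesaurus no longer contains :b.
        store.reimport("thesaurus.rdf", Arrays.asList(":a :label \"A\""));
        System.out.println(store.allTriples());
    }
}
```

Files that did not change (the DBpedia dump in this example) keep
their graph and do not need to be imported again.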

Variant 2:

Add a property to indexing.properties that specifies whether existing data 
within the Jena TDB store should be kept or deleted on every indexing run.
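
A minimal sketch of how such a switch might be read, assuming a
boolean property. The key "rdf.import.keepExistingData" and the class
ImportConfig are hypothetical; the issue does not name the property.
The default of "true" is chosen here to mirror the current behaviour
of keeping the store.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not part of Stanbol.
final class ImportConfig {

    // Hypothetical key; the issue does not specify a name.
    static final String KEEP_EXISTING_KEY = "rdf.import.keepExistingData";

    /** Returns true if existing TDB data should be kept (the assumed default). */
    static boolean keepExistingData(Reader indexingProperties) throws IOException {
        Properties props = new Properties();
        props.load(indexingProperties);
        // A missing key falls back to "true", i.e. today's behaviour.
        return Boolean.parseBoolean(props.getProperty(KEEP_EXISTING_KEY, "true").trim());
    }

    public static void main(String[] args) throws IOException {
        Reader config = new StringReader(KEEP_EXISTING_KEY + "=false\n");
        System.out.println("keep existing data: " + keepExistingData(config));
    }
}
```

When the property evaluates to false, the indexer would delete the TDB
folder before importing, which automates the workaround described
above.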



        
