Re: Namespaces accumulate on refresh

Rupert Westenthaler Mon, 26 Mar 2012 08:12:09 -0700

Hi Michel
On 26.03.2012, at 16:40, Michel Benevento wrote:

> Hello,
> 
> As I am experimenting with various versions of my importfile I have changed 
> my namespace urls. But when I refresh the index, the old namespaces keep 
> accumulating in my results, resulting in duplicates. Is this intended 
> behavior? How can I get rid of these (cached?) results and return to a 
> pristine state?
>


I think I have an explanation for what you are seeing. Can you please check 
that.

The indexing tool does NOT delete the "{indexing-root}/indexing/destination" 
folder. So if you index your data twice without deleting this folder the new 
data will be appended. This would explain why you still see the data with the 
old namespaces. So please try to delete the indexing/destination folder and 
index again.

This behavior is not a bug but a feature, because it allows indexing multiple 
datasets. I am currently writing some documentation on that, so I will copy the 
related section to the end of this mail.

best
Rupert

- - -
### Indexing Datasets separately

This demo indexes all four datasets in a single step. However this is not 
required. With a simple trick it is possible to index different datasets with 
different indexing configurations to the same target. This section describes 
how this could be achieved and why users might want to do this.

This demo uses Solr as the target for the indexing process. Theoretically there 
might be several possibilities, but currently this is the only available 
IndexingDestination implementation. The SolrIndex used to store the data is 
located at "{indexing-root}/indexing/destination/indexes/default/{name}". If 
this directory does not already exist, it is initialized by the indexing tool 
based on the SolrCore configuration in "{indexing-root}/indexing/config/{name}", 
or on the default SolrCore configuration if that is not present. However, if it 
already exists, then this core is used and the data of the current indexing 
process are added to the existing SolrCore.
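
Since an existing destination is reused, a clean re-index needs the folder 
removed first. A minimal sketch of such a reset, assuming only the folder 
layout described above; the helper name `reset_destination` is made up for 
this example and is not part of the indexing tool:

```python
import shutil
from pathlib import Path


def reset_destination(indexing_root):
    """Delete {indexing-root}/indexing/destination so the next run starts empty.

    Hypothetical helper. Returns True if a folder was removed and False if
    there was nothing to remove.
    """
    destination = Path(indexing_root) / "indexing" / "destination"
    if not destination.is_dir():
        return False
    # Removes the SolrIndex created by earlier runs; the configuration in
    # {indexing-root}/indexing/config is left untouched.
    shutil.rmtree(destination)
    return True
```

Calling `reset_destination("/path/to/indexing-root")` before indexing gives a 
fresh index; skipping the call keeps the append behavior described above.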

Because of that it is possible to subsequently add information from different 
datasets to the same SolrIndex. However, users need to know that if the 
different datasets contain the same entity (a resource with the same URI), the 
information of the second dataset will replace that of the first. Nonetheless, 
in the given demo this would allow creating separate configurations (e.g. 
mappings) for all four datasets while still ensuring that the indexed data are 
contained in the same SolrIndex.

This might be useful in situations where the same property (e.g. rdfs:label) is 
used by the different datasets in different ways. In that case one could create 
a mapping for dataset1 that maps rdfs:label > skos:prefLabel and, for dataset2, 
a mapping that maps rdfs:label > skos:altLabel.

Workflows like that can easily be implemented with shell scripts or by setting 
soft links in the file system.
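
The soft-link variant could look roughly like the sketch below. It rests on 
several assumptions that the text above does not confirm: that 
"{indexing-root}/indexing/config" may itself be a soft link pointing at a 
per-dataset configuration folder, that the source data for each run is 
selected by that configuration, and that the indexing tool is started from 
the indexing root. The function name and its parameters are invented for 
illustration, and the command that starts the tool is passed in by the caller 
rather than guessed here.

```python
import os
import subprocess
from pathlib import Path


def index_datasets_separately(indexing_root, dataset_configs, index_command):
    """Index each dataset with its own configuration into one destination.

    Hypothetical helper, not part of the indexing tool.
    dataset_configs: one prepared configuration folder per dataset.
    index_command: command line that starts the indexing tool, as a list.
    """
    config_link = Path(indexing_root) / "indexing" / "config"
    for config_dir in dataset_configs:
        # Point indexing/config at the configuration of the current dataset.
        if config_link.is_symlink():
            config_link.unlink()
        elif config_link.exists():
            raise RuntimeError(f"{config_link} is not a soft link")
        os.symlink(Path(config_dir).resolve(), config_link,
                   target_is_directory=True)
        # indexing/destination is deliberately not deleted between runs, so
        # every run adds its data to the same SolrIndex.
        subprocess.run(index_command, cwd=indexing_root, check=True)
```

The order of `dataset_configs` matters: for entities that occur in more than 
one dataset, the data of the dataset indexed last is what remains in the index.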
