Hi Michel
On 26.03.2012, at 16:40, Michel Benevento wrote:
> Hello,
>
> As I am experimenting with various versions of my importfile I have changed
> my namespace urls. But when I refresh the index, the old namespaces keep
> accumulating in my results, resulting in duplicates. Is this intended
> behavior? How can I get rid of these (cached?) results and return to a
> pristine state?
>
I think I have an explanation for what you are seeing. Could you please check
whether it applies?
The indexing tool does NOT delete the "{indexing-root}/indexing/destination"
folder. So if you index your data twice without deleting this folder the new
data will be appended. This would explain why you still see the data with the
old namespaces. So please try to delete the indexing/destination folder and
index again.
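
If you would rather script the cleanup, here is a minimal sketch in Python. It
only assumes the "{indexing-root}/indexing/destination" layout described above;
the function name is made up for illustration and is not part of the indexing
tool.

```python
import shutil
from pathlib import Path


def reset_destination(indexing_root):
    """Delete {indexing-root}/indexing/destination if it exists.

    Returns True if the folder was removed, False if there was nothing
    to remove. The next indexing run then starts from an empty index.
    """
    destination = Path(indexing_root) / "indexing" / "destination"
    if destination.is_dir():
        shutil.rmtree(destination)
        return True
    return False
```

Call it with the path of your own indexing root before you start the indexing
tool again.
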
This behavior is not a bug but a feature, because it allows multiple datasets
to be indexed into the same index. I am currently writing some documentation on
that, so I will copy the related section to the end of this mail.
best
Rupert
- - -
### Indexing Datasets separately
This demo indexes all four datasets in a single step. However, this is not
required. With a simple trick it is possible to index different datasets with
different indexing configurations to the same target. This section describes
how this can be achieved and why users might want to do it.
This demo uses Solr as the target for the indexing process. In theory there
could be several options, but currently this is the only available
IndexingDestination implementation. The SolrIndex used to store the data is
located at "{indexing-root}/indexing/destination/indexes/default/{name}". If
this directory does not already exist, it is initialized by the indexing tool
based on the SolrCore configuration in "{indexing-root}/indexing/config/{name}",
or on the default SolrCore configuration if that is not present. However, if it
already exists, then this core is used and the data of the current indexing
process are added to the existing SolrCore.
Because of that, it is possible to subsequently add information from different
datasets to the same SolrIndex. However, users need to know that if different
datasets contain the same entity (a resource with the same URI), the
information of the second dataset will replace that of the first. Nonetheless,
in the given demo this would allow separate configurations (e.g. mappings) to
be created for all four datasets while still ensuring that the indexed data
are contained in the same SolrIndex.
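
The replacement behavior can be illustrated with a small Python model. This is
only a sketch of the semantics described above, not the code of the indexing
tool: the index is modeled as a plain dictionary keyed by URI, and the entity
URIs and labels are invented for the example.

```python
def index_dataset(index, dataset):
    """Add every entity of `dataset` to `index`, keyed by its URI.

    An entity whose URI is already present is replaced as a whole;
    the old and new property values are not merged.
    """
    for uri, properties in dataset.items():
        index[uri] = dict(properties)
    return index


# Hypothetical entities, used for illustration only.
E1 = "http://example.org/entity/1"
E2 = "http://example.org/entity/2"

dataset1 = {
    E1: {"skos:prefLabel": ["Paris"]},
    E2: {"skos:prefLabel": ["Berlin"]},
}
dataset2 = {
    E1: {"skos:altLabel": ["Lutetia"]},
}

index = {}
index_dataset(index, dataset1)  # first indexing run
index_dataset(index, dataset2)  # second run against the same index
```

After both runs the entry for `E1` holds only the data of dataset2, while `E2`,
which appears only in dataset1, is unchanged.
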
This might be useful in situations where the same property (e.g. rdfs:label) is
used by the different datasets in different ways, because then one could create
a mapping for dataset1 that maps rdfs:label > skos:prefLabel and a mapping for
dataset2 that maps rdfs:label > skos:altLabel.
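
The effect of two such per-dataset mappings can be sketched in Python as
follows. This is not the mapping syntax or implementation of the indexing tool.
The helper `apply_mapping` is hypothetical, and it assumes for simplicity that
the values of a mapped property are stored under the target property only and
that unmapped properties are kept unchanged.

```python
def apply_mapping(entity, mapping):
    """Return a copy of `entity` with its properties renamed by `mapping`.

    `entity` maps property names to lists of values. Properties without
    an entry in `mapping` keep their name.
    """
    mapped = {}
    for prop, values in entity.items():
        target = mapping.get(prop, prop)
        mapped.setdefault(target, []).extend(values)
    return mapped


# One mapping per dataset, as described in the text.
DATASET1_MAPPING = {"rdfs:label": "skos:prefLabel"}
DATASET2_MAPPING = {"rdfs:label": "skos:altLabel"}

# Invented sample entities, both using rdfs:label.
from_dataset1 = apply_mapping({"rdfs:label": ["Paris"]}, DATASET1_MAPPING)
from_dataset2 = apply_mapping({"rdfs:label": ["Lutetia"]}, DATASET2_MAPPING)
```

Here `from_dataset1` carries the label as skos:prefLabel and `from_dataset2`
carries it as skos:altLabel, although both inputs used rdfs:label.
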
Workflows like that can easily be implemented with shell scripts or by setting
soft links in the file system.
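
One possible shape of such a workflow, sketched in Python rather than as a
shell script. Several things here are assumptions: that
"{indexing-root}/indexing/config" may be a soft link to a per-dataset
configuration directory, that the per-dataset directories exist, and that the
caller supplies `run_indexer`, a placeholder for however the indexing tool is
started. Switching the source data of each dataset is left to the caller.

```python
from pathlib import Path


def index_separately(indexing_root, dataset_configs, run_indexer):
    """Index several datasets one after another into the same destination.

    For each directory in `dataset_configs`, the soft link
    {indexing-root}/indexing/config is pointed at it and `run_indexer`
    is called with the indexing root. The destination folder is never
    deleted, so every run adds to the same index.
    """
    config_link = Path(indexing_root) / "indexing" / "config"
    for config_dir in dataset_configs:
        # Remove the link left over from the previous dataset, if any.
        if config_link.is_symlink():
            config_link.unlink()
        config_link.symlink_to(Path(config_dir).resolve(),
                               target_is_directory=True)
        run_indexer(indexing_root)
```

If "indexing/config" already exists as a real directory, `symlink_to` raises an
error instead of overwriting it, so an existing configuration is not lost by
accident.
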