Below is a more complete example of my problem. When I run it, the TDB
model is created in the specified directory, but the index directory
contains only a couple of segments files rather than a complete index.
(If I run the example supplied with the jena-text release, an index is
created.) So I'm guessing that the way I am creating an index from an
existing TDB source is not persisting the index for some reason.
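    import java.io.File;

    import org.apache.jena.query.text.EntityDefinition;
    import org.apache.jena.query.text.TextDatasetFactory;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    import com.hp.hpl.jena.query.Dataset;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.StmtIterator;
    import com.hp.hpl.jena.tdb.TDBFactory;
    import com.hp.hpl.jena.vocabulary.RDFS;

    // Inside a method declared "throws Exception".
    // PREF_LABEL_PROPERTY is a String constant defined elsewhere (not shown).
    String modelUrl = "file:///E:/skos/AAA.xml";
    String modelDirectory = "E:/tdb/AAA";
    File indexPath = new File("E:/tdbindex/AAA");
    Directory directory = FSDirectory.open(indexPath);

    Dataset modelDataset = null;
    Dataset indexedDataset = null;
    try {
        // Create the TDB dataset and load the SKOS data into it
        modelDataset = TDBFactory.createDataset(modelDirectory);
        Model modelBase = modelDataset.getDefaultModel();
        modelBase.read(modelUrl);

        Model defaultModel = modelDataset.getDefaultModel();
        StmtIterator si = defaultModel.listStatements();
        System.out.println("Number of model statements: " + si.toList().size());

        // Index definition: entity field, primary field, primary predicate
        EntityDefinition entDef =
            new EntityDefinition(PREF_LABEL_PROPERTY, "prefLabel", RDFS.label.asNode());

        // Wrap the TDB dataset with a Lucene text index
        indexedDataset = TextDatasetFactory.createLucene(modelDataset, directory, entDef);
        defaultModel = indexedDataset.getDefaultModel();
        si = defaultModel.listStatements();
        System.out.println("Number of model statements: " + si.toList().size());
    }
    catch (Exception e) {
        throw e;
    }
    finally {
        if (modelDataset != null) { modelDataset.close(); }
        try {
            if (indexedDataset != null) { indexedDataset.close(); }
        } catch (Exception e) {}
    }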
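One explanation worth testing for the nearly empty index: if the text dataset only adds entries to the index when triples are written through it (rather than scanning data already stored in TDB), then nothing reaches the index in the code above, because all the data is read into modelDataset before it is wrapped and nothing is written through indexedDataset afterwards. The toy model below illustrates that assumption in plain Java. It is not Jena code: `ToyIndexedStore` is a hypothetical class, and two lists stand in for TDB and the Lucene index.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not jena-text code): a wrapper that feeds its index
// only from writes made through it, the way a change listener would.
public class ToyIndexedStore {
    private final List<String> store;                      // stands in for TDB
    private final List<String> index = new ArrayList<>();  // stands in for the Lucene index

    public ToyIndexedStore(List<String> store) {
        // Assumed behaviour being illustrated: wrapping does NOT scan existing data.
        this.store = store;
    }

    public void add(String triple) {
        store.add(triple);  // write to the underlying store...
        index.add(triple);  // ...and pass the change on to the index
    }

    public int indexSize() {
        return index.size();
    }

    public static void main(String[] args) {
        List<String> tdb = new ArrayList<>();
        tdb.add("ex:a rdfs:label \"loaded before wrapping\"");  // like modelBase.read(...)

        ToyIndexedStore wrapped = new ToyIndexedStore(tdb);
        System.out.println(wrapped.indexSize());  // prints 0: existing data not indexed

        wrapped.add("ex:b rdfs:label \"added through the wrapper\"");
        System.out.println(tdb.size() + " " + wrapped.indexSize());  // prints "2 1"
    }
}
```

If the real library behaves like this, the data would need to be loaded through the wrapped dataset (or indexed separately) for the index to fill up.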
Thanks for any suggestions.
On 11/26/2013 3:23 AM, Andy Seaborne wrote:
Carlos,
Do you have a complete, minimal example? Your description looks OK
but the details matter. What is the code to setup the index?
Andy
On 26/11/13 00:47, Carlos S. Zamudio wrote:
Hi,
I'm having a bit of trouble deciphering the specification of the
EntityDefinition when constructing a Jena TDB index using the jena-text
module in 2.11.0. (I've been successfully using the previous LARQ module
for indexing RDF data sets).
I am attempting to index a data set that represents a SKOS vocabulary.
Below is an example entry in the model:
    <http://purl.obolibrary.org/obo/ID_62354>
        skos:broader <http://purl.obolibrary.org/id/ID_35317> ;
        skos:prefLabel "The preferred label for the entity" ;
        skos:hiddenLabel "The hidden label for the entity" ;
        skos:altLabel "An alternative label for the entity" ;
        rdf:type skos:Concept .
The skos:prefLabel, skos:hiddenLabel and skos:altLabel properties are
sub-properties of rdfs:label.
I would like to index the prefLabel, hiddenLabel and altLabel values for
all of the entries.
The EntityDefinition constructor is documented as follows:
    public EntityDefinition(String entityField,
                            String primaryField,
                            com.hp.hpl.jena.rdf.model.Resource primaryPredicate)
From what I can gather, entityField is the name of a field in the index,
primaryField should be (for example) the skos:prefLabel property, and
primaryPredicate should be specified as RDFS.label.asNode().
It seems I can also add more fields by calling the set() method.
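For comparison, here is a toy model in plain Java of one way such a mapping could work. It is not Jena code: `ToyEntityDefinition` and `toDocument` are hypothetical names, URIs are plain strings, and the field names "uri" and "text" are chosen only for illustration. It assumes that each triple becomes one index document holding the subject URI under entityField and the literal under whatever field its predicate is mapped to, and that predicates are matched exactly, with no rdfs:subPropertyOf reasoning. Under that assumption, a definition that names only rdfs:label would skip skos:prefLabel triples, and each SKOS label property would have to be mapped explicitly.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, not jena-text's API): turns one triple
// into one index "document", assuming exact predicate matching.
public class ToyEntityDefinition {
    private final String entityField;  // field assumed to hold the subject URI
    private final Map<String, String> predicateToField = new HashMap<>();

    public ToyEntityDefinition(String entityField, String primaryField,
                               String primaryPredicate) {
        this.entityField = entityField;
        // Literals of the primary predicate go into the primary field.
        predicateToField.put(primaryPredicate, primaryField);
    }

    // Counterpart of set(): map one more predicate to a field.
    public void set(String field, String predicate) {
        predicateToField.put(predicate, field);
    }

    // Returns null if the predicate is not mapped (exact lookup, no
    // subproperty reasoning) or if the object is not a literal.
    public Map<String, String> toDocument(String subject, String predicate,
                                          String object, boolean objectIsLiteral) {
        String field = predicateToField.get(predicate);
        if (field == null || !objectIsLiteral) {
            return null;
        }
        Map<String, String> doc = new HashMap<>();
        doc.put(entityField, subject);
        doc.put(field, object);
        return doc;
    }

    public static void main(String[] args) {
        String rdfsLabel = "http://www.w3.org/2000/01/rdf-schema#label";
        String skos = "http://www.w3.org/2004/02/skos/core#";
        String s = "http://purl.obolibrary.org/obo/ID_62354";

        ToyEntityDefinition def = new ToyEntityDefinition("uri", "text", rdfsLabel);
        // Only rdfs:label is mapped, so this skos:prefLabel triple is skipped.
        System.out.println(def.toDocument(s, skos + "prefLabel",
                "The preferred label for the entity", true));  // prints null

        // Mapping the SKOS label properties explicitly makes them indexable.
        def.set("text", skos + "prefLabel");
        def.set("text", skos + "altLabel");
        def.set("text", skos + "hiddenLabel");
        System.out.println(def.toDocument(s, skos + "altLabel",
                "An alternative label for the entity", true));
    }
}
```

A plain map keyed by predicate URI keeps the assumed exact-match lookup visible: if the real library did apply RDFS inference, the first call would return a document instead of null.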
I can't seem to generate an index file when I use:
    TextDatasetFactory.createLucene(dataset, directory, entityDefinition)
I've verified that my dataset is valid, and that the directory is also
valid.
Do I have the right idea for specifying an EntityDefinition?
Any hints would be appreciated.