Hi Dave, I am using an older version of the NCI thesaurus (11.12e) where the URIs still contain meaningful names. It is still available on the NCIt download page.
I am sure the URIs I am using are correct, since they work in my SPARQL queries and in the listProperties() call. I am going to give your suggestion a shot tomorrow.

Another piece of info: I recently switched to the latest Jena version and am now using the Windows batch files for the TDB import instead of the Cygwin/bash-based versions that I used before. Unfortunately, I have not really used the Jena API before. I was strictly using SPARQL queries, so I cannot really tell whether that might be the culprit.

Btw, I am really happy that the Windows batch files were added. Thanks!

Wolfgang

-----Original Message-----
From: Dave Reynolds <[email protected]>
To: users <[email protected]>
Sent: Thu, Feb 21, 2013 6:26 pm
Subject: Re: OntModel.getOntClass does not return existing classes

On 21/02/13 16:46, [email protected] wrote:
>
> Hi,
>
> I imported the NCI Thesaurus into a TDB store. I am using the Jena API to get references to existing classes. This is my basic setup:
>
> Dataset ds =
> TDBFactory.createDataset("C:\\Playground\\Ontology\\TDBStore_Instances");
> OntModel modelOnt = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM, ds.getDefaultModel());
>
> For some classes the getOntClass() method works fine, e.g.:
>
> modelOnt.getOntClass(NS_NCI_HASH + "Neoplasm");
> modelOnt.getOntClass(NS_NCI_HASH + "Volume");
>
> But for other classes, getOntClass() returns null, e.g.:
>
> modelOnt.getOntClass(NS_NCI_HASH + "Carcinoma");
> modelOnt.getOntClass(NS_NCI_HASH + "Malignant_Prostate_Neoplasm");
>
>
> I tried to get an OntClass reference for these in different ways, too, e.g.:
> I used modelOnt.listStatements(...) and stmt.getSubject().as(OntClass.class). But this throws the exception:
>
> Cannot convert node
> http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Malignant_Prostate_Neoplasm
> to OntClass: it does not have rdf:type owl:Class or equivalent
>
> The problem is, that
> http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Malignant_Prostate_Neoplasm
> is an owl:Class. I verified this by just printing stmt.getSubject().listProperties() to the console:
>
> [http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#Malignant_Prostate_Neoplasm,
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://www.w3.org/2002/07/owl#Class]
>
> I also used a Sparql query on the same data set:
>
> String query = PREFIXES
> + " SELECT * "
> + " WHERE { "
> + " nci:Malignant_Prostate_Neoplasm ?p ?o . "
> + " }";
>
> Which prints this:
> p | o
> rdf:type | owl:Class
> nci:Preferred_Name | "Malignant Prostate Neoplasm"^^xsd:string
> ...
>
> So based on what I can see (and know), Carcinoma and
> Malignant_Prostate_Neoplasm are both owl:Class, but getOntClass() does not seem to agree.
>
> Does anybody know why?

No :)

If your subject resource really does have an rdf:type owl:Class assertion then that's enough to allow the as(OntClass.class) to go through.

If I download that ontology I don't see URIs like that, they are all of the form #Cxxxxxx so it's hard to check. Presumably you have done some sort of transformation on the data.

If you really are using identical URIs in the getOntClass and the listStatements call then I can't see how that could happen, short of something drastic like a corrupt TDB database, but you would know about that.

You do have the workaround to setStrictMode(false). Perhaps if you do that and then examine the OntClass you get back, some explanation might be revealed.

Dave
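
One way to rule out the "identical URIs" question raised above is to compare the URI string passed to getOntClass() with the one printed from the statement subject, character by character. The following is only a minimal sketch using plain Java (no Jena calls); the class name UriDiff, its two helper methods, and the trailing-space example are hypothetical and not part of the original thread.

```java
// Hypothetical helper: compares two URI strings and reports the first difference.
// Plain Java only; it does not touch the Jena API or the TDB store.
final class UriDiff {

    /** Returns the index of the first differing char, or -1 if the strings are identical. */
    public static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same prefix: either identical, or one string is longer than the other.
        return a.length() == b.length() ? -1 : n;
    }

    /** Describes the difference, spelling out the differing chars as U+XXXX values. */
    public static String describe(String a, String b) {
        int i = firstDifference(a, b);
        if (i < 0) {
            return "identical";
        }
        String ca = i < a.length() ? String.format("U+%04X", (int) a.charAt(i)) : "<end>";
        String cb = i < b.length() ? String.format("U+%04X", (int) b.charAt(i)) : "<end>";
        return "differ at index " + i + ": " + ca + " vs " + cb;
    }

    public static void main(String[] args) {
        String ns = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#";
        // URI as built in code for the getOntClass() call.
        String fromCode = ns + "Carcinoma";
        // Assumed example of a URI copied from console output with a stray trailing space.
        String fromStore = ns + "Carcinoma ";
        System.out.println(describe(fromCode, fromStore));
    }
}
```

In practice the second string would come from stmt.getSubject().getURI() rather than a literal. If the helper reports "identical", the URIs can be excluded as the cause and the setStrictMode(false) workaround is the next thing to try.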
