Hi Dave, the suggested workaround works. Thanks! Also thanks to Chris for taking the time to respond.
I have another question about performance: using jena-text with a Lucene index is expected to be faster than a query with a regex filter, correct?

I ran two queries returning (almost) the same data, one using jena-text, the other a regex filter. I measured the execution time from QueryFactory.create until after qe.execSelect() (FINISH 1), and from there until after CSVOutput.out(rs) (FINISH 2). The queries are attached. The results I get are:

jena-text:
  FINISH 1 - 359.32ms
  FINISH 2 - 130.28ms
  OVERALL  - 489.61ms

regex filter:
  FINISH 1 - 46.27ms
  FINISH 2 - 2540.39ms
  OVERALL  - 2586.66ms

So it seems to confirm the assumption that jena-text is faster. I was just wondering where the difference between the FINISH 1 and FINISH 2 times is coming from. Is it executing the query in FINISH 1, or just preparing it and then executing it once the ResultSet is being iterated over in FINISH 2? The FINISH 2 time kind of suggests the latter: both queries print out the same list, but regex takes much longer to "print", which seems unlikely if it were just printing the same list to the console.

And btw, sometimes the very first query using jena-text takes much longer than subsequent queries. I am assuming that it is doing some sort of caching in the first one?

-Wolfgang

-----Original Message-----
From: Dave Reynolds <[email protected]>
To: users <[email protected]>
Sent: Wed, Oct 23, 2013 6:26 pm
Subject: Re: Jena Text Lucene Assembler file questions

On 23/10/13 17:04, [email protected] wrote:
> I just compared the file I sent with the local one that I am using and they are identical. I also just ran rdfcat and it reads the file just fine. Tried jena.textindexer again and that still fails.

You are using relative URIs and those are what the parser seems to be complaining about. For some reason, judging from your error messages, the base URI is being taken as the windows directory path instead of being a legal (file:) URI.

I don't know why that's happening but a suggested work round would be to avoid relative URIs - replace your uses of <#dataset>, <#indexLucene> and <#entMap> with :dataset, :indexLucene and :entMap.

Dave

> -----Original Message-----
> From: Chris Dollin <[email protected]>
> To: users <[email protected]>
> Sent: Wed, Oct 23, 2013 3:32 pm
> Subject: Re: Re: Jena Text Lucene Assembler file questions
>
> On Wednesday, October 23, 2013 07:53:26 AM [email protected] wrote:
>> The file is called "jena_assembler.ttl" on my machine. I had to rename it to .txt so the mailing list attachment filter wouldn't remove it. I am getting the error that I attached earlier when executing this from the command line:
>>
>> java -cp %FUSEKI_HOME%\fuseki-server.jar jena.textindexer --desc=C:\Development\Ontology\jena_assembler.ttl
>
> I can read jena_assembler.ttl without problems (using jena.rdfcat).
> Are you sure that's the right file?
>
> Chris
PREFIX text: <http://jena.apache.org/text#>
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  ?s nci:Preferred_Name ?prefName .
  FILTER ( regex(?prefName, "Head", "") )
}
ORDER BY ?prefName

PREFIX text: <http://jena.apache.org/text#>
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  ?s text:query (nci:Preferred_Name 'Head') .
  ?s nci:Preferred_Name ?prefName .
}
ORDER BY ?prefName
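
For reference, the two-phase timing described in the message (FINISH 1 up to obtaining the result set, FINISH 2 for consuming it) could be sketched roughly as below. This is not the original measurement code and does not use Jena at all: the class TwoPhaseTiming, the lazyFilter helper and the sample names are hypothetical, and a plain Java Iterator stands in for the ResultSet. It only illustrates the hypothesis raised in the message, namely that when results are produced lazily, creating the iterator is cheap and the filtering cost shows up while the rows are being consumed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

public class TwoPhaseTiming {

    // Hypothetical stand-in for a lazily evaluated result set:
    // the regex is applied only when rows are pulled, not at creation time.
    static Iterator<String> lazyFilter(final Iterator<String> source, final Pattern pattern) {
        return new Iterator<String>() {
            private String pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Advance the source until a matching row is found or it is exhausted.
                while (!ready && source.hasNext()) {
                    String candidate = source.next();
                    if (pattern.matcher(candidate).find()) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return pending;
            }
        };
    }

    // Consumes every row (the "printing" phase) and returns how many there were.
    static int drain(Iterator<String> rows) {
        int count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Illustrative data only; not taken from the NCI Thesaurus.
        List<String> names = Arrays.asList("Head", "Forehead", "Neck", "Head and Neck");

        long t0 = System.nanoTime();
        // Phase 1: set up the lazy iterator; no matching has happened yet.
        Iterator<String> rows = lazyFilter(names.iterator(), Pattern.compile("Head"));
        long t1 = System.nanoTime();
        // Phase 2: consuming the rows is where the regex work is actually done.
        int matched = drain(rows);
        long t2 = System.nanoTime();

        System.out.printf("FINISH 1 - %.2fms%n", (t1 - t0) / 1e6);
        System.out.printf("FINISH 2 - %.2fms%n", (t2 - t1) / 1e6);
        System.out.printf("OVERALL  - %.2fms (%d rows)%n", (t2 - t0) / 1e6, matched);
    }
}
```

With timing split this way, a large FINISH 2 for the regex query would be consistent with the filter being evaluated during iteration rather than with slow console output.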
