Re: Jena Text Lucene Assembler file questions

hueyl16 Thu, 24 Oct 2013 02:32:59 -0700

Hi Dave,

The suggested workaround works. Thanks! Also thanks to Chris for taking the 
time to respond.

I have another question about performance: using jena-text with a Lucene index 
is expected to be faster than a query with a regex filter, correct? I ran two 
queries returning (almost) the same data, one using jena-text, the other a 
regex filter. I measured two execution times: from QueryFactory.create until 
after qe.execSelect() (FINISH 1), and from there until after CSVOutput.out(rs) 
(FINISH 2). The queries are attached. The results I get are:

jena-text:
FINISH 1 - 359.32ms
FINISH 2 - 130.28ms
OVERALL - 489.61ms

regex filter:
FINISH 1 - 46.27ms
FINISH 2 - 2540.39ms
OVERALL - 2586.66ms

So this seems to confirm the assumption that jena-text is faster. I was just 
wondering where the difference between the FINISH 1 and FINISH 2 times is 
coming from. Is the query only being prepared in FINISH 1 and then actually 
executed once the ResultSet is iterated over in FINISH 2? The FINISH 2 times 
suggest that: both queries print the same list, but regex takes much longer to 
"print", which seems unlikely if it were just printing the same list to the 
console.
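
The lazy-evaluation idea in this question can be illustrated with a small 
stdlib-only Java sketch. It is an analogy, not Jena's actual implementation: 
the class LazyTimingSketch, the method lazyRegexFilter and the sample names are 
all hypothetical. It only shows how, if results are produced lazily, the cost of 
a regex filter would be paid while the results are iterated rather than when 
the iterator is created.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Stdlib-only analogy; no Jena classes are used here.
class LazyTimingSketch {
    // Counts how many rows the filter has actually examined.
    static final AtomicInteger filterCalls = new AtomicInteger();

    // Hypothetical stand-in for "prepare the query and hand back results".
    static Iterator<String> lazyRegexFilter(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> {
                    filterCalls.incrementAndGet();
                    return pattern.matcher(name).find();
                })
                .iterator(); // no row is filtered until the iterator is consumed
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the NCI Thesaurus.
        List<String> names =
                Arrays.asList("Head", "Head and Neck", "Hand", "Forehead Skin");

        long t0 = System.nanoTime();
        Iterator<String> rows = lazyRegexFilter(names, "Head");
        long t1 = System.nanoTime();
        System.out.println("filter calls after setup: " + filterCalls.get());

        // The filtering work happens here, while the rows are printed.
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
        long t2 = System.nanoTime();
        System.out.println("filter calls after iteration: " + filterCalls.get());

        System.out.printf("FINISH 1 - %.2fms%n", (t1 - t0) / 1e6);
        System.out.printf("FINISH 2 - %.2fms%n", (t2 - t1) / 1e6);
    }
}
```

In this sketch the filter has examined no rows when the iterator is returned 
and all four rows once the loop has finished, so the matching cost lands in the 
second interval. Whether ARQ behaves the same way for these particular queries 
is exactly the open question.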

And btw, sometimes the very first query using jena-text takes much longer than 
subsequent queries. I am assuming that it is doing some sort of caching in the 
first one?!

-Wolfgang

-----Original Message-----

From: Dave Reynolds <[email protected]>
To: users <[email protected]>
Sent: Wed, Oct 23, 2013 6:26 pm
Subject: Re: Jena Text Lucene Assembler file questions

On 23/10/13 17:04, [email protected] wrote:
> I just compared the file I sent with the local one that I am using and they
> are identical. I also just ran rdfcat and it reads the file just fine. Tried
> jena.textindexer again and that still fails.

You are using relative URIs and those are what the parser seems to be 
complaining about. For some reason, judging from your error messages, 
the base URI is being taken as the windows directory path instead of 
being a legal (file:) URI.

I don't know why that's happening but a suggested workaround would be to 
avoid relative URIs - replace your uses of <#dataset>, <#indexLucene> 
and <#entMap> with :dataset, :indexLucene and :entMap.
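
As a rough illustration of why a Windows directory path makes a poor base URI, 
here is a minimal sketch using java.net.URI from the Java standard library. 
Jena uses its own IRI handling, so this is only an analogy. The class 
BaseUriSketch and its methods are hypothetical, and the file: form of the path 
is an assumption derived from the command line quoted in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Stdlib-only illustration; this is not how Jena resolves relative URIs.
class BaseUriSketch {
    // Returns true if java.net.URI accepts the string as a URI.
    static boolean isLegalUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Resolves a relative reference such as "#dataset" against a base URI.
    static URI resolveAgainst(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        // Raw Windows path, as passed to --desc in the quoted command.
        String windowsPath = "C:\\Development\\Ontology\\jena_assembler.ttl";
        // Assumed file: URI form of the same location.
        String fileUri = "file:///C:/Development/Ontology/jena_assembler.ttl";

        System.out.println("raw Windows path legal? " + isLegalUri(windowsPath));
        System.out.println("file: URI legal?        " + isLegalUri(fileUri));

        URI resolved = resolveAgainst(fileUri, "#dataset");
        System.out.println("resolved path:     " + resolved.getPath());
        System.out.println("resolved fragment: " + resolved.getFragment());
    }
}
```

The backslashes make the raw path illegal as a URI, while <#dataset> resolves 
cleanly against the file: form.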

Dave

> -----Original Message-----
> From: Chris Dollin <[email protected]>
> To: users <[email protected]>
> Sent: Wed, Oct 23, 2013 3:32 pm
> Subject: Re: Re: Jena Text Lucene Assembler file questions
>
>
> On Wednesday, October 23, 2013 07:53:26 AM [email protected] wrote:
>> The file is called "jena_assembler.ttl" on my machine. I had to rename it to
>> .txt so the mailing list attachment filter wouldn't remove it. I am getting the
>> error that I attached earlier when executing this from the command line:
>>
>> java -cp %FUSEKI_HOME%\fuseki-server.jar jena.textindexer --desc=C:\Development\Ontology\jena_assembler.ttl
>
> I can read jena_assembler.ttl without problems (using jena.rdfcat).
> Are you sure that's the right file?
>
> Chris
>

PREFIX text: <http://jena.apache.org/text#> 
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * 
WHERE { 
?s nci:Preferred_Name ?prefName . 
FILTER ( regex(?prefName, "Head", "" ))  
}
ORDER BY ?prefName

PREFIX text: <http://jena.apache.org/text#> 
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * 
WHERE { 
?s text:query (nci:Preferred_Name 'Head') . 
?s nci:Preferred_Name ?prefName . 
}
ORDER BY ?prefName
