Thank you, Andy, for replying.
1. I have a mix of constrained and free text queries. My constrained queries
(i.e. normal SPARQL queries without free text) took 3-10 seconds. Free text
queries took around 1 second.
Do you mean that the size of the Lucene index will affect constrained queries
as well?
At this point I have included only a few concepts in the Lucene index. Here is
my configuration:
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map ( [ text:field "text" ; text:predicate no:concept1 ]
[ text:field "text" ; text:predicate no:concept2 ]
[ text:field "text" ; text:predicate no:concept3 ]
[ text:field "text" ; text:predicate no:concept4 ]
[ text:field "text" ; text:predicate no:concept5 ]
[ text:field "text" ; text:predicate no:concept6 ] ) .
2. Here is a sample query that takes 10+ seconds to execute. Is there anything
wrong with this query, or any room for optimization?
PREFIX ex: <http://example.com/ns/concepts#>
PREFIX d: <http://example.com/ns/data#>
SELECT DISTINCT ?a1
WHERE {
  ?n1 a ex:concept1 ;
      ex:concept2 ?c1 ;
      ex:concept3 ?n2 ;
      ex:concept4 ?f1 ;
      ex:concept5 ?a1 .
  ?c1 ex:concept6 ?cn1 .
  ?f1 ex:concept7 ?fn1 .
  FILTER (regex(?n2, "^word1", "i"))
  FILTER (regex(?cn1, "^word2$", "i"))
  FILTER (regex(?fn1, "^word3$", "i"))
}
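A possible rewrite, sketched under assumptions and not tested against this data: it assumes ex:concept6 in the query is the same predicate as no:concept6 in the entity map above, that the Lucene index is up to date, and that 'word2' is a fairly selective term. The idea is to let Lucene produce a small set of candidate ?c1 nodes before the other triple patterns are joined, and to replace the case-insensitive regexes with string comparisons that are likely cheaper. Because every mapped predicate shares the single Lucene field "text" and Lucene matches individual tokens, the exact-match test is kept as a FILTER.

```sparql
PREFIX ex:   <http://example.com/ns/concepts#>
PREFIX text: <http://jena.apache.org/text#>

SELECT DISTINCT ?a1
WHERE {
  # Seed candidates from Lucene (assumes ex:concept6 is in the entity map).
  # All mapped predicates share the field "text", so a hit may come from
  # another predicate; the FILTER below re-checks the exact value.
  ?c1 text:query (ex:concept6 'word2') .
  ?c1 ex:concept6 ?cn1 .
  FILTER (LCASE(STR(?cn1)) = "word2")

  ?n1 a ex:concept1 ;
      ex:concept2 ?c1 ;
      ex:concept3 ?n2 ;
      ex:concept4 ?f1 ;
      ex:concept5 ?a1 .
  # Case-insensitive "starts with", replacing regex(?n2, "^word1", "i").
  FILTER (STRSTARTS(LCASE(STR(?n2)), "word1"))

  ?f1 ex:concept7 ?fn1 .
  # Case-insensitive equality, replacing regex(?fn1, "^word3$", "i").
  FILTER (LCASE(STR(?fn1)) = "word3")
}
```

For plain alphanumeric search terms, the LCASE/STRSTARTS tests select the same values as the original regexes. Whether the text:query seed helps depends on how many literals contain the token 'word2', so it is worth timing both versions.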
3. About hardware: right now I am just running this on my MacBook Pro with a
2.5 GHz Intel Core i7 and 16 GB of RAM.
It would be great if you could give me some suggestions or point me to any
resource that explains Fuseki optimization.
-Ajay
On Nov 11, 2015, at 4:27 AM, Andy Seaborne wrote:
I was trying to evaluate Jena+Fuseki for a project. The number of
triples that I put in Fuseki is 3161033. Our queries are of search
type, for example, given a search term/phrase get count of results,
first 20 results and some facets. All queries took between 3-10
seconds to execute, which was disappointing.
3 million triples. That's not very many. It will depend on how much is
indexed into Lucene and what the query actually is, but elsewhere I've seen much
larger datasets with text query running much faster.
There are lots of possible system factors, such as hardware, server or client
restarts (this is Java!) and how you send the query to the server.
Andy
On 10/11/15 14:51, Kamble, Ajay, Crest wrote:
Hello,
1. Setup for Free Text Search
In the assembler file I had to put two entries, one for the TDB dataset and one
for the Lucene-indexed dataset. After this change I was able to run free text
queries on my TDB dataset. However, I am not sure if this is the correct way.
<#service> rdf:type fuseki:Service ;
fuseki:name "mydb" ; # http://host:port/mydb
fuseki:serviceQuery "query" ; # SPARQL query service
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph Store Protocol (read and write)
fuseki:dataset <#dataset> ;
#fuseki:dataset :text_dataset ;
.
<#service_text_tdb> rdf:type fuseki:Service ;
fuseki:name "fts" ; # http://host:port/fts
fuseki:serviceQuery "query" ; # SPARQL query service
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph Store Protocol (read and write)
#fuseki:dataset <#dataset> ;
fuseki:dataset :text_dataset ;
.
2. Performance
I was trying to evaluate Jena+Fuseki for a project. The number of triples that
I put in Fuseki is 3161033. Our queries are of search type, for example, given
a search term/phrase get count of results, first 20 results and some facets.
All queries took between 3-10 seconds to execute, which was disappointing.
To be fair, I do not have much knowledge and have only done a basic setup so
far.
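The search workload described above (a count, a first page of 20 results, and facets) could be expressed against the text dataset roughly as follows. This is a hypothetical sketch: the search term 'gold' and the facet predicate no:author are borrowed from the configuration later in this thread, and the query assumes the Fuseki service points at the text dataset rather than the bare TDB dataset.

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX no:   <http://mydb.com/ns/concepts#>

# Facet counts: number of distinct matching subjects per author.
SELECT ?author (COUNT(DISTINCT ?s) AS ?count)
WHERE {
  ?s text:query 'gold' .     # Lucene lookup in the default field "text"
  ?s no:author ?author .     # ordinary triple pattern evaluated by TDB
}
GROUP BY ?author
ORDER BY DESC(?count)
```

Dropping ?author and the GROUP BY gives the total count, and `SELECT DISTINCT ?s ... LIMIT 20` gives the first page. Each of the three is a separate request, so each one repeats the same Lucene lookup.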
Are there any ways to get a better performance?
Is the data size a problem here? The count of triples is only going to increase.
Can it give better or comparable performance to Neo4j for the same data?
Interestingly, free text search returned much faster than the other queries; it
took roughly 1 second.
3. Other Triplestore
What other triplestore can be used if high performance is required along with
ability to do free text search?
-Ajay
On Nov 4, 2015, at 10:10 PM, Andy Seaborne wrote:
On 04/11/15 16:11, Kamble, Ajay, Crest wrote:
I created text index with this command:
java -cp fuseki-server.jar jena.textindexer --desc=/tmp/fuseki-assembler.ttl
This must be done after you have removed tdb:unionDefaultGraph.
Then check the place where you have stored the text index (and check there are
not two on your disk; you gave it a relative file name) and see if it has any
data in it.
Andy
-Regards
Ajay
On Nov 4, 2015, at 9:28 PM, Kamble, Ajay, Crest wrote:
Hi Andy,
Thanks for the help. My server was able to access the data after commenting out
'tdb:unionDefaultGraph'.
But the free text search that I tried did not work. I tried the following query
but got 0 results.
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s
{
?s text:query 'gold' .
}
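When a text query returns nothing, one quick check (a suggestion, not something from the replies in this thread) is whether the data actually contains triples with the predicates listed in the entity map; if it does not, the text indexer has nothing to put into Lucene. The predicate list below is copied from the configuration in this message.

```sparql
PREFIX no: <http://mydb.com/ns/concepts#>

# Count triples per predicate for the predicates in the entity map.
SELECT ?p (COUNT(*) AS ?triples)
WHERE {
  VALUES ?p { no:name no:alt-name no:title no:author no:inventor }
  ?s ?p ?o .
}
GROUP BY ?p
```

Predicates with no triples do not appear in the output, so an empty result means none of the mapped predicates occur in the default graph. The query does not use text:query, so it can be run against the plain TDB service.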
Is my configuration for text search correct? Also, how do I specify 2 datasets
in a single service?
Here is snippet from configuration:
# Text index description
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
# Mapping in the index
# URI stored in field "uri"
# no:name, no:title, etc. are mapped to field "text"
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate no:name ]
[ text:field "text" ; text:predicate no:alt-name ]
[ text:field "text" ; text:predicate no:name ]
[ text:field "text" ; text:predicate no:title ]
[ text:field "text" ; text:predicate no:author ]
[ text:field "text" ; text:predicate no:inventor ]
) .
[] rdf:type fuseki:Server ;
# Server-wide context parameters can be given here.
# For example, to set query timeouts on a server-wide basis:
# Format 1: "1000" -- 1 second timeout
# Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
# See the Javadoc for ARQ.queryTimeout
# ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;
# Load custom code (rarely needed)
# ja:loadClass "your.code.Class" ;
# Services available. Only explicitly listed services are configured.
# If there is a service description not linked from this list, it is ignored.
fuseki:services (
<#service>
#<#service_text_tdb>
) .
<#service> rdf:type fuseki:Service ;
fuseki:name "mydb" ; # http://host:port/mydb
fuseki:serviceQuery "query" ; # SPARQL query service
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph Store Protocol (read and write)
fuseki:dataset <#dataset> ;
#fuseki:dataset :text_dataset ;
.
-Regards
Ajay
On Nov 4, 2015, at 7:54 PM, Andy Seaborne wrote:
On 04/11/15 14:11, Kamble, Ajay, Crest wrote:
That worked for me. Also the option is —config and not —conf.
--config and --conf are synonyms.
And it's "-" or "--" but not the en-dash or em-dash character your email is
putting in.
Fuseki starts but it does not read my existing data. If I execute a simple query
to get the count of triples, I get 0. Also, Fuseki gives this warning: Dataset
not found: No session.
Check the config file.
Try without "tdb:unionDefaultGraph true"
If I start Fuseki with —loc option and not —config, then it correctly reads all
data and the same query gives correct count.
--loc is shorthand for TDB only, no text dataset, no default union graph.
Is there anything wrong with the way I have configured the dataset in the
assembler file?
Also, do I need to create 2 different services for normal SPARQL queries and
text search?
If the query has no text:query, it executes like a plain SPARQL query on the
TDB datasets.
In other words, can I execute both types of queries in single console or not?
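To illustrate Andy's point, a single query sent to a service whose fuseki:dataset is :text_dataset can mix a text:query pattern with ordinary triple patterns. A minimal sketch, assuming the no:title predicate from the entity map is present in the data:

```sparql
PREFIX text: <http://jena.apache.org/text#>
PREFIX no:   <http://mydb.com/ns/concepts#>

SELECT DISTINCT ?s ?title
WHERE {
  # Answered by the Lucene index (default field "text").
  ?s text:query 'gold' .
  # Ordinary triple pattern answered by TDB for the matched subjects.
  ?s no:title ?title .
}
LIMIT 20
```

Queries without text:query, such as a plain triple count, can be sent to the same endpoint unchanged.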
-Regards
Ajay
On Nov 4, 2015, at 7:35 PM, Andy Seaborne wrote:
On 04/11/15 13:59, Kamble, Ajay, Crest wrote:
Hi Andy,
I tried that but it did not work. I got another error:
fuseki-server --update —conf=/tmp/fuseki-assembler.ttl /mydb
Required: either --config=FILE or one of --mem, --file, --loc or --desc
fuseki-server --conf=/tmp/fuseki-assembler.ttl
The service name is in the assembler file - you can't give it again on the
command line.
Andy
-Regards
Ajay
On Nov 4, 2015, at 5:43 PM, Andy Seaborne wrote:
Change "--desc" to "--conf"
"--desc" works in the restricted case when there is one dataset description,
but in this case there are two: the TDB dataset and the text dataset built
over it.
Andy
On 04/11/15 12:10, Kamble, Ajay, Crest wrote:
Hi All,
1. Triplestore
I have an existing triplestore that I set up by putting data into Fuseki. I used
Java code to put all triples into Fuseki (here is the URL that I used:
http://localhost:3030/mydb/data). Before loading the data, I start
Fuseki with this command:
fuseki-server --update --loc=/tmp/fuseki-tdb /mydb
(on Mac OS X).
My database is located at /tmp/fuseki-tdb
This setup works well and I can query all triples from console.
2. Free Text Search
I need to set up free text search on top of this triplestore, so that both normal
SPARQL queries and free text queries are possible.
Here is the assembler file that I used.
@prefix : <http://mydb.com/ns/dataset#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix no: <http://mydb.com/ns/concepts#> .
@prefix d: <http://mydb.com/ns/data#> .
## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset rdfs:subClassOf ja:RDFDataset .
# Lucene index
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
# Solr index
text:TextIndexSolr rdfs:subClassOf text:TextIndex .
## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.
:text_dataset rdf:type text:TextDataset ;
text:dataset <#dataset> ;
text:index <#indexLucene> ;
.
# A TDB dataset used for RDF storage
<#dataset> rdf:type tdb:DatasetTDB ;
tdb:location "/tmp/fuseki-tdb" ;
tdb:unionDefaultGraph true ; # Optional
.
# Text index description
<#indexLucene> a text:TextIndexLucene ;
text:directory <file:Lucene> ;
##text:directory "mem" ;
text:entityMap <#entMap> ;
.
# Mapping in the index
# URI stored in field "uri"
# no:name, no:title, etc. are mapped to field "text"
<#entMap> a text:EntityMap ;
text:entityField "uri" ;
text:defaultField "text" ;
text:map (
[ text:field "text" ; text:predicate no:name ]
[ text:field "text" ; text:predicate no:alt-name ]
[ text:field "text" ; text:predicate no:name ]
[ text:field "text" ; text:predicate no:title ]
[ text:field "text" ; text:predicate no:author ]
[ text:field "text" ; text:predicate no:inventor ]
) .
[] rdf:type fuseki:Server ;
# Server-wide context parameters can be given here.
# For example, to set query timeouts on a server-wide basis:
# Format 1: "1000" -- 1 second timeout
# Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for the rest of the query.
# See the Javadoc for ARQ.queryTimeout
# ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000" ] ;
# Load custom code (rarely needed)
# ja:loadClass "your.code.Class" ;
# Services available. Only explicitly listed services are configured.
# If there is a service description not linked from this list, it is ignored.
fuseki:services (
<#service>
#<#service_text_tdb>
) .
<#service> rdf:type fuseki:Service ;
fuseki:name "mydb" ; # http://host:port/mydb
fuseki:serviceQuery "query" ; # SPARQL query service
fuseki:serviceQuery "sparql" ; # SPARQL query service
fuseki:serviceUpdate "update" ; # SPARQL update service
fuseki:serviceUpload "upload" ; # Non-SPARQL upload service
fuseki:serviceReadWriteGraphStore "data" ; # SPARQL Graph Store Protocol (read and write)
fuseki:dataset <#dataset> ;
fuseki:dataset :text_dataset ;
.
With this assembler file, I start my server with the following command:
fuseki-server --update --desc=/Users/kamb16/projects/nano/data/fuseki-assembler.ttl /mydb
I get the following error:
com.hp.hpl.jena.sparql.ARQException: Found two matches: var ?root -> http://mydb.com/ns/dataset#text_dataset, file:///tmp/fuseki-assembler.ttl#dataset
at com.hp.hpl.jena.sparql.util.QueryExecUtils.getOne(QueryExecUtils.java:360)
at com.hp.hpl.jena.sparql.util.graph.GraphUtils.findRootByType(GraphUtils.java:194)
at com.hp.hpl.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:91)
at arq.cmdline.ModAssembler.create(ModAssembler.java:68)
at arq.cmdline.ModDatasetAssembler.createDataset(ModDatasetAssembler.java:43)
at org.apache.jena.fuseki.FusekiCmd.processModulesAndArgs(FusekiCmd.java:307)
at arq.cmdline.CmdArgModule.process(CmdArgModule.java:50)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at org.apache.jena.fuseki.FusekiCmd.main(FusekiCmd.java:166)
I do not understand how to fix this issue. Could you please help? I want to run
regular SPARQL queries as well as free text search.
Regards,
Ajay