Re: How to do text search with Jena and Fuseki

Andy Seaborne Wed, 04 Nov 2015 08:42:07 -0800

On 04/11/15 16:11, Kamble, Ajay, Crest wrote:

I created text index with this command:


java -cp fuseki-server.jar jena.textindexer --desc=/tmp/fuseki-assembler.ttl


This must be done after you removed tdb:unionDefaultGraph

Then check the place where you have stored the text index (and check there are not two on your disk - you gave it a relative file name) and see if it has any data in it.
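
One way to rule out the relative-file-name problem is to give the index an absolute location. This is a minimal sketch, not a confirmed fix: /tmp/fuseki-lucene is a hypothetical directory that does not appear anywhere in this thread, and the rest of the description is copied from the configuration quoted below. The same file would need to be used by both jena.textindexer and fuseki-server so that they agree on where the index lives.

```turtle
@prefix text: <http://jena.apache.org/text#> .

# Assumption: /tmp/fuseki-lucene is a hypothetical absolute location.
# A relative name such as <file:Lucene> is expected to resolve against the
# working directory of whichever process reads this file, which is how
# two separate index directories can end up on disk.
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:///tmp/fuseki-lucene> ;
    text:entityMap <#entMap> ;
    .
```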



        Andy


-Regards
Ajay

On Nov 4, 2015, at 9:28 PM, Kamble, Ajay, Crest wrote:

Hi Andy,

Thanks for the help. My server was able to access the data after I commented out 
'tdb:unionDefaultGraph'.

But the free text search that I tried did not work. I tried the following query but 
I got 0 results.

PREFIX text: <http://jena.apache.org/text#>

SELECT ?s
{
    ?s text:query 'gold' .
}

Is my configuration for text search correct? Also, how do I specify two datasets 
in a single service?

Here is snippet from configuration:

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:alt-name ]
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:title ]
         [ text:field "text" ; text:predicate no:author ]
         [ text:field "text" ; text:predicate no:inventor ]
         ) .

[] rdf:type fuseki:Server ;
   # Server-wide context parameters can be given here.
   # For example, to set query timeouts: on a server-wide basis:
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for rest of query.
   # See java doc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;

   # Load custom code (rarely needed)
   # ja:loadClass "your.code.Class" ;

   # Services available.  Only explicitly listed services are configured.
   #  If there is a service description not linked from this list, it is ignored.
   fuseki:services (
     <#service>
     #<#service_text_tdb>
   ) .

<#service>  rdf:type fuseki:Service ;
    fuseki:name                       "mydb" ;     # http://host:port/mydb
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
    fuseki:dataset           <#dataset> ;
    #fuseki:dataset                  :text_dataset ;
.

-Regards
Ajay

On Nov 4, 2015, at 7:54 PM, Andy Seaborne wrote:

On 04/11/15 14:11, Kamble, Ajay, Crest wrote:
That worked for me. Also the option is —config and not —conf.

--config and --conf are synonyms.

And it's "-" or "--" but not the en-dash or em-dash character your email is 
putting in.


Fuseki starts but it does not read my existing data. If I execute a simple query 
to get the count of triples, I get 0. Also, Fuseki gives this warning - Dataset not 
found: No session.

Check the config file.

Try without "tdb:unionDefaultGraph true"
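
A sketch of what that change would look like in the dataset description, using the location given in the original message. The reasoning is an assumption about this setup rather than something confirmed in the thread: with tdb:unionDefaultGraph set to true, queries see the union of the named graphs as the default graph, so triples that were loaded into the real default graph would not be counted.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .

# The TDB dataset with the union-default-graph setting commented out.
# The location is the one quoted in the thread.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/tmp/fuseki-tdb" ;
    # tdb:unionDefaultGraph true ;
    .
```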


If I start Fuseki with —loc option and not —config, then it correctly reads all 
data and the same query gives correct count.

--loc is shorthand for TDB only, no text dataset, no default union graph.


Is there anything wrong with the way I have configured dataset in assembler 
file?

Also, do I need to create 2 different services for normal SPARQL queries and text 
search?

If the query has no text:query, it executes like a plain SPARQL query on the 
TDB datasets.

In other words, can I execute both types of queries in single console or not?

-Regards
Ajay


On Nov 4, 2015, at 7:35 PM, Andy Seaborne wrote:

On 04/11/15 13:59, Kamble, Ajay, Crest wrote:
Hi Andy,

I tried that but it did not work. I got another error,

fuseki-server --update —conf=/tmp/fuseki-assembler.ttl /mydb
Required: either --config=FILE or one of --mem, --file, --loc or --desc

fuseki-server --conf=/tmp/fuseki-assembler.ttl

The service name is in the assembler file - you can't give it again on the 
command line.

Andy


-Regards
Ajay

On Nov 4, 2015, at 5:43 PM, Andy Seaborne wrote:

Change "--desc" to "--conf"

"--desc" works in the restricted case when there is one dataset description - 
but in this case there are two - the TDB dataset and the text dataset built over that.
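
With a full configuration file, the service description is what decides which of the two dataset descriptions is exposed. The following is only a sketch, assuming the text dataset is the one the service should serve. The names :text_dataset and <#service> come from the assembler file quoted below, and the service has been trimmed to two endpoints for brevity.

```turtle
@prefix :       <http://mydb.com/ns/dataset#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# Sketch only: one service with a single fuseki:dataset.
# :text_dataset is declared elsewhere in the assembler file and is built
# over the TDB dataset, so plain SPARQL and text:query both go through it.
<#service> rdf:type fuseki:Service ;
    fuseki:name          "mydb" ;
    fuseki:serviceQuery  "query" ;
    fuseki:serviceUpdate "update" ;
    fuseki:dataset       :text_dataset ;
    .
```

Pointing the service at the plain TDB dataset instead would still answer ordinary SPARQL queries, but text:query would have no index to consult.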

Andy

On 04/11/15 12:10, Kamble, Ajay, Crest wrote:
Hi All,

1. Triplestore

I have an existing triplestore that I set up by putting data in Fuseki. I used 
Java code to put all triples in Fuseki (here is the URL that I used - 
http://localhost:3030/mydb/data). Before loading the data I start 
Fuseki with this command:

fuseki-server --update --loc=/tmp/fuseki-tdb /mydb
(on Mac OS X).

My database is located at /tmp/fuseki-tdb

This setup works well and I can query all triples from console.

2. Free Text Search

I need to set up free text search on top of this triplestore, so that normal 
SPARQL queries and free text queries are both possible.

Here is the assembler file that I used.

@prefix :        <http://mydb.com/ns/dataset#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix no: <http://mydb.com/ns/concepts#> .
@prefix d: <http://mydb.com/ns/data#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
# Solr index
text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#indexLucene> ;
    .

# A TDB dataset used for RDF storage
<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "/tmp/fuseki-tdb" ;
    tdb:unionDefaultGraph true ; # Optional
    .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:alt-name ]
         [ text:field "text" ; text:predicate no:name ]
         [ text:field "text" ; text:predicate no:title ]
         [ text:field "text" ; text:predicate no:author ]
         [ text:field "text" ; text:predicate no:inventor ]
         ) .

[] rdf:type fuseki:Server ;
   # Server-wide context parameters can be given here.
   # For example, to set query timeouts: on a server-wide basis:
   # Format 1: "1000" -- 1 second timeout
   # Format 2: "10000,60000" -- 10s timeout to first result, then 60s timeout for rest of query.
   # See java doc for ARQ.queryTimeout
   # ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "10000" ] ;

   # Load custom code (rarely needed)
   # ja:loadClass "your.code.Class" ;

   # Services available.  Only explicitly listed services are configured.
   #  If there is a service description not linked from this list, it is ignored.
   fuseki:services (
     <#service>
     #<#service_text_tdb>
   ) .

<#service>  rdf:type fuseki:Service ;
    fuseki:name                       "mydb" ;     # http://host:port/mydb
    fuseki:serviceQuery               "query" ;    # SPARQL query service
    fuseki:serviceQuery               "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;   # SPARQL update service
    fuseki:serviceUpload              "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
    fuseki:dataset           <#dataset> ;
    fuseki:dataset                  :text_dataset ;
.

With this assembler file, I start my server with the following command:

fuseki-server --update --desc=/Users/kamb16/projects/nano/data/fuseki-assembler.ttl /mydb

I get the following error:

com.hp.hpl.jena.sparql.ARQException: Found two matches: var ?root -> http://mydb.com/ns/dataset#text_dataset, file:///tmp/fuseki-assembler.ttl#dataset
at com.hp.hpl.jena.sparql.util.QueryExecUtils.getOne(QueryExecUtils.java:360)
at com.hp.hpl.jena.sparql.util.graph.GraphUtils.findRootByType(GraphUtils.java:194)
at com.hp.hpl.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:91)
at arq.cmdline.ModAssembler.create(ModAssembler.java:68)
at arq.cmdline.ModDatasetAssembler.createDataset(ModDatasetAssembler.java:43)
at org.apache.jena.fuseki.FusekiCmd.processModulesAndArgs(FusekiCmd.java:307)
at arq.cmdline.CmdArgModule.process(CmdArgModule.java:50)
at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
at org.apache.jena.fuseki.FusekiCmd.main(FusekiCmd.java:166)

I do not understand how to fix this issue. Could you please help? I want to do 
regular SPARQL queries as well as free text search.

Regards,
Ajay
