Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Andy Seaborne Sun, 19 Apr 2015 03:49:35 -0700

On 17/04/15 16:29, Yang Yuanzhe wrote:

Hi Andy,


Thank you very much for your reply.

In fact the problem is unrelated to the preloaded triples. It won't
work whether we start with an empty dataset or a preloaded one. Moreover,
it takes around 1 minute to load 38k triples, while TDB only needs 6
seconds. If we turn off text search for an in-memory dataset, the loading
time drops to only 1 second. That's why I thought the problem is on the
Fuseki side.

As for TDB with reasoning, I don't agree with your opinion that the
dataset is not attached to a text index.

In your configuration, I can see no loading of the text index, which is a file-based index.


[[
<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph
          [
            a ja:MemoryModel ;
            ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
          ] .
]]

does not put any information into the text index; it finds the default graph of the underlying dataset, not the text dataset, and loads the file. At this point, the text index has not been touched.

The current description is useful but isn't enough for me to reproduce the situation.

Please could you provide a complete, minimal example for just the text indexing case?

i.e. Something I can use at my end without having to do anything not described.

If it is changes between 1.1.1 and 1.1.2, let's stick to those two versions. For such a system:


1/ A configuration, as short as possible to illustrate the situation.

Ideally, in-memory, including the text index, is cleaner because then our tests are repeated each time the example is run.


2/ How to start the server

3/ Actions needed to load data

Using the s-put, s-post etc. tools or wget/curl to load the data if it comes from the web side; a small data file if preloaded when the server starts.


4/ The query being made.
   What happens?
   Is the failure an error status code or silence?
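
As an illustration of the kind of example being asked for, here is a minimal sketch of a fully in-memory setup. It only reuses terms that appear in the configuration quoted in this thread, with the "mem" text directory that is commented out there switched on. The file names config.ttl and data.ttl, the fuseki-server and s-put command lines, and the search word in the comments are assumptions for illustration. This has not been checked against 1.1.2 or 2.0.x.

```turtle
@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .

# Assumed way to start the server:  fuseki-server --config=config.ttl

[] a fuseki:Server ;
   fuseki:services (
     <#memory>
   ) .

# The service exposes the text dataset, not the underlying dataset,
# so data added through the service can reach the text index.
<#memory> a fuseki:Service ;
    fuseki:name                        "memory" ;
    fuseki:serviceQuery                "query" ;
    fuseki:serviceUpdate               "update" ;
    fuseki:serviceReadWriteGraphStore  "data" ;
    fuseki:dataset                     :text_dataset ;
    .

[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#textIndexLucene> ;
    .

# Empty in-memory dataset: nothing is preloaded behind the index's back.
<#dataset> a ja:RDFDataset ;
    ja:defaultGraph [ a ja:MemoryModel ] ;
    .

# In-memory Lucene index, rebuilt on every run.
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

# Assumed load step (data.ttl is a hypothetical small file with rdfs:label triples):
#   s-put http://localhost:3030/memory/data default data.ttl
#
# Assumed query to send to /memory/query ("word" is a placeholder search term):
#   PREFIX text: <http://jena.apache.org/text#>
#   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
#   SELECT ?s ?label { ?s text:query (rdfs:label "word") ; rdfs:label ?label }
```

Loading the data through the service, rather than with ja:content on the underlying graph, is the point of the sketch: the triples then pass through the text dataset.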

        Andy

We have defined the dataset:

<#tdb_inf_ds> a ja:RDFDataset ;
    ja:defaultGraph       <#tdb_inf> ;
    .

We tell Lucene to index it:

:text_dataset a text:TextDataset ;
    text:dataset   <#tdb_inf_ds> ;
    text:index     <#textIndexLucene> ;
    .

And we assert that the dataset includes an RDFS inference model:

<#tdb_inf> a ja:InfModel ;
    rdfs:label "RDFS Inference Model" ;
    ja:baseModel <#tdb_graph> ;
    ja:reasoner
         [ ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner> ]
    .


Then both text search and RDFS reasoning should work. Such a configuration
works properly in Fuseki 1.1.1. However, things changed in 1.1.2 and
2.0.x. I don't know what I should do to adjust to the new system.

Thank you very much for your efforts again and have a nice day.

Regards,
Yang


On 04/17/2015 02:53 PM, Andy Seaborne wrote:

On 14/04/15 18:51, Yang Yuanzhe wrote:

Hi there,

Sorry to trouble you again. Last month I wrote to you to figure out the
bug in text search for TDB. Given the following configuration, text
search works with TDB:

...

Comments inline:

Now we want to use text search for in-memory datasets, but we failed
after some trials. The configuration file we use is as follows:

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial:    <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
   fuseki:services (
     <#memory>
   ) .

<#memory> a fuseki:Service ;
    fuseki:name                     "memory" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceUpdate            "update" ;   # SPARQL update service -- /memory/update
    fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore      "data" ;
    fuseki:serviceReadGraphStore       "get" ;   # Graph store protocol (read only) -- /memory/get
    fuseki:dataset           :text_dataset ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph
          [
            a ja:MemoryModel ;
            ja:content [ja:externalContent <file:dcat-vl.ttl> ] ;
          ] .


That is going to load the data each time the server starts but does
not attach it in any way to the text index.

Is it the same data as is loaded (separately) into the text index?

Similarly for the inference setup (which is in a different Lucene
index file:Text) ...

    Andy


# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset   <#dataset> ;
    text:index     <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         ) .

...


All the tests are based on the 2.0.1 SNAPSHOT built on April 8th. Any
clue or any suggestion for this issue? Thank you very much and have a
nice day.

Regards,
Yang
