Re: Unable to enable text search in Fuseki 2 for in-memory datasets

Yang Yuanzhe Sun, 14 Jun 2015 15:15:07 -0700

Hi Andy,

Thank you for reminding me of the mailing issue. I am very sorry for the inconvenience I am causing. I haven't found the reason why this happens. I tested it and the sending address seems correct. Anyway, I have sent another subscription request with the other address, following your suggestion.


Thank you again and have a nice day.

Regards,
Yang

On 06/08/2015 04:46 PM, Andy Seaborne wrote:

On 08/06/15 14:34, Yang wrote:

Hi Andy,

Thank you very much for the suggestion. In-memory TDB dataset works
properly.

As for the 500 error in loading, maybe you didn't notice my
explanation about it. It occurs on 2.0.0 only when an in-memory
dataset is used with text search enabled. I reported this error to
you in March and it was fixed later in a snapshot. Now, in the
latest snapshot, loading works, but Lucene no longer indexes
anything.

Something different is happening because the text indexing code was made more integrated with transactions, and a general-purpose dataset is not properly transactional: it can combine graphs with different storage.

Anyway, while using in-memory TDB for the moment, we are looking
forward to your solution (or even a new release) for it. Thank you in
advance for your efforts and have a nice day.


JENA-956 has already been fixed.
https://issues.apache.org/jira/browse/JENA-956


Regards, Yang

PS: I am working behind some firewalls so sometimes I can't send out
emails. :D


So far, I've had to manually let through your emails.  Please could you
register properly.

You are registered as [email protected] but sending from [email protected].


To subscribe a specific address use "users-subscribe-ID=HOST@..."

[email protected]

    Andy



On 06/05/2015 12:32 PM, Andy Seaborne wrote:

I've logged this as JENA-956 (with details).  The work-round is to
use an in-memory TDB dataset.

tdb:location "--mem--" ;
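
A minimal sketch of that work-around, assuming the dataset node is named <#dataset> as in the configurations quoted later in this thread, and that the TDB assembler declarations from the TDB configuration there are also needed. Only the dataset definition changes; the text:TextDataset and Lucene index descriptions would stay as they are.

```turtle
@prefix tdb:  <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Register the TDB assembler vocabulary (copied from the TDB configuration in this thread).
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .

# In-memory TDB dataset: "--mem--" takes the place of an on-disk directory such as "DB".
# The node name <#dataset> is an assumption; it must match whatever text:dataset points to.
<#dataset> a tdb:DatasetTDB ;
    tdb:location "--mem--" ;
    .
```

This replaces the ja:RDFDataset / ja:MemoryModel definition, so the data still lives in memory but sits behind TDB's transaction support.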

[2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)


You loaded the data twice I guess.

Andy


PS Your email address [email protected] does not always work.

Reporting-MTA: dns; mailrelay118.isp.belgacom.be

Final-Recipient: rfc822;[email protected]
Action: failed
Status: 5.0.0 (permanent failure)
Remote-MTA: dns; [91.183.52.144]
Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-'5.4.0 Error: too many hops' (delivery attempts: 0)



On 05/06/15 09:17, Yang wrote:

Hi Andy,

I am sorry for such a late response. We were busy with another
project during this period. I will now try to explain, step by
step, how I reproduce the error. I sent an email to the mailing
list yesterday, but it never showed up, so I am trying again
today. My apologies for possible duplicates.

The problem is that something is wrong in the search indexing
for in-memory datasets. Here is the configuration file I used. It
should be basic enough: a server description, a service
description, and an index engine attached to the dataset to
index "rdfs:label".

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
    fuseki:services ( <#memory> ) .

<#memory> a fuseki:Service ;
    fuseki:name "memory" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceUpdate "update" ;             # SPARQL query service -- /memory/update
    fuseki:serviceUpload "upload" ;             # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceReadGraphStore "get" ;        # Graph store protocol (read only) -- /memory/get
    fuseki:dataset :text_dataset ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph [ a ja:MemoryModel ; ] .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .


The server is started with

"./fuseki-server --config=config-memory-text.ttl"


and console says it starts properly:

[2015-06-03 12:13:09] Server INFO Fuseki 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000
[2015-06-03 12:13:09] Config INFO FUSEKI_HOME=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT
[2015-06-03 12:13:09] Config INFO FUSEKI_BASE=/home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run
[2015-06-03 12:13:09] Servlet INFO Initializing Shiro environment
[2015-06-03 12:13:09] Config INFO Shiro file: file:///home/yyz/Downloads/apache-jena-fuseki-2.0.1-SNAPSHOT/run/shiro.ini
[2015-06-03 12:13:09] Config INFO Configuration file: config-memory-text.ttl
[2015-06-03 12:13:10] Builder INFO Service: :memory
[2015-06-03 12:13:11] Config INFO Register: /memory
[2015-06-03 12:13:11] Server INFO Started 2015/06/03 12:13:11 CEST on port 3030


I tested it in two versions: the official release 2.0.0 and the
latest snapshot, 2.0.1-SNAPSHOT 2015-05-05T12:48:09+0000. The
observed behaviour is as follows:

In 2.0.0: If I load some triples not containing "rdfs:label",
everything works properly. However, in this case the index engine
is not exercised. As soon as I add one triple with "rdfs:label"
to the file I am loading into Fuseki, an error appears:

[2015-06-03 12:10:47] Fuseki INFO [7] Filename: licenties.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=40 Triples=40 Quads=0
[2015-06-03 12:10:47] HttpAction WARN Exception during abort (operation attempts to continue): Can't abort a write lock-transaction
[2015-06-03 12:10:47] Fuseki INFO [7] 500 Server Error (523 ms)


I remember that a few months ago, when 2.0.0 was first released,
I discovered this issue and reported it to you. But at
that time I didn't realize that the root cause was the
indexing. You fixed it in a later snapshot, but my test wasn't
adequate, so I thought the problem was solved and gave you
incorrect feedback. My sincere apologies.

In 2.0.1 SNAPSHOT: The latest snapshot contains the patch I
mentioned above, so the triples can be loaded successfully. However,
they are not indexed at all. Queries with keyword search do not return
any results. Following your advice, I tested loading and querying
from both the Web UI and the s-post/s-query tools; unfortunately (or
fortunately?) the results are the same.

TDB: Meanwhile, I also ran a similar experiment on Fuseki with TDB
in 2.0.0 and 2.0.1 SNAPSHOT, and both work properly.
Loading succeeds and queries return search results. The
only difference is that, in the configuration file, the in-memory
dataset is replaced with TDB.

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .

[] rdf:type fuseki:Server ;
    fuseki:services ( <#service_text_tdb> ) .

# TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB   rdfs:subClassOf ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

<#service_text_tdb> a fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "tdb" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset <#text_dataset> ;
    .

<#text_dataset> a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#indexLucene> ;
    .

<#dataset> a tdb:DatasetTDB ;
    tdb:location "DB" ;
    ##tdb:unionDefaultGraph true ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .


Any advice for it now? Thank you very much for your efforts in
advance.

Regards, Yang

PS: I discovered that there is a SNAPSHOT for 2.3.0. I planned to
test on it as well. However I wasn't able to run it.

On 04/17/2015 05:29 PM, Yang Yuanzhe wrote:

Hi Andy,

Thank you very much for your reply.

In fact the problem is unrelated to the preloaded triples. It
doesn't work whether we start with an empty dataset or a preloaded
one. Moreover, it takes around 1 minute to load 38k triples, while
TDB needs only 6 seconds. If we turn off text search for an
in-memory dataset, the loading time drops to only 1 second.
That's why I thought the problem is on the Fuseki side.

As for TDB with reasoning, I don't agree with your opinion that
the dataset is not attached to a text index. We have defined
the dataset:

<#tdb_inf_ds> a ja:RDFDataset ;
    ja:defaultGraph <#tdb_inf> ;
    .

We tell Lucene to index it:

:text_dataset a text:TextDataset ;
    text:dataset <#tdb_inf_ds> ;
    text:index <#textIndexLucene> ;
    .

And we assert that the dataset includes an RDFS inference
model:

<#tdb_inf> a ja:InfModel ;
    rdfs:label "RDFS Inference Model" ;
    ja:baseModel <#tdb_graph> ;
    ja:reasoner [
        ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>
    ] .


Then both text search and RDFS reasoning should work. Such a
configuration works properly in Fuseki 1.1.1. However, things
changed in 1.1.2 and 2.0.x. I don't know what I should do to
adapt to the new system.

Thank you very much for your efforts again and have a nice
day.

Regards, Yang


On 04/17/2015 02:53 PM, Andy Seaborne wrote:

On 14/04/15 18:51, Yang Yuanzhe wrote:

Hi there,

Sorry to trouble you again. Last month I wrote to you to
figure out the bug in text search for TDB. Given the
following configuration, text search works with TDB:

...

Comments inline:

Now we want to use text search for in-memory datasets, but
we failed after some trials, the configuration file we use
is as follows:

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix spatial: <http://jena.apache.org/spatial#> .

[] a fuseki:Server ;
    fuseki:services ( <#memory> ) .

<#memory> a fuseki:Service ;
    fuseki:name                        "memory" ;
    fuseki:serviceQuery                "sparql" ;
    fuseki:serviceQuery                "query" ;
    fuseki:serviceUpdate               "update" ;   # SPARQL query service -- /memory/update
    fuseki:serviceUpload               "upload" ;   # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore  "data" ;
    fuseki:serviceReadGraphStore       "get" ;      # Graph store protocol (read only) -- /memory/get
    fuseki:dataset                     :text_dataset ;
    .

<#dataset> rdf:type ja:RDFDataset ;
    ja:defaultGraph [
        a ja:MemoryModel ;
        ja:content [ ja:externalContent <file:dcat-vl.ttl> ] ;
    ] .


That is going to load the data each time the server starts
but does not attach it in any way to the text index.

Is it the same data as is loaded (separately) into the text
index?

Similarly for the inference setup (which is in a different
Lucene index, file:Text) ...

Andy


# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset     rdfs:subClassOf ja:RDFDataset .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

:text_dataset a text:TextDataset ;
    text:dataset <#dataset> ;
    text:index <#textIndexLucene> ;
    .

# Text index description
<#textIndexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .

<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .

...


All the tests are based on the 2.0.1 SNAPSHOT built on
April 8th. Any clue or any suggestion for this issue? Thank
you very much and have a nice day.

Regards, Yang
