Re: Multiword Jena text queries

Mikael Pesonen Thu, 29 Oct 2020 06:18:02 -0700

Sorry for the delay. I took your simpler config as an example and now I'm getting better results.

So "some AND word" first returns the results having both words, but then also results having only one of the words.

Trying

_exists_:Some AND _exists_:Word

results in an error:

14:53:12 INFO Fuseki :: [6] 500 org.apache.jena.query.text.TextIndexParseException: Text search parse error: Cannot parse 'lsrm_lmz_title:_exists_:agentum and _exists_:lasku ': Encountered " ":" ": "" at line 1, column 23.


Is it possible to make queries where all the words have to exist?

Config:

...
<#entMap_test> a text:EntityMap ;
    text:defaultField     "lsrm_lmz_title" ;
    text:entityField      "uri" ;
    text:uidField         "uid" ;
    text:langField        "lang" ;
    text:graphField       "graph" ;
    text:map (
     [ text:field "lsrm_lmz_title" ; text:predicate lsrm:lmz_title]
     ) .
...

Br,
Mikael


On 08/10/2020 13.55, Øyvind Gjesdal wrote:

I have a working setup (on fuseki 3.14) where I can see different results
using AND/OR, and where "~" fuzzy operator also works.

The differences from your config seem to be that I haven't configured much for
the index, only the directory and entity map. Would it help to test whether a
minimal config works, and then rebuild the index from the command line, adding
more configuration each time until it breaks?

     <#text_index> a text:TextIndexLucene ;
         text:directory </var/fuseki/databases/place-name-data/Lucene> ;
         text:entityMap <#entMap> ;
         .

Best regards,

Øyvind



On Thu, 8 Oct 2020 at 10:23, Mikael Pesonen <[email protected]> wrote:
Anyone got any idea how to fix this? I'm out of ideas.

On Mon, 5 Oct 2020 at 14:33, Mikael Pesonen <[email protected]>
wrote:

Sorry, correction: "language AND <any other words here>" and "language
OR <any other words here>" return the same results as "language <any other
words here>" and the same results as "language".

On 5.10.2020 14:27, Mikael Pesonen wrote:

Hi,

I forgot to mention that using AND and OR in a query also returns no results.
I'm somewhat familiar with Lucene syntax, but it seems like none of the
syntax works with my setup.
There are no errors in the Jena log, only the warning about
AnalyzingQueryParser.



On 5.10.2020 13:49, Lorenz Buehmann wrote:

It's Lucene syntax so a look into its documentation[1] could help.

Regarding multiple words, the default Boolean operator is "OR", i.e.

"language <any other words here>" is equivalent to "language OR <any
other words here>". Obviously the result will contain at least all
documents with "language". Use the AND operator if it must contain both.
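
A minimal sketch in Python of building a query string in which every word is required, following the Lucene classic query syntax. The helper names `escape_term` and `all_words_query` are hypothetical, and the sketch assumes the input is plain whitespace-separated text that does not itself contain the operator words AND, OR or NOT.

```python
import re

# Characters that the Lucene classic query parser treats as syntax.
# Each one is escaped with a backslash so it is matched literally.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene query syntax characters in a single term."""
    return _LUCENE_SPECIAL.sub(r'\\\1', term)


def all_words_query(text: str) -> str:
    """Join the words with AND so that every word must be present.

    Assumption: the words themselves are not AND, OR or NOT.
    """
    return " AND ".join(escape_term(word) for word in text.split())


if __name__ == "__main__":
    print(all_words_query("language technology"))
```

The resulting string is what would be passed as the query string to text:query. Note that the operator has to be written in upper case; a lower-case "and" is treated as an ordinary search term.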

Fuzzy queries and proximity queries are also explained in the Lucene
docs[1].
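
As a rough illustration of the two forms in Python: a fuzzy term is written with a trailing "~" (optionally followed by a maximum edit distance), and a proximity query is a quoted phrase followed by "~" and a distance in words. The helper names `fuzzy_query` and `phrase_query` are hypothetical, and the sketch assumes the words are plain alphanumeric tokens that need no further escaping.

```python
from typing import Iterable, Optional


def fuzzy_query(words: Iterable[str], max_edits: Optional[int] = None) -> str:
    """Append the fuzzy operator to each word, e.g. 'language~' or 'language~1'."""
    suffix = "~" if max_edits is None else f"~{max_edits}"
    return " ".join(f"{word}{suffix}" for word in words)


def phrase_query(phrase: str, slop: int = 0) -> str:
    """Quote a phrase for an exact match; with slop > 0, make it a proximity query."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    quoted = f'"{escaped}"'
    return quoted if slop == 0 else f"{quoted}~{slop}"


if __name__ == "__main__":
    print(fuzzy_query(["language", "technology"]))
    print(phrase_query("language technology"))
    print(phrase_query("language technology", slop=2))
```

Whether these forms behave as expected also depends on the analyzers configured for the index and for queries.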



[1] https://lucene.apache.org/core/8_6_2/queryparser/index.html

On 05.10.20 11:22, Mikael Pesonen wrote:

I'm having trouble making anything other than one-word queries.

For example "language <any other words here>" gives the same result,
regardless of the other words.

Using quotes "\"some query\"" returns no results.



So I would like to make "fuzzy" multiword queries where for example

"language technology" returns different results than "language
management"

And also to query "\"language technology\"" which should return exact
matches.



I'm using the latest Jena with AnalyzingQueryParser, which gives a warning:

   WARN  TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser

Also tried other parsers.


Config:

@prefix :<http://localhost/jena_example/#>  .
@prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
@prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
@prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:<http://jena.apache.org/text#>  .
@prefix skos:<http://www.w3.org/2004/02/skos/core#> .
@prefix fuseki:<http://jena.apache.org/fuseki#>  .
@prefix vcard:<http://www.w3.org/2006/vcard/ns#> .
@prefix dcterms:<http://purl.org/dc/terms/> .

@prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .


:text_dataset rdf:type     text:TextDataset ;
       text:dataset   :my_dataset ;
       text:index     <#indexLucene> ;
       .

# A TDB dataset used for RDF storage
:my_dataset rdf:type      tdb:DatasetTDB ;
       tdb:location "/home/text/tools/jena_data/" ;
#    tdb:unionDefaultGraph true ; # Optional
       .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
       text:directory <file:/home/text/tools/jena_text_index/> ;
       text:entityMap <#entMap> ;
       text:storeValues true ;
       text:analyzer [ a text:StandardAnalyzer ] ;
       text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
       text:queryParser text:AnalyzingQueryParser ;
       text:multilingualSupport true ;
    .

<#entMap> a text:EntityMap ;
       text:defaultField     "vcard_fn" ;
       text:entityField      "uri" ;
       text:uidField         "uid" ;
       text:langField        "lang" ;
       text:graphField       "graph" ;
       text:map (
            [ text:field "vcard_fn" ; text:predicate vcard:fn ]
            [ text:field "skos_prefLabel"  ; text:predicate skos:prefLabel ]
            [ text:field "skos_altLabel"  ; text:predicate skos:altLabel ]
            [ text:field "lsrm_content" ; text:predicate lsrm:content]
            [ text:field "dcterms_title" ; text:predicate dcterms:title]
            [ text:field "dcterms_description" ; text:predicate dcterms:description]
            ) .

<#service> rdf:type fuseki:Service ;
       fuseki:name                     "/ds" ;   # http://host:port/ds-ro
       fuseki:serviceQuery             "query" ;    # SPARQL query service
       fuseki:serviceQuery             "sparql" ;   # SPARQL query service
       fuseki:serviceUpdate            "update" ;   # SPARQL update service
       fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
       fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
       fuseki:dataset           :text_dataset ;
       .
