Multiword Jena text queries

Mikael Pesonen Mon, 05 Oct 2020 02:23:17 -0700

I'm having trouble making queries with more than one word.

For example, "language <any other words here>" gives the same result regardless of the other words.


Using quotes, as in "\"some query\"", returns no results.



So I would like to make "fuzzy" multiword queries where, for example,
"language technology" returns different results than "language management".

I would also like to query "\"language technology\"", which should return exact matches.
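
The two kinds of query described above can be sketched as follows. This is only a minimal illustration in Python, assuming Lucene's classic query syntax (AND between required terms, double quotes around a phrase) and Jena's text:query property function. The helper names are hypothetical, and skos:prefLabel is simply one of the properties indexed in the config in this message. Whether the index honours these queries still depends on the configured analyzers and query parser.

```python
def lucene_all_terms(text: str) -> str:
    """Require every whitespace-separated word by joining them with AND.

    Note: words are passed through as-is; Lucene special characters
    are not escaped in this sketch.
    """
    return " AND ".join(text.split())


def lucene_phrase(text: str) -> str:
    """Wrap the text in double quotes so Lucene treats it as one phrase."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def sparql_string(value: str) -> str:
    """Escape a value for use as a double-quoted SPARQL string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def text_query(lucene_query: str, prop: str = "skos:prefLabel") -> str:
    """Build a SPARQL query that passes lucene_query to text:query.

    `prop` is an assumption: any property mapped in the entity map works.
    """
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?s WHERE { ?s text:query ("
        + prop + " " + sparql_string(lucene_query) + ") }"
    )


if __name__ == "__main__":
    # Both words required, in any position.
    print(text_query(lucene_all_terms("language technology")))
    # Exact phrase; the inner quotes end up escaped as \" in SPARQL.
    print(text_query(lucene_phrase("language technology")))
```

The phrase variant produces the same "\"language technology\"" literal as written above, so it is a way of generating that query string rather than a different query.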




I'm using the latest Jena with AnalyzingQueryParser, which gives this warning:

WARN TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser


I also tried other parsers.


Config:

@prefix :<http://localhost/jena_example/#>  .
@prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
@prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
@prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#>  .
@prefix text:<http://jena.apache.org/text#>  .
@prefix skos:<http://www.w3.org/2004/02/skos/core#> .
@prefix fuseki:<http://jena.apache.org/fuseki#>  .
@prefix vcard:<http://www.w3.org/2006/vcard/ns#> .
@prefix dcterms:<http://purl.org/dc/terms/> .

@prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .


:text_dataset rdf:type     text:TextDataset ;
     text:dataset   :my_dataset ;
     text:index     <#indexLucene> ;
     .

# A TDB dataset used for RDF storage
:my_dataset rdf:type      tdb:DatasetTDB ;
     tdb:location "/home/text/tools/jena_data/" ;
#    tdb:unionDefaultGraph true ; # Optional
     .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
     text:directory <file:/home/text/tools/jena_text_index/> ;
     text:entityMap <#entMap> ;
     text:storeValues true ;
     text:analyzer [ a text:StandardAnalyzer ] ;
     text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
     text:queryParser text:AnalyzingQueryParser ;
     text:multilingualSupport true ;
  .

<#entMap> a text:EntityMap ;
     text:defaultField     "vcard_fn" ;
     text:entityField      "uri" ;
     text:uidField         "uid" ;
     text:langField        "lang" ;
     text:graphField       "graph" ;
     text:map (
          [ text:field "vcard_fn" ; text:predicate vcard:fn ]
          [ text:field "skos_prefLabel"  ; text:predicate skos:prefLabel ]
          [ text:field "skos_altLabel"  ; text:predicate skos:altLabel ]
          [ text:field "lsrm_content" ; text:predicate lsrm:content]
          [ text:field "dcterms_title" ; text:predicate dcterms:title]
          [ text:field "dcterms_description" ; text:predicate dcterms:description ]
          ) .

<#service> rdf:type fuseki:Service ;
     fuseki:name                     "/ds" ;   # http://host:port/ds
     fuseki:serviceQuery             "query" ;    # SPARQL query service
     fuseki:serviceQuery             "sparql" ;   # SPARQL query service
     fuseki:serviceUpdate            "update" ;   # SPARQL update service
     fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
     fuseki:serviceReadWriteGraphStore "data" ;   # SPARQL Graph Store Protocol (read and write)
     fuseki:dataset           :text_dataset ;
     .
