Re: Multiword Jena text queries

Mikael Pesonen Thu, 08 Oct 2020 01:23:24 -0700

Does anyone have an idea how to fix this? I'm out of ideas.

On Mon, 5 Oct 2020 at 14:33, Mikael Pesonen <[email protected]> wrote:


>
> Sorry, correction: "language AND <any other words here>" and "language
> OR <any other words here>" return the same results as "language <any other
> words here>", which in turn are the same results as for "language" alone.
>
> On 5.10.2020 14:27, Mikael Pesonen wrote:
> >
> > Hi,
> >
> > I forgot to mention that queries using AND and OR also return no results.
> > I'm somewhat familiar with Lucene syntax, but none of it seems to
> > work with my setup.
> > There are no errors in the Jena log, only the warning about
> > AnalyzingQueryParser.
> >
> >
> >
> > On 5.10.2020 13:49, Lorenz Buehmann wrote:
> >> It's Lucene syntax, so a look at its documentation[1] could help.
> >>
> >> Regarding multiple words, the default Boolean operator is "OR", i.e.
> >>
> >> "language <any other words here>" is equivalent to "language OR <any
> >> other words here>". So the result will contain at least all the
> >> documents with "language". Use the AND operator if a document must
> >> contain both.
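To make the OR/AND difference concrete, here is a toy Python model of Boolean term matching over a tiny in-memory set of documents. It is not Lucene and does no scoring or ranking: the documents, the lowercase-and-split tokenizer and the `search` function are all hypothetical, chosen only to show why a default-OR query returns every document that contains "language".

```python
# Toy model of Lucene-style Boolean term matching (NOT the Lucene API).
# Assumption: documents are tokenized by lowercasing and splitting on
# whitespace, a simplification of what a real analyzer does.

DOCS = {
    "doc1": "Language technology for Finnish",
    "doc2": "Language management in organisations",
    "doc3": "Speech technology overview",
}


def tokens(text: str) -> set:
    """Lowercase and split on whitespace (hypothetical, simplified analyzer)."""
    return set(text.lower().split())


def search(query: str, default_operator: str = "OR") -> list:
    """Return ids of docs matching all (AND) or any (OR) of the query terms."""
    terms = tokens(query)
    hits = []
    for doc_id, text in DOCS.items():
        doc_terms = tokens(text)
        if default_operator == "AND":
            matched = terms <= doc_terms        # every query term present
        else:
            matched = bool(terms & doc_terms)   # at least one term present
        if matched:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    # Default OR: any doc containing "language" or "technology" matches.
    print(search("language technology"))         # ['doc1', 'doc2', 'doc3']
    # AND: only docs containing both words match.
    print(search("language technology", "AND"))  # ['doc1']
```

With OR, adding words can only widen the hit list, so every document matching "language" is always included; with AND, each extra word narrows it.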
> >>
> >> Fuzzy queries and proximity queries are also explained in the Lucene
> >> docs[1].
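As a rough illustration of the two query types mentioned there, the sketch below models a fuzzy term (Lucene syntax `technolgy~`, edit distance up to 2 by default) and a proximity phrase (Lucene syntax `"language technology"~2`). This is a simplified stand-in, not Lucene: real fuzzy queries use Damerau-Levenshtein automata, and real phrase slop also allows reordered terms, whereas this sketch only accepts the words in order. The function names are hypothetical.

```python
# Simplified models of Lucene fuzzy and proximity matching (NOT Lucene).


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(term: str, text: str, max_edits: int = 2) -> bool:
    """True if some word of text is within max_edits of term (like 'term~')."""
    return any(levenshtein(term.lower(), w) <= max_edits
               for w in text.lower().split())


def proximity_match(phrase: str, text: str, slop: int = 0) -> bool:
    """True if the phrase words occur in order with at most `slop` extra
    words between consecutive ones; slop=0 means an exact phrase."""
    words = text.lower().split()
    terms = phrase.lower().split()
    if not terms:
        return False
    for start, w in enumerate(words):
        if w != terms[0]:
            continue
        pos, ok = start, True
        for t in terms[1:]:
            window = words[pos + 1: pos + 2 + slop]
            if t not in window:
                ok = False
                break
            pos = pos + 1 + window.index(t)
        if ok:
            return True
    return False


if __name__ == "__main__":
    text = "Language and speech technology"
    print(fuzzy_match("technolgy", text))                        # True
    print(proximity_match("language technology", text))          # False
    print(proximity_match("language technology", text, slop=2))  # True
```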
> >>
> >>
> >>
> >> [1] https://lucene.apache.org/core/8_6_2/queryparser/index.html
> >>
> >> On 05.10.20 11:22, Mikael Pesonen wrote:
> >>> I'm having trouble making anything other than one-word queries.
> >>>
> >>> For example, "language <any other words here>" gives the same result
> >>> regardless of the other words.
> >>>
> >>> Using quotes "\"some query\"" returns no results.
> >>>
> >>>
> >>>
> >>> So I would like to make "fuzzy" multiword queries where, for example,
> >>>
> >>> "language technology" returns different results than "language
> >>> management".
> >>>
> >>> I would also like to query "\"language technology\"", which should
> >>> return exact matches.
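For the exact-phrase case, the Lucene phrase (with its own double quotes) has to be embedded in a SPARQL string literal, escaped the same way as "\"language technology\"" above. A minimal sketch of building such a jena-text query in Python, assuming the `?s text:query (property "query" limit)` form of the property function; the helper names, the `skos:prefLabel` property (one of the fields in the entity map) and the limit of 10 are illustrative choices, not requirements.

```python
# Hypothetical helpers: wrap a Lucene query string in a jena-text SPARQL query.
# Assumptions: skos:prefLabel and a limit of 10 are illustrative defaults.


def escape_sparql_string(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL "..." literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def sparql_text_query(lucene_query: str,
                      prop: str = "skos:prefLabel",
                      limit: int = 10) -> str:
    """Build a SPARQL query that passes lucene_query to text:query."""
    literal = escape_sparql_string(lucene_query)
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?s WHERE {\n"
        f'  ?s text:query ({prop} "{literal}" {limit}) .\n'
        "}\n"
    )


if __name__ == "__main__":
    # Exact phrase: the inner quotes belong to the Lucene query itself.
    print(sparql_text_query('"language technology"'))
    # Both words required, anywhere in the field.
    print(sparql_text_query("language AND technology"))
```

Escaping in one place keeps the Lucene syntax (quotes, AND/OR, `~`) intact while still producing a valid SPARQL literal.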
> >>>
> >>>
> >>>
> >>> I'm using the latest Jena with AnalyzingQueryParser, which gives the
> >>> warning
> >>>
> >>>   WARN  TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser
> >>>
> >>> I also tried other parsers.
> >>>
> >>>
> >>> Config:
> >>>
> >>> @prefix :<http://localhost/jena_example/#>  .
> >>> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> >>> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> >>> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> >>> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#> .
> >>> @prefix text:<http://jena.apache.org/text#>  .
> >>> @prefix skos:<http://www.w3.org/2004/02/skos/core#> .
> >>> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
> >>> @prefix vcard:<http://www.w3.org/2006/vcard/ns#> .
> >>> @prefix dcterms:<http://purl.org/dc/terms/> .
> >>>
> >>> @prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .
> >>>
> >>> ## Example of a TDB dataset and text index
> >>> ## Initialize TDB
> >>> [] ja:loadClass "org.apache.jena.tdb.TDB" .
> >>> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> >>> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
> >>>
> >>> ## Initialize text query
> >>> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
> >>> # A TextDataset is a regular dataset with a text index.
> >>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> >>> # Lucene index
> >>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
> >>>
> >>>
> >>> :text_dataset rdf:type     text:TextDataset ;
> >>>       text:dataset   :my_dataset ;
> >>>       text:index     <#indexLucene> ;
> >>>       .
> >>>
> >>> # A TDB dataset used for RDF storage
> >>> :my_dataset rdf:type      tdb:DatasetTDB ;
> >>>       tdb:location "/home/text/tools/jena_data/" ;
> >>> #    tdb:unionDefaultGraph true ; # Optional
> >>>       .
> >>>
> >>> # Text index description
> >>> <#indexLucene> a text:TextIndexLucene ;
> >>>       text:directory <file:/home/text/tools/jena_text_index/> ;
> >>>       text:entityMap <#entMap> ;
> >>>       text:storeValues true ;
> >>>       text:analyzer [ a text:StandardAnalyzer ] ;
> >>>       text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
> >>>       text:queryParser text:AnalyzingQueryParser ;
> >>>       text:multilingualSupport true ;
> >>>    .
> >>>
> >>> <#entMap> a text:EntityMap ;
> >>>       text:defaultField     "vcard_fn" ;
> >>>       text:entityField      "uri" ;
> >>>       text:uidField         "uid" ;
> >>>       text:langField        "lang" ;
> >>>       text:graphField       "graph" ;
> >>>       text:map (
> >>>            [ text:field "vcard_fn" ; text:predicate vcard:fn ]
> >>>            [ text:field "skos_prefLabel"  ; text:predicate skos:prefLabel ]
> >>>            [ text:field "skos_altLabel"  ; text:predicate skos:altLabel ]
> >>>            [ text:field "lsrm_content" ; text:predicate lsrm:content]
> >>>            [ text:field "dcterms_title" ; text:predicate dcterms:title]
> >>>            [ text:field "dcterms_description" ; text:predicate dcterms:description]
> >>>            ) .
> >>>
> >>> <#service> rdf:type fuseki:Service ;
> >>>       fuseki:name                     "/ds" ;   # http://host:port/ds-ro
> >>>       fuseki:serviceQuery             "query" ;    # SPARQL query service
> >>>       fuseki:serviceQuery             "sparql" ;   # SPARQL query service
> >>>       fuseki:serviceUpdate            "update" ;   # SPARQL update service
> >>>       fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
> >>>       fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
> >>>       fuseki:dataset           :text_dataset ;
> >>>       .
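The config indexes with `text:analyzer [ a text:StandardAnalyzer ]` but analyzes queries with `text:queryAnalyzer [ a text:KeywordAnalyzer ]`. As a rough plain-Python illustration (not Lucene), the two analyzers turn the same input into very different tokens: StandardAnalyzer splits text into lowercased word tokens, while KeywordAnalyzer emits the entire input unchanged as a single token. The regex below is only an approximation of Lucene's Unicode-based StandardTokenizer, and the sketch makes no claim about exactly how Jena's query parser hands text to the query analyzer.

```python
import re

# Approximate stand-ins for two Lucene analyzers (NOT the Lucene classes).


def standard_like(text: str) -> list:
    """Rough StandardAnalyzer stand-in: word tokens, lowercased."""
    return re.findall(r"\w+", text.lower())


def keyword_like(text: str) -> list:
    """KeywordAnalyzer stand-in: the whole input is one unchanged token."""
    return [text]


if __name__ == "__main__":
    s = "Language Technology"
    print(standard_like(s))  # ['language', 'technology']
    print(keyword_like(s))   # ['Language Technology']
```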
> >
>
>
