I'll have a play with the Lucene functionality and see if I can come up with a more detailed model of how it could work. I can see there are options around "hierarchical", "taxonomy index", etc., and would like to understand these better.
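A toy illustration of what "hierarchical" could mean for facet counts. It assumes that a hierarchical facet value is a path (for example a publish date stored as year/month/day), and that counts can be requested at any level of that path. This is plain Python, not Lucene's API, and it says nothing about how the taxonomy index actually stores the data. The records and the function names `hierarchical_counts`/`children` are hypothetical:

```python
from collections import Counter
from typing import Iterable


def hierarchical_counts(paths: Iterable[tuple[str, ...]]) -> Counter:
    """Count each record once under every prefix of its facet path.

    Assumes one path per record: ("Animal", "Mammal", "Dog") adds 1 to
    ("Animal",), ("Animal", "Mammal") and ("Animal", "Mammal", "Dog").
    """
    counts: Counter = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts


def children(counts: Counter, parent: tuple[str, ...]) -> dict[str, int]:
    """Counts for the immediate children of `parent` (() means the top level)."""
    return {
        path[-1]: n
        for path, n in counts.items()
        if len(path) == len(parent) + 1 and path[: len(parent)] == parent
    }


if __name__ == "__main__":
    # Hypothetical records, each carrying one path-valued "category" facet.
    records = [
        ("Animal", "Mammal", "Dog"),
        ("Animal", "Mammal", "Cat"),
        ("Animal", "Bird"),
        ("Plant", "Tree"),
    ]
    counts = hierarchical_counts(records)
    print(children(counts, ()))           # {'Animal': 3, 'Plant': 1}
    print(children(counts, ("Animal",)))  # {'Mammal': 2, 'Bird': 1}
```

Drilling into `("Animal", "Mammal")` would give `Dog` and `Cat` with a count of 1 each. A flat, non-hierarchical facet is simply the case where every path has length one.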
On Thu, Feb 16, 2023 at 7:47 AM Andy Seaborne wrote:

> On 14/02/2023 09:10, Øyvind Gjesdal wrote:
> > Hi,
> >
> > This is also something I've thought about, since we have a dated
> > elasticsearch integration for creating facets from endpoints, and we use
> > aggregated sparql queries for counting which sometimes becomes slow-ish,
> > and has to be turned off for larger datasets.
> >
> > An idea I had around 3, for how it could look, was maybe to extend the
> > text query syntax with one named variable for facets, which could also
> > contain the results.
> > Using the example from the jena-text documentation:
> >
> > (?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
> > optionally to the possible syntax.
>
> Would it be better to introduce text:facetQuery which has inputs and
> outputs specifically for facetted search? An all-purpose property
> function may get unwieldy and user-error prone.
>

Yes, I think so.

> The other choice staying within SPARQL 1.1 syntax is SERVICE --
> https://jena.apache.org/documentation/query/service_enhancer.html
> which in effect gives named arguments.
>
> Syntax outside of SPARQL 1.1 syntax is also possible. Having text
> search/faceted search have its own syntax (the same underlying
> machinery) seems reasonable given how important it is.
>
> > I don't know what the type of the list ?facets (categories and counts)
> > should be, I initially thought it would be nice to have as json, but see
> > that one graph database implements facet results as blank nodes.

Which database was this? For my use cases I'd prefer RDF over a micro
format, as there are likely to be subsequent queries based on the results
of the first, and RDF would be easier to parse.

> > An option could be just adding an additional parsable string to the
> > text:query extension function, but it is kind of already rich, so I think
> > text:facet is a good idea to not bloat the text:query.
> >
> > There are probably multiple use-cases there as well, such as range,
> > multiple values on same facet, so this idea may end up looking a bit
> > hackish:
> >
> > ?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
> > 'facets: facet1: "value1", facet1: "value2"; facet2 : ...')
> >
> > I'm very happy to see others also interested in this use case.
> >
> > Best regards,
> > Øyvind
> >
> > On Tue, Feb 14, 2023 at 6:52 AM David Habgood wrote:
> >
> >> Thanks for the link Andy,
> >>
> >> @Elie my specific use case is this:
> >>
> >> I have millions of records with perhaps 100 unique attributes across the
> >> records. Individual records may only have 5-10 attributes though. So a
> >> user who wishes to browse the data based on attributes can progressively
> >> filter the data to find individual or groups of records. When a user
> >> selects a facet, only those (additional) facets for which records exist
> >> are displayed as options, along with counts.
> >>
> >> It is possible with regular SPARQL GROUP BY and COUNT queries but not so
> >> performant.
> >>
> >> Cheers
> >>
> >> On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne wrote:
> >>
> >>> On 13/02/2023 12:59, David Habgood wrote:
> >>>> Hi Jena Users,
> >>>>
> >>>> I'm interested in extending the Jena Lucene capabilities to include
> >>>> Lucene's faceted search (
> >>>> https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html
> >>>> ).
> >>>
> >>> https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
> >>>
> >>>>
> >>>> As far as I can tell from searching the mailing list (and GitHub) the
> >>>> Lucene faceted search capability hasn't been exposed in Jena before.
> >>>>
> >>>> I think it could be exposed as follows:
> >>>> 1. Defining how faceted search concepts can be expressed in the Jena
> >>>> dataset configuration
> >>>> 2. Extending the current indexing code to also generate the facet
> >>>> index based on definitions in 1.
> >>>> 3. Adding a new query function for faceted search, e.g. text:facet
> >>>>
> >>>> Keen to hear if anyone can see issues with this approach or has other
> >>>> feedback.
> >>>>
> >>>> Thanks
> >>>> David
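The progressive-filtering use case described in the thread can be sketched in memory as follows. In that use case, selecting a facet narrows the records, and only facets that still have matching records are offered, along with their counts. This is an assumption-laden toy in plain Python, not Jena or Lucene code. Records are hypothetical dicts mapping attribute to value, and `build_postings`/`drill_down` are invented names. It produces the same numbers a SPARQL GROUP BY/COUNT over the filtered records would:

```python
from collections import Counter, defaultdict
from typing import Mapping

# Hypothetical record shape: record id -> {attribute: value}.
Records = Mapping[str, Mapping[str, str]]
Facet = tuple[str, str]  # (attribute, value)


def build_postings(records: Records) -> dict[Facet, set[str]]:
    """Inverted index from each (attribute, value) pair to the ids carrying it."""
    postings: dict[Facet, set[str]] = defaultdict(set)
    for rid, attrs in records.items():
        for attr, value in attrs.items():
            postings[(attr, value)].add(rid)
    return postings


def drill_down(
    records: Records, postings: Mapping[Facet, set[str]], selected: list[Facet]
) -> tuple[set[str], Counter]:
    """Apply the selected facets (AND), then count the remaining facet values.

    Only pairs that occur in at least one matching record get a count, so a
    UI built on this never offers a facet that would return no records.
    """
    matching = set(records)
    for facet in selected:
        matching &= postings.get(facet, set())
    counts: Counter = Counter()
    for rid in matching:
        for facet in records[rid].items():
            if facet not in selected:
                counts[facet] += 1
    return matching, counts


if __name__ == "__main__":
    records = {
        "r1": {"colour": "red", "shape": "square"},
        "r2": {"colour": "red", "shape": "circle", "size": "large"},
        "r3": {"colour": "blue", "shape": "circle"},
    }
    postings = build_postings(records)
    ids, counts = drill_down(records, postings, [("colour", "red")])
    print(sorted(ids))             # ['r1', 'r2']
    print(sorted(counts.items()))  # [(('shape', 'circle'), 1), (('shape', 'square'), 1), (('size', 'large'), 1)]
```

Also selecting `("shape", "circle")` narrows the match to `r2` and leaves only `size=large` on offer. The sketch scans the matching records in memory. A real implementation would compute the counts from the index instead, which is what lucene-facet provides, together with drill-down support such as `DrillDownQuery`.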
