I'll have a play with the Lucene functionality and see if I can come up
with a more detailed model of how it could work. I can see there are options
around "hierarchical", "taxonomy index" etc. and would like to understand
these better.

On Thu, Feb 16, 2023 at 7:47 AM Andy Seaborne <[email protected]> wrote:

>
>
> On 14/02/2023 09:10, Øyvind Gjesdal wrote:
> > Hi,
> >
> > This is also something I've thought about, since we have a dated
> > Elasticsearch integration for creating facets from endpoints, and we use
> > aggregate SPARQL queries for counting, which sometimes become slow-ish
> > and have to be turned off for larger datasets.
> >
> > An idea I had around point 3, on how it could look, was to extend the
> > text query syntax with one named variable for facets, which could also
> > contain the results.
> > Using the example from the jena-text documentation:
> >
> > (?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
> > optionally to the possible syntax.
>
> Would it be better to introduce text:facetQuery which has inputs and
> outputs specifically for faceted search? An all-purpose property
> function may get unwieldy and user-error prone.
>

Yes, I think so.

> The other choice staying within SPARQL 1.1 syntax is SERVICE --
> https://jena.apache.org/documentation/query/service_enhancer.html
> which in effect gives named arguments.
>
> Syntax outside of SPARQL 1.1 syntax is also possible. Having text
> search/faceted search have its own syntax (the same underlying
> machinery) seems reasonable given how important it is.
>
> > I don't know what the type of the list ?facets (categories and counts)
> > should be. I initially thought it would be nice to have it as JSON, but I
> > see that one graph database implements facet results as blank nodes.
>

Which database was this? For my use cases I'd prefer RDF over a micro
format, as there are likely to be subsequent queries based on the results of
the first, and RDF would be easier to parse.

> > An option could be just adding an additional parsable string to the
> > text:query extension function, but it is kind of already rich, so I think
> > text:facet is a good idea to not bloat the text:query.
> >
> > There are probably multiple use-cases there as well, such as range,
> > multiple values on same facet, so this idea may end up looking a bit
> > hackish:
> >
> > ?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
> > 'facets: facet1: "value1", facet1: "value2"; facet2 : ...')
> >
> > I'm very happy to see others also interested in this use case.
> >
> > Best regards,
> > Øyvind
> >
> > On Tue, Feb 14, 2023 at 6:52 AM David Habgood <[email protected]> wrote:
> >
> >> Thanks for the link Andy,
> >>
> >> @Elie my specific use case is this:
> >>
> >> I have millions of records with perhaps 100 unique attributes across the
> >> records. Individual records may only have 5-10 attributes though. So a
> >> user who wishes to browse the data based on attributes can progressively
> >> filter the data to find individual or groups of records. When a user
> >> selects a facet, only those (additional) facets for which records exist
> >> are displayed as options, along with counts.
> >>
> >> It is possible with regular SPARQL GROUP BY and COUNT queries but not so
> >> performant.
> >>
> >> Cheers
> >>
> >> On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <[email protected]> wrote:
> >>
> >>>
> >>>
> >>> On 13/02/2023 12:59, David Habgood wrote:
> >>>> Hi Jena Users,
> >>>>
> >>>> I'm interested in extending the Jena Lucene capabilities to include
> >>>> Lucene's faceted search
> >>>> (https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).
> >>>
> >>> https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html
> >>>
> >>>
> >>>>
> >>>> As far as I can tell from searching the mailing list (and github) the
> >>>> Lucene faceted search capability hasn't been exposed in Jena before.
> >>>>
> >>>> I think it could be exposed as follows:
> >>>> 1. Defining how faceted search concepts can be expressed in the Jena
> >>>> dataset configuration
> >>>> 2. Extending the current indexing code to also generate the facet
> >>>> index based on definitions in 1.
> >>>> 3. Adding a new query function for faceted search e.g. text:facet
> >>>>
> >>>> Keen to hear if anyone can see issues with this approach or has other
> >>>> feedback.
> >>>>
> >>>> Thanks
> >>>> David
> >>>>
> >>>
> >>
> >
>
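
For reference, the browsing behaviour described in the use case earlier in the thread (select a facet, then show only the remaining facets for which records still exist, with counts) can be sketched roughly as below. This is only an in-memory illustration of the expected results, not of how Lucene or jena-text would compute them; the function name `facet_counts` and the sample records are made up for the example.

```python
from collections import Counter, defaultdict


def facet_counts(records, selected):
    """Count values of the remaining facets for records matching `selected`.

    `records` is a list of dicts (attribute -> value); `selected` maps the
    attributes the user has already chosen to their chosen values.
    (Hypothetical helper, for illustration only.)
    """
    # Keep only the records that satisfy every facet selected so far.
    matching = [
        r for r in records
        if all(r.get(attr) == value for attr, value in selected.items())
    ]

    # Count values for every attribute that has not been selected yet.
    counts = defaultdict(Counter)
    for record in matching:
        for attr, value in record.items():
            if attr not in selected:
                counts[attr][value] += 1

    # Attributes with no matching records never appear in the result.
    return {attr: dict(counter) for attr, counter in counts.items()}


# Made-up sample data: sparse records, each with only a few attributes.
records = [
    {"type": "borehole", "state": "QLD", "status": "active"},
    {"type": "borehole", "state": "NSW"},
    {"type": "sample", "state": "QLD", "depth": "shallow"},
]

print(facet_counts(records, {"type": "borehole"}))
```

With `type` set to `borehole`, only `state` and `status` come back as further options; `depth` is dropped because no matching record has it. This is the same answer a SPARQL GROUP BY / COUNT query per attribute would give, which is the part that gets slow on large datasets.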
