On 14/02/2023 09:10, Øyvind Gjesdal wrote:
Hi,

This is also something I've thought about, since we have a dated
Elasticsearch integration for creating facets from endpoints, and we use
aggregate SPARQL queries for counting, which sometimes become slow-ish
and have to be turned off for larger datasets.

An idea I had around point 3, on how it could look, was to extend the
text query syntax with one named variable for facets, which could also
contain the results.
Using the example from the jena-text documentation:

(?s ?score ?literal ?g ?facets) text:query 'word' would add "?facets"
optionally to the possible syntax.

Would it be better to introduce text:facetQuery, which has inputs and outputs specifically for faceted search? An all-purpose property function may get unwieldy and prone to user error.

The other choice staying within SPARQL 1.1 syntax is SERVICE --
https://jena.apache.org/documentation/query/service_enhancer.html
which in effect gives named arguments.

Syntax outside of SPARQL 1.1 syntax is also possible. Having text search/faceted search have its own syntax (with the same underlying machinery) seems reasonable given how important it is.

I don't know what the type of the list ?facets (categories and counts)
should be. I initially thought it would be nice to have it as JSON, but I
see that one graph database implements facet results as blank nodes.
An option could be just adding an additional parsable string to the
text:query extension function, but it is already rather rich, so I think
text:facet is a good idea to avoid bloating text:query.

There are probably multiple use cases there as well, such as ranges and
multiple values on the same facet, so this idea may end up looking a bit
hackish:

?s text:query ( property* 'query string' limit 'lang:xx' 'highlight:yy'
'facets: facet1: "value1", facet1: "value2"; facet2 : ...')

I'm very happy to see others also interested in this use case.

Best regards,
Øyvind

On Tue, Feb 14, 2023 at 6:52 AM David Habgood <dcchabg...@gmail.com> wrote:

Thanks for the link Andy,

@Elie my specific use case is this:

I have millions of records with perhaps 100 unique attributes across the
records. Individual records may only have 5-10 attributes though. So a user
who wishes to browse the data based on attributes can progressively filter
the data to find individual or groups of records. When a user selects a
facet, only those (additional) facets for which records exist are displayed
as options, along with counts.

It is possible with regular SPARQL GROUP BY and COUNT queries, but it is
not very performant.
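
To make the browsing behaviour described above concrete, here is a minimal
sketch in plain Python of "filter by the selected facets, then count the
remaining ones". It is not Jena or Lucene code. The function name
facet_counts, the dictionary-based records and the sample attribute values
are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def facet_counts(records, selected):
    """Filter records by the selected facet values, then count the rest.

    Returns a tuple (matching, counts): `matching` is the list of records
    whose attributes equal every entry in `selected`, and `counts` maps each
    remaining attribute to a Counter of value -> number of matching records.
    """
    matching = [
        r for r in records
        if all(r.get(attr) == value for attr, value in selected.items())
    ]
    counts = {}
    for record in matching:
        for attr, value in record.items():
            if attr in selected:
                continue  # already chosen by the user, so not offered again
            counts.setdefault(attr, Counter())[value] += 1
    return matching, counts


# Hypothetical sample data: each record carries only a few of the attributes.
records = [
    {"type": "borehole", "state": "QLD", "depth": "deep"},
    {"type": "borehole", "state": "NSW"},
    {"type": "sample", "state": "QLD", "colour": "red"},
]

# The user has selected state = QLD; only facets that still have records
# are returned, along with their counts.
matching, counts = facet_counts(records, {"state": "QLD"})
```

Attributes that no matching record carries simply never appear in counts,
which is the "only those facets for which records exist" behaviour. A
linear scan like this is only meant to pin down the semantics; it does not
address the performance concern.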

Cheers

On Tue, Feb 14, 2023 at 2:58 AM Andy Seaborne <a...@apache.org> wrote:



On 13/02/2023 12:59, David Habgood wrote:
Hi Jena Users,

I'm interested in extending the Jena Lucene capabilities to include
Lucene's faceted search
(https://javadoc.io/doc/org.apache.lucene/lucene-facet/latest/index.html).



https://lucene.apache.org/core/9_5_0/demo/org/apache/lucene/demo/facet/package-summary.html



As far as I can tell from searching the mailing list (and github) the
Lucene faceted search capability hasn't been exposed in Jena before.

I think it could be exposed as follows:
1. Defining how faceted search concepts can be expressed in the Jena
dataset configuration
2. Extending the current indexing code to also generate the facet index
based on definitions in 1.
3. Adding a new query function for faceted search e.g. text:facet

Keen to hear if anyone can see issues with this approach or has other
feedback.

Thanks
David



