Re: Processing query clause combinations at indexing time

2020-12-15 Thread Robert Muir
You can look at IndexSearcher.setQueryCache etc. for more details. Especially LRUQueryCache. Maybe we should celebrate a little bit if it's already 80% of the way there for your use-case, but at the same time, perhaps defaults could be better. There is a lot going on here, for example decisions abou
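For readers unfamiliar with filter caching, the general idea behind a cache such as LRUQueryCache can be illustrated with a toy, pure-Java sketch: the set of matching document ids for a filter is computed once and reused until it is evicted in least-recently-used order. This is not Lucene's implementation (LRUQueryCache caches per segment and consults a QueryCachingPolicy before caching); the FilterCache class and its string filter keys are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy illustration of an LRU cache of filter results; not Lucene's LRUQueryCache.
final class FilterCache {
    // Filter key (a hypothetical string form of a filter) -> ids of matching documents.
    private final Map<String, BitSet> cache;

    FilterCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                // Evict the least recently used entry once the cache is over capacity.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached doc-id set for a filter, evaluating the filter only on a miss. */
    BitSet get(String filterKey, Function<String, BitSet> evaluate) {
        BitSet bits = cache.get(filterKey); // a hit also marks the entry as recently used
        if (bits == null) {
            bits = evaluate.apply(filterKey);
            cache.put(filterKey, bits);
        }
        return bits;
    }

    boolean isCached(String filterKey) {
        return cache.containsKey(filterKey); // does not change the access order
    }
}
```

With a capacity of two, looking up filters "a", "b", "a", "c" evaluates only three filters and evicts "b", because the repeated lookup of "a" made it the most recently used entry.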

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Michael Froh
We don't handle positional queries in our use-case, but that's just because we don't happen to have many positional queries. But if we identify documents at indexing time that contain a given phrase/slop/etc. query, then we can tag the documents with a term that indicates that (or, more likely, tag
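The index-time tagging step can be sketched in plain Java, assuming each document is reduced to a set of "field:value" strings and each mined combination is a list of required clauses. The ComboTagger class, the "&"-joined term format and the restriction to conjunctions of exact-match term clauses are illustrative assumptions, not the team's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of index-time tagging; not the original team's code.
final class ComboTagger {
    /** Synthetic term for one combination, e.g. [region:us, status:active] -> "region:us&status:active". */
    static String comboTerm(List<String> combo) {
        return String.join("&", combo);
    }

    /**
     * Returns one synthetic term for every mined combination the document satisfies.
     * Assumes combinations are conjunctions of exact "field:value" term clauses.
     */
    static List<String> tagsFor(Set<String> docTerms, List<List<String>> combos) {
        List<String> tags = new ArrayList<>();
        for (List<String> combo : combos) {
            if (docTerms.containsAll(combo)) { // every clause of the combination matches
                tags.add(comboTerm(combo));
            }
        }
        return tags;
    }
}
```

In a Lucene application, each returned tag could then be added to the document as a StringField in a dedicated field before calling IndexWriter.addDocument; the field name is up to the application.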

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Robert Muir
What are you doing with positional queries though? And how does the scoring work (it is unclear from your previous reply to me whether you are scoring). Lucene has filter caching too, so if you are doing this for non-scoring cases maybe something is off? On Tue, Dec 15, 2020 at 3:19 PM Michael Fr

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Michael Froh
It's conceptually similar to CommonGrams in the single-field case, though it doesn't require terms to appear in any particular positions. It's also able to match across fields, which is where we get a lot of benefit. We have frequently-occurring filters that get added by various front-end layers b
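On the query side, the matching rewrite might look like the sketch below: required clauses covered by a known combination are replaced by a single clause on the synthetic term. It assumes every document was tagged at index time with the same list of combinations, so the rewritten conjunction matches exactly the same documents; it says nothing about scores, so it only makes sense for non-scoring filter clauses. ComboRewriter and the "_combo" field name are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical query-side rewrite; "_combo" is an assumed field holding the index-time tags.
final class ComboRewriter {
    /**
     * Replaces each group of required clauses covered by a known combination with a
     * single clause on the synthetic term. Combinations are tried in list order and a
     * clause is consumed by at most one combination (a simple greedy choice).
     */
    static Set<String> rewrite(Set<String> required, List<List<String>> combos) {
        Set<String> remaining = new TreeSet<>(required);
        Set<String> synthetic = new TreeSet<>();
        for (List<String> combo : combos) {
            if (remaining.containsAll(combo)) {
                remaining.removeAll(combo);
                synthetic.add("_combo:" + String.join("&", combo));
            }
        }
        remaining.addAll(synthetic);
        return remaining;
    }
}
```

Because the clauses come from different fields, the same rewrite works whether the combined filters were added by one front-end layer or several. The greedy order matters when combinations overlap; trying the most selective or most frequent combinations first is one possible policy.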

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Michael Froh
Huh... I didn't know about Luwak / the monitoring module. I spent some time this morning going through it. It takes a very different approach to matching at indexing time versus what we did, and looks more powerful. Given that document-matching is one of the harder steps in the process, I'm quite h

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Robert Muir
I wonder if it can be done in a fairly clean way. This sounds similar to using a ShingleFilter to do this optimization, but adding some conditionals so that the index is smaller? Now that we have ConditionalTokenFilter (for branching), can the feature be implemented cleanly? Ideally it wouldn't re

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Robert Muir
See also commongrams which is a very similar concept: https://github.com/apache/lucene-solr/tree/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/commongrams On Tue, Dec 15, 2020 at 12:08 PM Robert Muir wrote: > > I wonder if it can be done in a fairly clean way. This sounds simi
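As a rough illustration of the CommonGrams concept, the following pure-Java sketch emits every token and, in addition, a bigram joined with "_" whenever either token of an adjacent pair is a common word, which is broadly how Lucene's CommonGramsFilter behaves. It ignores positions, offsets and token types, so it is a simplification rather than a reimplementation; CommonGramsSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified, position-free illustration of common-grams; not Lucene's CommonGramsFilter.
final class CommonGramsSketch {
    /** Emits each token, followed by a bigram whenever it or the next token is a common word. */
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current);
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + "_" + next);
                }
            }
        }
        return out;
    }
}
```

For the tokens "the quick brown fox" with "the" as the only common word, this yields the, the_quick, quick, brown, fox. The scheme discussed in this thread differs in that it does not require the combined terms to be adjacent and can combine terms from different fields.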

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Atri Sharma
In that case, I would be interested to know if this can be merged into Luwak. On Tue, 15 Dec 2020, 21:50 Adrien Grand, wrote: > I like this idea. I can think of several users who have a priori knowledge > of frequently used filters and would appreciate having Lucene take care of > transparently

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Adrien Grand
I like this idea. I can think of several users who have a priori knowledge of frequently used filters and would appreciate having Lucene take care of transparently optimizing the execution of such filters instead of having to do it manually. I'm not sure a separate project is the best option, it m

Re: Processing query clause combinations at indexing time

2020-12-15 Thread Michael Sokolov
I feel like there could be some considerable overlap with features provided by Luwak, which was contributed to Lucene fairly recently, and I think does the query inversion work required for this; maybe more of it already exists here? I don't know if that module handles the query rewriting, or the t
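The query-inversion idea can be illustrated with a simplified sketch: stored conjunctive queries are indexed under one of their terms, a document's terms select candidate queries, and each candidate is then verified against the full document. This shows only the general candidate-selection-then-verify pattern, not the monitor module's API; InvertedQueryIndex and the choice of the lexicographically smallest term as the key are assumptions (a real system would more likely pick the most selective term).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy query inversion for conjunctive term queries; not the Lucene monitor API.
final class InvertedQueryIndex {
    // Selected key term -> ids of stored queries indexed under that term.
    private final Map<String, List<String>> byTerm = new HashMap<>();
    // Query id -> its required terms (must be non-empty).
    private final Map<String, Set<String>> queries = new HashMap<>();

    /** Stores a conjunctive query under one of its terms (here the lexicographically smallest). */
    void register(String id, Set<String> requiredTerms) {
        queries.put(id, requiredTerms);
        String key = Collections.min(requiredTerms);
        byTerm.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
    }

    /** Returns the ids of stored queries that match the document, in sorted order. */
    SortedSet<String> match(Set<String> docTerms) {
        SortedSet<String> matches = new TreeSet<>();
        for (String term : docTerms) {
            // Candidate selection: only queries keyed on a term the document contains.
            for (String id : byTerm.getOrDefault(term, List.of())) {
                // Verification: every required term of the query must be present.
                if (docTerms.containsAll(queries.get(id))) {
                    matches.add(id);
                }
            }
        }
        return matches;
    }
}
```

Since a conjunction can only match a document that contains its key term, the candidate step never misses a match; it only saves checking queries that cannot possibly match.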

Re: Processing query clause combinations at indexing time

2020-12-14 Thread Atri Sharma
+1 I would suggest that this be an independent project hosted on GitHub (there have been similar projects in the past that have seen success that way) On Tue, 15 Dec 2020, 09:37 David Smiley, wrote: > Great optimization! > > I'm dubious on it being a good contribution to Lucene itself however,

Re: Processing query clause combinations at indexing time

2020-12-14 Thread David Smiley
Great optimization! I'm dubious on it being a good contribution to Lucene itself, however, because what you propose fits cleanly above Lucene. Even at an ES/Solr layer (which I know you don't use, but hypothetically speaking), I'm dubious there as well. ~ David Smiley Apache Lucene/Solr Search Dev

Processing query clause combinations at indexing time

2020-12-14 Thread Michael Froh
My team at work has a neat feature that we've built on top of Lucene that has provided a substantial (20%+) increase in maximum qps and some reduction in query latency. Basically, we run a training process that looks at historical queries to find frequently co-occurring combinations of required cl
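The training step can be approximated, for pairs only, by counting how often each unordered pair of required clauses appears together in a query log and keeping the pairs above a support threshold. The sketch assumes each logged query has already been reduced to its set of required clauses as "field:value" strings; ClausePairMiner and minSupport are hypothetical names, and the actual training process may consider larger combinations and other selection criteria.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of mining frequent clause pairs from a query log.
final class ClausePairMiner {
    /**
     * Counts how many logged queries contain each unordered pair of required clauses
     * and returns the pairs seen in at least minSupport queries.
     */
    static Map<List<String>, Integer> frequentPairs(List<Set<String>> queryLog, int minSupport) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (Set<String> required : queryLog) {
            // Sort the clauses so that {a, b} and {b, a} map to the same key.
            List<String> clauses = new ArrayList<>(new TreeSet<>(required));
            for (int i = 0; i < clauses.size(); i++) {
                for (int j = i + 1; j < clauses.size(); j++) {
                    counts.merge(List.of(clauses.get(i), clauses.get(j)), 1, Integer::sum);
                }
            }
        }
        // Drop pairs that are too rare to be worth a dedicated index-time term.
        counts.values().removeIf(count -> count < minSupport);
        return counts;
    }
}
```

For a log of four queries, {status:active, region:us, type:shoe}, {status:active, region:us}, {status:active, type:shoe} and {region:us, status:active, brand:acme}, a minSupport of 3 keeps only the pair (region:us, status:active), which occurs three times; (status:active, type:shoe) occurs twice and is dropped.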