FWIW, a related PR was just merged that makes it possible to introspect query
execution: https://issues.apache.org/jira/browse/LUCENE-9965. It's
different from your use case, though, in that it provides debugging information for
a single query rather than statistical information across lots of user
queries (and the ap…
Yes, I did those, and I believe I am at the best level of performance now;
it is not bad at all, but I want to make it much better.
I see roughly a linear drop in timings when I go to a lower number of words, but
let me do that quick study again.
Fuzzy search is always expensive, but that seems to su…
I have never used fuzzy search, but from the documentation it seems very
expensive, and if you do it on 10 terms and 1M documents it seems very, very
expensive.
Are you using the default 'fuzziness' parameter (0.5)? It might end up
exploring a lot of documents; did you try to play with that…
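For reference, in recent Lucene versions the fuzziness knob is an edit distance rather than the old 0.5 minimum similarity: FuzzyQuery takes a maxEdits argument that defaults to 2. Below is a minimal sketch of capping it, assuming a hypothetical field called "name"; lowering maxEdits and using a non-zero prefix length both shrink how many index terms the query expands to.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;

    class FuzzyKnobs {
        // "name" is a hypothetical field; the knobs below trade recall for speed.
        static FuzzyQuery limited(String word) {
            return new FuzzyQuery(
                new Term("name", word),
                1,      // maxEdits: 1 instead of the default 2
                2,      // prefixLength: first 2 characters must match exactly
                50,     // maxExpansions: Lucene's default cap on expanded terms
                true);  // transpositions count as a single edit
        }
    }

Whether maxEdits=1 is acceptable depends on how noisy the input words are.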
I can't reveal those details, I am very sorry, but it is more than 1 million.
Let me say that I have a lot of code that processes results from Lucene,
but the bottleneck is the Lucene fuzzy search.
Best regards
On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote:
How many documents do…
How many documents do you have in the index?
And can you show an example of a query?
From: java-user@lucene.apache.org At: 06/09/21 18:33:25 To:
java-user@lucene.apache.org, baris.ka...@oracle.com
Subject: Re: Potential bug
I have only two fields: one is a string, the other is a number (stored as a
string); I guess you can't go simpler than this.
I retrieve the hits, and my major bottleneck is the Lucene fuzzy search.
I take each word from the string, which is usually at most around 10 words,
and I build a fuzzy boolean query out of…
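A minimal sketch of how such a per-word fuzzy boolean query might be assembled (an illustration under assumptions, not the actual code: "name" is a hypothetical field, maxEdits is capped at 1, and the input is naively split on whitespace):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.Query;

    class FuzzyWords {
        // One fuzzy SHOULD clause per word of the (at most ~10 word) input string.
        static Query build(String input) {
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            for (String word : input.toLowerCase().split("\\s+")) {
                builder.add(new FuzzyQuery(new Term("name", word), 1),
                            BooleanClause.Occur.SHOULD);
            }
            return builder.build();
        }
    }

Each FuzzyQuery expands into many index terms, so the total cost grows with the number of words, which would be consistent with the roughly linear drop in timings described above when fewer words are used.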
Hi Baris,
> What if the user needs to limit the search process?
What do you mean by 'limit'?
> There should be a way to speed up Lucene, then, if this is not possible,
> since for some simple queries it takes half a second, which is too long.
What do you mean by a 'simple' query? There might be mul…
Thanks Adrien, but the differences are too far apart.
I think the algorithm needs to be revised.
What if the user needs to limit the search process?
That leaves no control.
There should be a way to speed up Lucene, then, if this is not possible,
since for some simple queries it takes half a second, which is too long.
Hi Baris,
totalHitsThreshold is actually a minimum threshold, not a maximum threshold.
The problem is that Lucene cannot directly identify the top matching
documents for a given query. The strategy it adopts is to start collecting
hits naively in doc ID order and to progressively raise the bar ab…
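To make the semantics concrete, here is a minimal sketch (assuming Lucene 8.x, with an IndexSearcher and Query set up elsewhere) of how the threshold interacts with the reported total; once the count stops being exact, the TotalHits relation switches to a lower bound:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.search.TotalHits;

    class HitCountExample {
        static void report(IndexSearcher searcher, Query query) throws IOException {
            // numHits = 10 top documents returned; totalHitsThreshold = 1000 hits
            // counted exactly before Lucene may start skipping non-competitive docs.
            TopScoreDocCollector collector = TopScoreDocCollector.create(10, 1000);
            searcher.search(query, collector);
            TopDocs topDocs = collector.topDocs();
            if (topDocs.totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) {
                System.out.println("at least " + topDocs.totalHits.value + " hits");
            } else {
                System.out.println("exactly " + topDocs.totalHits.value + " hits");
            }
        }
    }

In other words, the number of ScoreDoc results returned is capped by numHits, while totalHits becomes a lower-bound count once the threshold is crossed, which is consistent with getting 10 results back while 1655 is reported.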
Hi,-
I think this is a potential bug.
I set totalHitsThreshold to 10 this time and I get totalHits reported as
1655, but I get 10 results in total.
I think this suggests that there might be a bug in the
TopScoreDocCollector algorithm.
Best regards
OK, I found it:
300 times the number of words in the search string, but this needs to be
precisely documented in the Javadocs.
I don't want to rely on trial and error, and I guess nobody wants that
either, please.
Best regards
On 6/9/21 12:11 PM, baris.ka...@oracle.com wrote:
Hi,-
I used this cl…
Hi,-
I used this class now before the IndexSearcher.search API (with the collector
as the 2nd arg) (please see the "an interesting case" thread before this
question),
but this time I have a very weird behavior:
I used to have 4000+ hits with the default TopScoreDocCollector.create(int
numHits, ScoreDoc a…
Hi,
On my current project we wanted to monitor, for a specific field, the
fraction of index vs doc-values query executions performed by IndexOrDocValuesQuery.
We ended up forking IndexOrDocValuesQuery and passing in a listener that
is notified when the query execution path is decided.
Do you think this is so…
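For readers following along, the stock (unforked) query combines an index-structure query with a doc-values query and lets Lucene pick the cheaper path per segment. A minimal sketch, assuming a hypothetical numeric field called "price" that is indexed both as a point and as doc values:

    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.document.SortedNumericDocValuesField;
    import org.apache.lucene.search.IndexOrDocValuesQuery;
    import org.apache.lucene.search.Query;

    class PriceRange {
        // The points query is fast when the range clause leads execution;
        // the doc-values query is cheaper when another clause drives iteration.
        static Query between(long lo, long hi) {
            Query points = LongPoint.newRangeQuery("price", lo, hi);
            Query docValues = SortedNumericDocValuesField.newSlowRangeQuery("price", lo, hi);
            return new IndexOrDocValuesQuery(points, docValues);
        }
    }

The fork described above would presumably notify its listener at the point where the choice between the two wrapped queries is made.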