Re: Monitoring decisions taken by IndexOrDocValuesQuery

2021-06-09 Thread Adrien Grand
FWIW a related PR was just merged that allows introspecting query execution: https://issues.apache.org/jira/browse/LUCENE-9965. It's different from your use-case though in that it is debugging information for a single query rather than statistical information across lots of user queries (and the ap

Re: Potential bug

2021-06-09 Thread baris . kazar
Yes, I did those, and I believe I am at the best level of performance now. It is not bad at all, but I want to make it much better. I see a roughly linear drop in timings as the number of words goes down, but let me redo that quick study. Fuzzy search is always expensive but that seems to su

Re: Potential bug

2021-06-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I have never used fuzzy search, but from the documentation it seems very expensive, and if you do it on 10 terms and 1M documents it seems very very very expensive. Are you using the default 'fuzziness' parameter (0.5)? It might end up exploring a lot of documents, did you try to play with tha

Re: Potential bug

2021-06-09 Thread baris . kazar
I can't reveal those details, I am very sorry, but it is more than 1 million. Let me say that I have a lot of code that processes results from Lucene, but the bottleneck is the Lucene fuzzy search. Best regards On 6/9/21 1:53 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) wrote: How many documents do

Re: Potential bug

2021-06-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
How many documents do you have in the index? And can you show an example query? From: java-user@lucene.apache.org At: 06/09/21 18:33:25 To: java-user@lucene.apache.org, baris.ka...@oracle.com Subject: Re: Potential bug I have only two fields, one string, the other is a number (stored as st

Re: Potential bug

2021-06-09 Thread baris . kazar
I have only two fields: one is a string, the other is a number (stored as a string). I guess you can't go simpler than this. I retrieve the hits, and my major bottleneck is the Lucene fuzzy search. I take each word from the string, which is usually at most around 10 words, and I build a fuzzy boolean query out o

Re: Potential bug

2021-06-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Baris,
> what if the user needs to limit the search process?
What do you mean by 'limit'?
> there should be a way to speed up Lucene then if this is not possible,
> since for some simple queries it takes half a second which is too long.
What do you mean by 'simple' query? there might be mul

Re: Potential bug

2021-06-09 Thread baris . kazar
Thanks Adrien, but the difference is too large. I think the algorithm needs to be revised. What if the user needs to limit the search process? That leaves no control. There should be a way to speed up Lucene then if this is not possible, since for some simple queries it takes half a seco

Re: Potential bug

2021-06-09 Thread Adrien Grand
Hi Baris, totalHitsThreshold is actually a minimum threshold, not a maximum threshold. The problem is that Lucene cannot directly identify the top matching documents for a given query. The strategy it adopts is to start collecting hits naively in doc ID order and to progressively raise the bar ab
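
For readers of the archive, the "minimum threshold" behaviour described above can be illustrated with a small plain-Java sketch. This is a simplified simulation and not Lucene's implementation: the class `ThresholdSketch`, its `search` method and the `Result` type are hypothetical names, and the skip test reads each score directly, whereas a real engine has to decide from index statistics without visiting the document. Hits are counted exactly until the threshold is reached; after that, hits that cannot enter the top `numHits` are skipped, so the reported count becomes a lower bound.

```java
import java.util.PriorityQueue;

// Hypothetical simulation of top-k collection with a total-hits threshold.
// It does not use or reproduce any Lucene class.
final class ThresholdSketch {

    /** Reported hit count, whether that count is exact, and the top scores (descending). */
    static final class Result {
        final int totalHits;
        final boolean exact;
        final float[] topScores;

        Result(int totalHits, boolean exact, float[] topScores) {
            this.totalHits = totalHits;
            this.exact = exact;
            this.topScores = topScores;
        }
    }

    /**
     * scores[i] is the score of the i-th matching document, in doc ID order.
     * numHits is how many top results to keep; totalHitsThreshold is how many
     * hits must be counted exactly before non-competitive hits may be skipped.
     */
    static Result search(float[] scores, int numHits, int totalHitsThreshold) {
        PriorityQueue<Float> top = new PriorityQueue<>(); // min-heap: head is the weakest kept hit
        int counted = 0;
        boolean exact = true;

        for (float score : scores) {
            boolean full = top.size() == numHits;
            if (counted >= totalHitsThreshold && full && score <= top.peek()) {
                // Past the threshold and unable to enter the top hits:
                // skip without counting, so the count is now only a lower bound.
                exact = false;
                continue;
            }
            counted++;
            if (!full) {
                top.add(score);
            } else if (score > top.peek()) {
                top.poll(); // evict the weakest kept hit; this raises the bar
                top.add(score);
            }
        }

        // Drain the min-heap from weakest to strongest, filling the array backwards.
        float[] out = new float[top.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = top.poll();
        }
        return new Result(counted, exact, out);
    }
}
```

In this sketch the reported count can exceed the threshold, because competitive hits are still collected and counted after the threshold is reached, while the number of returned results is capped by `numHits`.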

Potential bug

2021-06-09 Thread baris . kazar
Hi, I think this is a potential bug: this time I set totalHitsThreshold to 10 and totalHits is reported as 1655, but I get 10 results in total. I think this suggests that there might be a bug in the TopScoreDocCollector algorithm. Best regards

Re: TopScoreDocCollector class usage

2021-06-09 Thread baris . kazar
OK, I found it: 300 times the number of words in the search string. But this needs to be precisely documented in the Javadocs; I don't want to rely on trial and error, and I guess nobody else does either. Best regards On 6/9/21 12:11 PM, baris.ka...@oracle.com wrote: Hi, I used this cl

TopScoreDocCollector class usage

2021-06-09 Thread baris . kazar
Hi, I have now used this class with the IndexSearcher.search API (with the collector as 2nd arg) (please see the "an interesting case" thread before this question), but this time I see very weird behavior: I used to have 4000+ hits with default TopScoreDocCollector.create(int numHits,  ScoreDoc a

Monitoring decisions taken by IndexOrDocValuesQuery

2021-06-09 Thread Egor Moraru
Hi, On my current project we wanted to monitor, for a specific field, the fraction of index vs. doc values queries executed by IndexOrDocValuesQuery. We ended up forking IndexOrDocValuesQuery and passing a listener that is notified when the query execution path is decided. Do you think this is so
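
The listener approach mentioned above might look roughly like the following plain-Java sketch. Everything here is an assumption made for illustration: `ExecutionPathListener`, `Path`, `CountingListener` and `indexFraction` are hypothetical names, not part of Lucene and not the poster's actual fork. The forked query would call the listener once at the point where it picks an execution path, and the listener accumulates counts for one monitored field.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a decision listener; no Lucene class is used or modified here.
final class PathStats {

    /** The two execution paths such a query can choose between. */
    enum Path { INDEX, DOC_VALUES }

    /** Callback a forked query could invoke once it has decided how to execute. */
    interface ExecutionPathListener {
        void onPathChosen(String field, Path path);
    }

    /** Counts decisions for a single monitored field; safe to share across search threads. */
    static final class CountingListener implements ExecutionPathListener {
        private final String monitoredField;
        private final LongAdder index = new LongAdder();
        private final LongAdder docValues = new LongAdder();

        CountingListener(String monitoredField) {
            this.monitoredField = monitoredField;
        }

        @Override
        public void onPathChosen(String field, Path path) {
            if (!monitoredField.equals(field)) {
                return; // decisions for other fields are ignored
            }
            if (path == Path.INDEX) {
                index.increment();
            } else {
                docValues.increment();
            }
        }

        /** Fraction of monitored decisions that used the index, or NaN if none were seen. */
        double indexFraction() {
            long i = index.sum();
            long total = i + docValues.sum();
            return total == 0 ? Double.NaN : (double) i / total;
        }
    }
}
```

`LongAdder` is used on the assumption that many concurrent searches would report decisions; the resulting fraction could then be exported to whatever metrics system is in place.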