Re: Question about ImpactsDISI for boolean queries

2025-04-21 Thread Alfonsi, Peter
Hi Adrien, Thanks for the quick reply! This makes sense. I think BlockMaxConjunctionBulkScorer actually never calls setMinCompetitiveScore() at all, so there's no hope of skipping, while ConjunctionScorer does in the case that there's only one scorer (which happens when we move the range query

Re: Question about ImpactsDISI for boolean queries

2025-04-21 Thread Adrien Grand
You are on the right track. It's easier to skip by score when there is a single scoring clause than when the score is the sum of the scores of two clauses. Well, actually in this case two clauses are not much harder since one of the clauses gives the same score to all documents, but the conjunctio
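The single-scoring-clause case can be illustrated with a toy model. This is only a sketch and not Lucene's implementation: the block layout, the function `top_k_single_clause`, and the `constant` parameter are hypothetical names introduced here. Each block of postings carries a precomputed maximum score, and a block is skipped when that maximum cannot beat the current minimum competitive score. A second clause that gives every document the same score just shifts each score and each block maximum by that constant, so the same check still applies.

```python
import heapq

def top_k_single_clause(blocks, k, constant=0.0):
    """Toy block-max skipping over (max_score, [(doc_id, score), ...]) blocks.

    `constant` models a second clause that gives every document the same score.
    Returns (top hits sorted by descending score, number of blocks skipped).
    """
    heap = []      # min-heap of (score, doc_id) holding the current top k
    skipped = 0
    for max_score, postings in blocks:
        # Once the heap is full, its smallest score is the minimum competitive score.
        if len(heap) == k and max_score + constant <= heap[0][0]:
            skipped += 1   # no document in this block can enter the top k
            continue
        for doc_id, score in postings:
            total = score + constant
            if len(heap) < k:
                heapq.heappush(heap, (total, doc_id))
            elif total > heap[0][0]:
                heapq.heapreplace(heap, (total, doc_id))
    hits = sorted(heap, key=lambda h: (-h[0], h[1]))
    return [(doc_id, score) for score, doc_id in hits], skipped

# Hypothetical postings: three blocks, each with its precomputed maximum score.
blocks = [
    (3.0, [(1, 3.0), (2, 1.0)]),
    (0.5, [(3, 0.5), (4, 0.2)]),
    (2.5, [(5, 2.5), (6, 0.1)]),
]
hits, skipped = top_k_single_clause(blocks, k=2, constant=1.0)
```

With these made-up blocks, the second block is never opened because its shifted maximum falls below the score of the weakest hit already collected.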

Question about ImpactsDISI for boolean queries

2025-04-21 Thread Alfonsi, Peter
Hello, I’ve been working on optimizing boolean queries in OpenSearch, using Lucene 10.1. For these tests I’ve been using our http_logs benchmark, which has a date field called “@timestamp”, an integer field called “status”, and a text field called “request”. I noticed there’s sometimes a large

Re: Synonyms and searching

2025-04-21 Thread Anh Dũng Bùi
In my work, I usually use Automaton to convert "http proxy" or "http-proxy" into "httpproxy" which is more storage-efficient than a synonym. If you also want to search by "http" or "proxy" alone, then one way would be to extend from CompoundWordTokenFilterBase and break httpproxy into "http proxy"
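A minimal sketch of the two ideas, in plain Python rather than Lucene: normalize the spelling variants to one canonical token, then split that token with a small dictionary so the parts can be matched alone. The regular expression, the `PARTS` dictionary, and all three function names are assumptions made for illustration; Lucene's Automaton and CompoundWordTokenFilterBase are not used here, and emitting the compound alongside its parts is a choice of this sketch.

```python
import re

# Hypothetical variant pattern: "http proxy", "http-proxy", "httpproxy".
VARIANT = re.compile(r"\bhttp[ -]?proxy\b", re.IGNORECASE)
# Hypothetical dictionary of known compound parts.
PARTS = ["http", "proxy"]

def normalize(text):
    """Rewrite every variant to the single canonical token "httpproxy"."""
    return VARIANT.sub("httpproxy", text)

def decompound(token, parts=PARTS):
    """Greedily split a token into dictionary parts; return [token] if it does not split fully."""
    out, rest = [], token
    while rest:
        match = next((p for p in parts if rest.startswith(p)), None)
        if match is None:
            return [token]
        out.append(match)
        rest = rest[len(match):]
    return out

def analyze(text):
    """Emit each token, followed by its parts when it is a known compound."""
    tokens = []
    for tok in normalize(text).lower().split():
        pieces = decompound(tok)
        tokens.append(tok)
        if pieces != [tok]:
            tokens.extend(pieces)
    return tokens
```

Here `analyze("Use an HTTP-Proxy here")` yields the canonical token followed by its two parts, so a query for "http" or "proxy" alone can still match.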