FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer)

2025-07-08 Thread Singh, Divya
From: Singh, Divya Sent: 04 July 2025 14:40 To: d...@lucene.apache.org Cc: Birajdar, Sharad (DI SW PLM LCS APPS ALM R&D7) Subject: FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer) From: Thakare, Monika (ext) (DI SW PLM LCS APPS A

Re: Point query on a LatLonPoint field

2025-06-09 Thread Ignacio Vera
I tried to move the API in this direction (https://github.com/apache/lucene/issues/10194) but I got pushed back. On Mon, Jun 9, 2025 at 8:22 PM Tomás Fernández Löbbe wrote: > Thanks a lot Ignacio, > This does seem to work. I'm wondering why this is not part of the query

Re: Point query on a LatLonPoint field

2025-06-09 Thread Tomás Fernández Löbbe
Thanks a lot Ignacio, This does seem to work. I'm wondering why this is not part of the query processing itself? Are there situations in which someone would not want this behavior? Tomas On Mon, Jun 9, 2025 at 11:02 AM Ignacio Vera wrote: > That is actually expected as the query is t

Re: Point query on a LatLonPoint field

2025-06-09 Thread Ignacio Vera
That is actually expected as the query is trying to match the original point with the encoded point in the index, therefore is not matching. There are other cases where results are not as expected, for example if you index the points from a polygon and then you make a polygon query using that
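
The mismatch described above can be illustrated without Lucene. The following is a minimal sketch, assuming a latitude is stored by rounding it down onto a fixed grid of 2^32 steps over 180 degrees; the class name, the `STEP` constant, and the `encode`/`decode` helpers are illustrative assumptions and not Lucene's API. It is meant for latitudes in [-90, 90).

```java
// Simplified illustration of lossy coordinate encoding.
// All names here are hypothetical; this is not Lucene code.
final class LatQuantizationSketch {
    // One quantization step: 180 degrees spread over 2^32 integer values.
    static final double STEP = 180.0 / (1L << 32);

    // Round the latitude down to the nearest representable step.
    static int encode(double latitude) {
        return (int) Math.floor(latitude / STEP);
    }

    // Map the stored integer back to a latitude on the grid.
    static double decode(int encoded) {
        return encoded * STEP;
    }

    public static void main(String[] args) {
        double original = 40.7128;
        double stored = decode(encode(original));
        // The stored value sits on the grid, so it differs slightly from the input.
        System.out.println("original == stored ? " + (original == stored));
        // Quantizing the query value the same way lands on the same grid cell.
        System.out.println("same cell after quantizing ? "
                + (encode(stored) == encode(original)));
    }
}
```

Under these assumptions, an exact comparison between the raw input and the stored value fails, while comparing values that were both passed through the same encoding succeeds.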

Point query on a LatLonPoint field

2025-06-06 Thread Tomás Fernández Löbbe
Hi All, I just noticed that if I add a document with a LatLonPoint field with a latitude and longitude, and then I do a query using "LatLonPoint.newGeometryQuery("location", ShapeField.QueryRelation.INTERSECTS, new Point(latitude, longitude))" with the same latitude and long

Re: Query on SoftUpdateDocument API

2025-02-21 Thread Adrien Grand
Hi Abhishek, Actually softUpdate is about doing an update where the deletion is performed via a soft delete rather than a hard delete. To perform doc-value updates, you need to use the updateNumericDocValue or updateBinaryDocValue APIs. Note that it doesn't actually update in-place, it needs to

Query on SoftUpdateDocument API

2025-02-21 Thread Abhishek Chandran
Hello, Lucene community! I am reaching out to you to seek help with one of the updated APIs. I have a use case where I need to update the Lucene document's docValue fields in place, i.e., without creating a new document with every update. I believe softUpdateDocument should help me achieve this,

Re: Looking for resources to understand query cost/complexity

2025-02-21 Thread Adrien Grand
This depends on many factors, but in my experience these two are good starting points: - Total number of matching docs of the query. - Number of segments times number of terms being looked up. This is a simplified model, some queries incur their own costs, e.g. phrase queries bottleneck on

Looking for resources to understand query cost/complexity

2025-02-20 Thread Phil Budne
always have a date range for publication dates to consider. 2. They almost always have a list of source domains to consider (currently expressed as a query string domain:[this OR that OR ...]) 3. They almost always have a user query string (sometimes omitted to get the overall number

Re: Error Doc id doesn't match the query in vector searches

2025-01-17 Thread Varun Thacker
>> I misspoke, for regular search KnnFloatVectorQuery is the Query object >> before the rewrite. After the >> rewrite it's AbstractKnnVectorQuery$DocAndScoreQuery >> >> And then when Solr asks for the score the same Query object is passed to >> the rewrite and be

Re: Error Doc id doesn't match the query in vector searches

2025-01-17 Thread Varun Thacker
I was able to narrow down to https://github.com/apache/solr/commit/cfec121bab2ecfc4c06e20a5533596025ae63d98 that causes this issue. Without that change the bug doesn't repro On Thu, Jan 16, 2025 at 6:49 PM Varun Thacker wrote: > I misspoke, for regular search KnnFloatVectorQuery is t

Re: Error Doc id doesn't match the query in vector searches

2025-01-16 Thread Varun Thacker
I misspoke, for regular search KnnFloatVectorQuery is the Query object before the rewrite. After the rewrite it's AbstractKnnVectorQuery$DocAndScoreQuery And then when Solr asks for the score the same Query object is passed to the rewrite and becomes a AbstractKnnVectorQuery$DocAndScore

Re: Error Doc id doesn't match the query in vector searches

2025-01-16 Thread Varun Thacker
I'll have to recreate my setup since I tried rebuilding Solr without some PRs and it wiped everything out (my mistake!). I was able to get the query Solr sends for search, KnnFloatVectorQuery, vs. what it uses for getting the score, AbstractKnnVectorQuery$DocAndScoreQuery. This might give

Re: Error Doc id doesn't match the query in vector searches

2025-01-16 Thread Varun Thacker
I have an index where I can repro it with 100% success. Let me look into what's causing it and create a Solr Jira On Mon, Oct 21, 2024 at 11:11 AM Michael Sokolov wrote: > I think this might be a better question for solr-user@? EG I don't > understand how Solr decides which

Re: Custom Query Implementation

2025-01-03 Thread Viacheslav Dobrynin
Hi, Thank you! On Fri, 3 Jan 2025 at 14:15, Uwe Schindler wrote: > Hi, > > the expressions query should not be slower. Of course, if you also take > the compilation into the query time measurement it may be little slower > due to compilation and optimizing. In general queries s

Re: Custom Query Implementation

2025-01-03 Thread Uwe Schindler
Hi, the expressions query should not be slower. Of course, if you also take the compilation into the query time measurement it may be little slower due to compilation and optimizing. In general queries should be warmed before measuring them + expressions should only be compiled once and

Re: Lucene Query Metrics

2024-12-04 Thread Mikhail Khludnev
k execution > stats on top of Lucene. > > On Tue, 3 Dec 2024 at 23:20, Adrien Grand wrote: > > > Lucene doesn't expose query metrics, it's up to the application that > > integrates Lucene to compute and expose metrics that are relevant to > them. > > >

Re: Lucene Query Metrics

2024-12-04 Thread ashwini singh
Does lucene provide extensions (utilities)to extract metrics from Lucene during the request execution? Or applications can only track execution stats on top of Lucene. On Tue, 3 Dec 2024 at 23:20, Adrien Grand wrote: > Lucene doesn't expose query metrics, it's up to the app

Re: Lucene Query Metrics

2024-12-03 Thread Adrien Grand
Lucene doesn't expose query metrics, it's up to the application that integrates Lucene to compute and expose metrics that are relevant to them. On Wed, 4 Dec 2024 at 00:31, ashwini singh wrote: > Hey everyone, > > Does lucene provide any query metrics (perf) ? I am lo

Lucene Query Metrics

2024-12-03 Thread ashwini singh
Hey everyone, Does Lucene provide any query metrics (perf)? I am looking for something very similar to MongoDB explain() output or Execution metrics for Cosmos DB? *Thanks and Regards,* *Ashwini Singh*

Re: Custom Query Implementation

2024-12-03 Thread Viacheslav Dobrynin
Hi, Thanks for the answers! Yes, my task is to store only non-zero values from a sparse vector of large dimension, where most of the elements are zero. On Tue, 3 Dec 2024 at 19:17, Mikhail Khludnev wrote: > Thanks for clarification Michael! > > On Tue, Dec 3, 2024 at 1:56 PM Michael Sokolov wrote: >

Re: Custom Query Implementation

2024-12-03 Thread Mikhail Khludnev
Thanks for clarification Michael! On Tue, Dec 3, 2024 at 1:56 PM Michael Sokolov wrote: > Sparse is meaning two different things here. In the case you found Mikhail, > it means not every document has a value for some vector field. I think the > question here is about very high dimensional vector

Re: Custom Query Implementation

2024-12-03 Thread Michael Sokolov
Sparse is meaning two different things here. In the case you found Mikhail, it means not every document has a value for some vector field. I think the question here is about very high dimensional vectors where most documents have zeroes in most dimensions of the vector. On Tue, Dec 3, 2024, 2:01 A

Re: Custom Query Implementation

2024-12-02 Thread Mikhail Khludnev
Morning. I noticed a condition choosing sparse and dense format underneath https://github.com/apache/lucene/blob/6053e1e31378378f6d310a05ea6d7dcdfc45f48b/lucene/core/src/java/org/apache/lucene/codecs/lucene95/OffHeapByteVectorValues.java#L108 perhaps it may achieve your performance requirements.

Re: Custom Query Implementation

2024-12-02 Thread Viacheslav Dobrynin
Hi, Thanks for the answer! I think this is similar to my initial implementation, where I built the query as follows (PyLucene): def build_query(query): builder = BooleanQuery.Builder() for term in torch.nonzero(query): field_name = to_field_name(term.item()) value = query

Re: Custom Query Implementation

2024-12-02 Thread Michael Sokolov
BooleanQuery from the terms in the sparse search vector and use a simple similarity that sums the term frequencies for ranking. As long as the number of non-zero dimensions in the query is low, this should be efficient On Mon, Dec 2, 2024 at 1:17 PM Viacheslav Dobrynin wrote: > > Hi, > > Th
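
The idea in this reply, scoring by walking only the postings of the query's non-zero dimensions, can be sketched in plain Java. This is a toy in-memory inverted index, assuming weights are stored directly as floats; `SparseDotProductSketch`, `Posting`, `addDocument`, and `score` are hypothetical names and not Lucene classes or methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index keyed by vector dimension; not Lucene code.
final class SparseDotProductSketch {
    // One posting: a document with a non-zero weight in some dimension.
    static final class Posting {
        final int docId;
        final float weight;
        Posting(int docId, float weight) { this.docId = docId; this.weight = weight; }
    }

    // dimension -> list of documents that are non-zero in that dimension
    private final Map<Integer, List<Posting>> postings = new HashMap<>();

    void addDocument(int docId, Map<Integer, Float> sparseVector) {
        for (Map.Entry<Integer, Float> e : sparseVector.entrySet()) {
            postings.computeIfAbsent(e.getKey(), d -> new ArrayList<>())
                    .add(new Posting(docId, e.getValue()));
        }
    }

    // Accumulates the dot product per document, touching only the
    // postings lists of dimensions that are non-zero in the query.
    Map<Integer, Float> score(Map<Integer, Float> query) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Map.Entry<Integer, Float> q : query.entrySet()) {
            List<Posting> list = postings.getOrDefault(q.getKey(), Collections.emptyList());
            for (Posting p : list) {
                scores.merge(p.docId, q.getValue() * p.weight, Float::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        SparseDotProductSketch index = new SparseDotProductSketch();
        Map<Integer, Float> doc = new HashMap<>();
        doc.put(3, 2.0f);
        doc.put(7, 1.0f);
        index.addDocument(0, doc);
        Map<Integer, Float> query = new HashMap<>();
        query.put(7, 2.0f);
        System.out.println(index.score(query));
    }
}
```

The work done per query is proportional to the total length of the visited postings lists, which is why a small number of non-zero query dimensions keeps this approach cheap.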

Re: Custom Query Implementation

2024-12-02 Thread Viacheslav Dobrynin
Hi, Thanks for the reply. I haven't tried to do that. However, I do not fully understand how in this case an inverted index will be constructed for an efficient search by terms (O(1) for each term as a key)? On Mon, 2 Dec 2024 at 21:55, Patrick Zhai wrote: > Hi, have you tried to encode the sparse v

Re: Custom Query Implementation

2024-12-02 Thread Patrick Zhai
Hi, have you tried to encode the sparse vector yourself using the BinaryDocValueField? One way I can think of is to encode it as (size, index_array, value_array) per doc Intuitively I feel like this should be more efficient than one dimension per field if your dimension is high enough Patrick On
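
The (size, index_array, value_array) layout suggested here could look roughly like the following. This is a minimal sketch using only `java.nio.ByteBuffer`; the class and method names are hypothetical, and how the resulting bytes would be stored in or read from a binary doc-values field is left out.

```java
import java.nio.ByteBuffer;

// Packs a sparse vector as [size:int][indices:int...][values:float...].
// Hypothetical helper; not part of Lucene.
final class SparseVectorCodecSketch {
    static byte[] encode(int[] indices, float[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + indices.length * (Integer.BYTES + Float.BYTES));
        buf.putInt(indices.length);
        for (int i : indices) buf.putInt(i);
        for (float v : values) buf.putFloat(v);
        return buf.array();
    }

    // Dot product between one encoded document vector and a dense query array.
    static float dot(byte[] encoded, float[] denseQuery) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        int size = buf.getInt(0);
        int valuesOffset = Integer.BYTES + size * Integer.BYTES;
        float sum = 0f;
        for (int k = 0; k < size; k++) {
            int dim = buf.getInt(Integer.BYTES + k * Integer.BYTES);
            float value = buf.getFloat(valuesOffset + k * Float.BYTES);
            sum += value * denseQuery[dim];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] packed = encode(new int[] {1, 4}, new float[] {2f, 3f});
        System.out.println(packed.length + " bytes, dot = "
                + dot(packed, new float[] {0f, 10f, 0f, 0f, 1f}));
    }
}
```

Note that a per-document blob like this has to be decoded for every candidate document, so it does not by itself give the per-term lookup of an inverted index.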

Re: Custom Query Implementation

2024-12-02 Thread Viacheslav Dobrynin
Hi! I need to index sparse vectors, whereas as I understand it, KnnFloatVectorField is designed for dense vectors. Therefore, it seems that this approach will not work. On Sun, 1 Dec 2024 at 18:36, Mikhail Khludnev wrote: > Hi, > May it look like KnnFloatVectorField(... DOT_PRODUCT) > and KnnFloatVect

Re: Custom Query Implementation

2024-12-01 Thread Mikhail Khludnev
Hi, May it look like KnnFloatVectorField(... DOT_PRODUCT) and KnnFloatVectorQuery?

Re: Custom Query Implementation

2024-12-01 Thread Viacheslav Dobrynin
Hi! Thank you for your reply! I tried the recommendations, and below I gave an example code for implementing queries. The query with the expression works a little slower, I think this is due to the need for compilation. I have one more question, please tell me which type of field is best suited

Re: Custom Query Implementation

2024-11-30 Thread Mikhail Khludnev
s in this vector. A score between a query and a document is located > as a dot product between their vectors. > To do this, I am building the following documents using PyLucene: > doc = Document() > doc.add(StringField("doc_id", str(doc_id), Field.Store.YES)) > doc.add(F

Custom Query Implementation

2024-11-30 Thread Viacheslav Dobrynin
document is encoded into a sparse vector, where the terms are the positions in this vector. A score between a query and a document is located as a dot product between their vectors. To do this, I am building the following documents using PyLucene: doc = Document() doc.add(StringField("doc_id

Re: Lucene9.11 has longer query warm up time for vector queries compared to lucene9.7

2024-10-30 Thread Rui Wu
m Lucene9.7 to Lucene9.11. During the migration, > we noticed that the Lucene9.11 has a longer warm up time for vector > queries. The warm up time means: when the index just finishes building, the > query time is high for the first few minutes. > > The following figure shows the issue.

Lucene9.11 has longer query warm up time for vector queries compared to lucene9.7

2024-10-30 Thread Rui Wu
Dear users, We recently migrated from Lucene9.7 to Lucene9.11. During the migration, we noticed that the Lucene9.11 has a longer warm up time for vector queries. The warm up time means: when the index just finishes building, the query time is high for the first few minutes. The following figure

Re: Error Doc id doesn't match the query in vector searches

2024-10-21 Thread Michael Sokolov
I think this might be a better question for solr-user@? EG I don't understand how Solr decides which Query to send to populateScores -- is it the same one that was used to generate the matches in topDocs? It seems as if it should be, but then this error shouldn't happen ... I wonder

Error Doc id doesn't match the query in vector searches

2024-10-17 Thread Moll, Dr. Andreas
res(TopFieldCollector.java:478) java.lang.IllegalArgumentException: Doc id 48567944 doesn't match the query at org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:478) ~[?:?] at org.apache.solr.search.SolrIndexSearcher.populateScoresIfNeeded(SolrIndexSearcher.j

RE: priority of query results with text alternates

2024-10-06 Thread Trevor Nicholls
. Thanks T From: Ralf Heyde Sent: Sunday, 6 October 2024 19:27 To: java-user@lucene.apache.org Subject: Re: priority of query results with text alternates Hey, In case I have such an issue, i usually tend to use more than one field with different analyzer setups and weight

Re: priority of query results with text alternates

2024-10-05 Thread Ralf Heyde
Hey, In case I have such an issue, I usually tend to use more than one field with different analyzer setups and weight/multiply score them individually for each field (index / query). BoostQuery (Lucene 9.11.0 core API) lucene.apache.org That may solve it. Cheers Sent from my phone, any

priority of query results with text alternates

2024-10-05 Thread Trevor Nicholls
eplicate them so that the token stream output by the analyzer contains both [app.] [server-] [file_] [name] and [app] [server] [file] [name] with all the correct offsets. The same analyzer is applied both to the indexed content and to the search terms. This w

Unexpected boolean behaviour in multi field query

2024-06-20 Thread Tim Whittington
I've got a problem with a query running against Lucene 7.3 where the boolean AND is not being applied. The fields involved are: 1. _missing_, which contains a token for each field missing in the document 2. phoneNationalNumberQueryNgrams, which contains a phone number. The index analyz

KnnFloatVectorQuery: filtering query & rewrite

2024-05-15 Thread Marc Davenport
Hello, I'm exploring some personalization to our sort orders. If I have an original query q which is mostly just a set of term filters, and I want to sort those by distance between some float vector on the document and a supplied user vector. I only see one way to do this. I would create

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Mikhail Khludnev
Hello. What if you examine the parsed query via a debugger or just print query.toString() for both and compare? On Mon, May 6, 2024 at 1:42 PM Saha, Rajib wrote: > Hi Experts, > > > > As per the definition in > https://lucene.apache.org/core/2_9_4/queryparsersyntax.html >

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
d 'NOT' in query string stands for same reason theoretically. But, in practical, is there any difference? Why I am asking the question. In our product, we have got an incident related to different result set f

Re: Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Paul Libbrecht
Do I mistake or “ “ makes an OR if there’s no other? On 6 May 2024, at 12:41, Saha, Rajib wrote: Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' and 'NOT' in query string stands for same reason theoretically.

Difference between '-' and 'NOT' in Lucene Query.

2024-05-06 Thread Saha, Rajib
Hi Experts, As per the definition in https://lucene.apache.org/core/2_9_4/queryparsersyntax.html '-' and 'NOT' in query string stands for same reason theoretically. But, in practical, is there any d

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Thanks Adrien! On Fri, Apr 12, 2024 at 9:49 AM Adrien Grand wrote: > You are correct, query rewriting is not affected by the use of search vs. > searchAfter. > > On Fri, Apr 12, 2024 at 3:37 PM Puneeth Bikkumanla > wrote: > > > Hello, > > Sorry I should

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Adrien Grand
You are correct, query rewriting is not affected by the use of search vs. searchAfter. On Fri, Apr 12, 2024 at 3:37 PM Puneeth Bikkumanla wrote: > Hello, > Sorry I should have clarified what I meant by “optimized”. I am familiar > with the collector/comparators using the “after” doc

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Hello, Sorry I should have clarified what I meant by “optimized”. I am familiar with the collector/comparators using the “after” doc to filter out documents but I specifically was talking about the query rewriting phase. Is the query rewritten differently in search vs searchAfter? Looking at the

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Adrien Grand
: > Hello, > I was wondering if a user-defined Query is optimized the same way in both > search/searchAfter provided the index stays the same (no CRUD takes place). > > In searchAfter we pass in an "after" doc so I was wondering if that changes > how a query is optimized at all.

Query Optimization in search/searchAfter

2024-04-11 Thread Puneeth Bikkumanla
Hello, I was wondering if a user-defined Query is optimized the same way in both search/searchAfter provided the index stays the same (no CRUD takes place). In searchAfter we pass in an "after" doc so I was wondering if that changes how a query is optimized at all. By looking at the

Re: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Mikhail Khludnev
Hello Igor, Accessing a potential match during scoring is problematic. Piggybacking on the wrapped boolean query is not an option because when a certain docId is collected, matching legs might reside on the previous or next docID. You can check these for inspiration https://lucene.apache.org/core/9_0_0

Re: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Igor Kustov
TopDocs topDocs = this.searcher.search(query, maxResults); > Weight weight = > query.rewrite(this.searcher.getIndexReader()).createWeight(this.searcher, > ScoreMode.TOP_DOCS, 1.0f); > for (ScoreDoc scoreDoc : topDocs.scoreDocs) { > Matches matches = > weight.matches(this.searche

AW: Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread nedyalko.zhe...@freelance.de.INVALID
Hi Igor, I have similar situation and have written the following code: TopDocs topDocs = this.searcher.search(query, maxResults); Weight weight = query.rewrite(this.searcher.getIndexReader()).createWeight(this.searcher, ScoreMode.TOP_DOCS, 1.0f); for (ScoreDoc scoreDoc : topDocs.scoreDocs

Access child boolean query matched terms in parent custom wrapper query

2023-07-17 Thread Igor Kustov
I'm writing custom lucene query which is basically a wrapper around boolean query with many should clauses. I want to access this boolean query's matched terms, and then either filter out this document depending on external statistics on those terms or proceed with this document without

Re: LuceneTestCase altered the default query cache policy

2023-06-27 Thread Michael McCandless
less http://blog.mikemccandless.com On Mon, Jun 26, 2023 at 5:26 PM Yuan Xiao wrote: > Hello community, > > I am a developer that work for Amazon Product Search. Recently I have > experienced a use scenario that I altered the IndexSearcher’s default query > cache policy but so

LuceneTestCase altered the default query cache policy

2023-06-26 Thread Yuan Xiao
Hello community, I am a developer who works for Amazon Product Search. Recently I ran into a scenario where I altered the IndexSearcher’s default query cache policy, but some of our unit tests that extend LuceneTestCase failed. It took me some time to figure it out

RE: Can I simplify this bit of query boosting?

2023-05-14 Thread Trevor Nicholls
Can I simplify this bit of query boosting? You might also want to have a look at FeatureField. This can be used to associate a score with a particular term. On Thu, May 11, 2023 at 1:13 PM Hrvoje Lončar wrote: > > I had a situation when i wanted to sort a list of articles based on > the

Re: Can I simplify this bit of query boosting?

2023-05-11 Thread Michael Sokolov
le having a photo, description, > ingredients should perform better comparing to one having only name and > photo. > For that purpose I created a numeric field that holds calculated value > named completeness. Later when executing a query, this number is used as a > sort modifi

Re: Can I simplify this bit of query boosting?

2023-05-11 Thread Hrvoje Lončar
named completeness. Later when executing a query, this number is used as a sort modifier - in my case by using reverse order. My project is based on Hibernate Search, so I guess it's not that I can put here a code snippet. This numeric value does not have to be 1st sort modifier. First you pu

Can I simplify this bit of query boosting?

2023-05-11 Thread Trevor Nicholls
process is here: // have IndexReader reader, IndexSearcher searcher, Analyzer analyzer, String userQuery QueryParser parser = new QueryParser( "content", analyzer ); parser.setDefaultOperator( QueryParserBase.AND_OPERATOR ); BooleanQuery query = new BooleanQuery.

RE: Highlighting query results, my method is too crude, but how to improve it?

2023-02-21 Thread Trevor Nicholls
Thank you David, very useful cheers T -Original Message- From: Dawid Weiss Sent: Tuesday, February 21, 2023 7:17 PM To: java-user@lucene.apache.org Subject: Re: Highlighting query results, my method is too crude, but how to improve it? You can use two different queries - the query is

Re: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Dawid Weiss
You can use two different queries - the query is just used as a source of information on what to highlight (it can even be completely different and unrelated to the query that retrieved the documents). Separately, unified highlighter is great but you may also try the matches API - I found it to

RE: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Trevor Nicholls
Well I don't know; I suppose that's part of my question. It's not immediately obvious to me that the "query" in these two lines: Highlighter highlighter = new Highlighter( htmlFormatter, new QueryScorer( query )); TopDocs results = searcher.search( query, max );

Re: Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Mikhail Khludnev
Hello, Maybe I'm missing some point. But, can you highlight another query than one you search for? On Mon, Feb 20, 2023 at 5:07 PM Trevor Nicholls wrote: > Sorry I apologize for this being a bit long and for explaining the problem > at the very bottom after all the background,

Highlighting query results, my method is too crude, but how to improve it?

2023-02-20 Thread Trevor Nicholls
d the index has stored several fields per document: category, volume, title, text, etc. Title and text are tokenised and stored, all other fields are just indexed. When searching the index I am using the standard queryparser, and a typical query might look like "(title:graph AND titl

Suggest field query with fst off heap

2022-10-18 Thread akmithu91
Hi , I posted my question here also would link it to the list if we have an answer, https://stackoverflow.com/questions/74114422/suggestfield-in-lucene-using-onheap-when-searching . Is there a way to query using suggestsearcher an off heap suggest field. When I tried it always gives out of

Re: Lucene's LRU Query Cache - Deep Dive

2022-09-01 Thread Mohammad Sadiq
twice for the > same segment, which is a costly operation. > > 2. The cost is computed in order to know whether the top-level query is > likely to consume a large-enough portion of the matches of the query that > we are considering caching so that caching this query wouldn't hu

Re: Lucene's LRU Query Cache - Deep Dive

2022-07-21 Thread Shradha Shankar
sorted by > numeric field can leverage point index structures to only look at a small > subset of the matching docs. Yet caching a query requires consuming all its > matches, so it could significantly hurt latencies. It's important to not > cache all queries to preserve the benef

Re: Lucene's LRU Query Cache - Deep Dive

2022-07-19 Thread Adrien Grand
1. I believe that this would require pulling a ScorerSupplier twice for the same segment, which is a costly operation. 2. The cost is computed in order to know whether the top-level query is likely to consume a large-enough portion of the matches of the query that we are considering caching so

Re: Lucene's LRU Query Cache - Deep Dive

2022-07-14 Thread Mohammad Sadiq
Thanks for the deep-dive Shradha. Thank you Adrien for the additional questions and answers. I had a couple of questions, when looking around the cache code. 1. The `QueryCachingPolicy` [1] makes decisions based on `Query`. Why not use `Weight`? The `scorerSupplier` [2] in the `LRUQueryCache

Re: Fuzzy Query Similarity

2022-07-12 Thread Mike Drob
edit distance more generally into > > > the per-term score, although it does seem like that would be something > > > people would generally expect. > > > > Actually it does this: > > > > * By default FuzzyQuery uses a rewrite method that expands all

Re: Fuzzy Query Similarity

2022-07-11 Thread Mike Drob
d be something > > people would generally expect. > > Actually it does this: > > * By default FuzzyQuery uses a rewrite method that expands all terms > as should clauses into a boolean query: > MultiTermQuery.TopTermsBlendedFreqScoringRewrite(maxExpansions) >

Re: Lucene's LRU Query Cache - Deep Dive

2022-07-11 Thread Adrien Grand
meric field can leverage point index structures to only look at a small subset of the matching docs. Yet caching a query requires consuming all its matches, so it could significantly hurt latencies. It's important to not cache all queries to preserve the benefit of Lucene's filtering and

Re: Fuzzy Query Similarity

2022-07-09 Thread Michael Sokolov
ore, although it does seem like that would be something > > people would generally expect. > > Actually it does this: > > * By default FuzzyQuery uses a rewrite method that expands all terms > as should clauses into a boolean query: > MultiTermQuery.TopTermsBl

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
ault FuzzyQuery uses a rewrite method that expands all terms as should clauses into a boolean query: MultiTermQuery.TopTermsBlendedFreqScoringRewrite(maxExpansions) * TopTermsRewrite basically keeps track of a "boost" factor for each term and sorts the "best" terms in

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
The problem is that the query combines the native termquery score (which depends on length of document and term's statistic). The edit distance is also multiplied in. When the difference in term statistics is too large, the edit distance no longer matters. This is perfectly fine and
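
The interaction described here can be made concrete with a toy calculation. The sketch below is an assumption-laden illustration, not Lucene's scoring code: it computes a plain Levenshtein distance, derives a boost as `1 - distance / shorterLength` (an assumed simplification), and multiplies it into a term score that stands in for the statistics-based score. The term scores passed in are made-up inputs, and terms are assumed to be non-empty.

```java
// Toy model of "term score multiplied by an edit-distance boost".
// Names and the boost formula are illustrative assumptions.
final class FuzzyScoreSketch {
    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed boost: 1.0 for an exact match, smaller as edits accumulate.
    static double boost(String queryTerm, String indexedTerm) {
        int minLen = Math.min(queryTerm.length(), indexedTerm.length());
        return 1.0 - (double) editDistance(queryTerm, indexedTerm) / minLen;
    }

    // Final score: statistics-based term score scaled by the boost.
    static double combined(double termScore, String queryTerm, String indexedTerm) {
        return termScore * boost(queryTerm, indexedTerm);
    }

    public static void main(String[] args) {
        // Hypothetical term scores: a common exact match vs. a rarer near match.
        System.out.println("exact: " + combined(1.0, "spark", "spark"));
        System.out.println("fuzzy: " + combined(3.0, "spark", "sparks"));
    }
}
```

With these made-up inputs the near match ends up with the higher combined score, which is the effect the message describes: once term statistics differ enough, the edit-distance factor no longer decides the ranking.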

Re: Fuzzy Query Similarity

2022-07-09 Thread Michael Sokolov
enerally expect. So maybe FuzzyQuery should somehow do that? But without changing it, you could also use a query that does it explicitly; if you get a term "foo", you could maybe search for "foo OR foo~" ? On Fri, Jul 8, 2022 at 4:14 PM Mike Drob wrote: > > Hi folks,

Fuzzy Query Similarity

2022-07-08 Thread Mike Drob
Hi folks, I'm working with some fuzzy queries and trying my best to understand what is the expected behaviour of the searcher. I'm not sure if this is a similarity bug or an incorrect usage on my end. The problem is when I do a fuzzy search for a term "spark~" then instead of matching documents w

Lucene's LRU Query Cache - Deep Dive

2022-07-08 Thread Shradha Shankar
Hello! I work at Amazon Product Search and I’ve recently been looking into understanding how Lucene’s LRU Query Cache works. I’ve written up a summary of my understanding here. (Also attached as a markdown file with this email) Will really appreciate feedback/improvements/corrections for my

Re: Fwd: Finding out which fields matched the query

2022-06-27 Thread Uwe Schindler
Many of us already answered on the dev mailing list. Uwe On 25.06.2022 at 05:19, Yichen Sun wrote: -- Forwarded message - From: Yichen Sun Date: Sat, 25 Jun 2022, 11:14 Subject: Finding out which fields matched the query To: , , <java-user@lucene.apache.org> Hello! I’m a MSCS student from

Re: Finding out which fields matched the query

2022-06-27 Thread Jörn Franke
cently I try to > output matched fields by one query. For example, for one document, there are > 10 fields and 2 of them match the query. I want to get the name of these > fields. > > I have tried using explain() method and getting description then regex. > However it cost so muc

Fwd: Finding out which fields matched the query

2022-06-27 Thread Yichen Sun
-- Forwarded message - From: Yichen Sun Date: Sat, 25 Jun 2022, 11:14 Subject: Finding out which fields matched the query To: , , <java-user@lucene.apache.org> Hello! I’m a MSCS student from BU and learning to use Lucene. Recently I try to output matched fields by one query. For example, f

Finding out which fields matched the query

2022-06-27 Thread Yichen Sun
Hello! I’m a MSCS student from BU and learning to use Lucene. Recently I try to output matched fields by one query. For example, for one document, there are 10 fields and 2 of them match the query. I want to get the name of these fields. I have tried using explain() method and getting

Re: Multi-Value query test

2022-06-23 Thread Patrick Bernardina
Let me clarify: Example query: "(author:Patrick author:Michael) && type:pdf" Example result: 2 items: Doc1 with authors "Patrick, Adalberto" and Doc2 with authors "Patrick, Michael, Elias" I want to show the 2 items, but when I show the authors, I only wa

Multi-Value Query Test

2022-06-23 Thread Patrick Bernardina
How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all documents of some specific authors. The authors field contains multi-value sorted set. When showing the result, I want to show only the name of the authors spec

Re: Multi-Value query test

2022-06-23 Thread Michael Wechner
Maybe I misunderstand the problem, but why don't you decouple showing the results from the results of the query? On 23.06.22 at 14:03, Patrick Bernardina wrote: How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to

Multi-Value query test

2022-06-23 Thread Patrick Bernardina
How to test if a value in a multi-value field matches a specific query? Example of the problem: I've created a query to return all documents of some specific authors. The authors field contains multi-value sorted set. When showing the result, I want to show only the name of the authors spec

block join - index query

2021-10-07 Thread Pradeep Pathak
Hi I have one use case in relational tables as many to many relationship. Can block join document be indexed in similar fashion? like: parent can have many children & children could also have many parents if possible, could you please share document link how to index & query using block

Re: currency based search using query time calculated field match with expression

2021-09-05 Thread Kumaran Ramasubramanian
Thanks a lot for your inputs Michael. I will check about FunctionQuery. Thanks again :-) -- Kumaran R Chennai, India On Fri, Sep 3, 2021 at 9:22 PM Michael Sokolov wrote: > Sorry I'm not sure I understand what you're trying to do. Maybe you > want to match a document having a computed value?

Re: currency based search using query time calculated field match with expression

2021-09-03 Thread Michael Sokolov
Sorry I'm not sure I understand what you're trying to do. Maybe you want to match a document having a computed value? This is going to be potentially costly, potentially requiring post-filtering of all hits matching for other reasons. I think there is a FunctionQuery/FunctionRangeQuery that might h

Re: Automatic prefix search in query parser

2021-09-03 Thread Gauthier Roebroeck
Thanks a lot Erik, I hadn't thought about changing the index, only about the query. I will explore that route. On Fri, 3 Sep 2021, 22:53 Erik Hatcher, wrote: > A comparable alternative would be to use the edge ngram filter to index > prefixes instead. > > Erik > >

Re: Automatic prefix search in query parser

2021-09-03 Thread Erik Hatcher
. > I am using the > `org.apache.lucene.queryparser.classic.MultiFieldQueryParser` which works > very well so far. > > However I would like to automatically use the prefix notation (`*`) for all > terms in the query, instead of searching for exact terms, so the humans > entering the queries don

Automatic prefix search in query parser

2021-09-03 Thread Gauthier Roebroeck
Hello, I am using Apache Lucene 8.9.0 to parse queries that are entered by humans. I am using the `org.apache.lucene.queryparser.classic.MultiFieldQueryParser` which works very well so far. However I would like to automatically use the prefix notation (`*`) for all terms in the query, instead of

Re: currency based search using query time calculated field match with expression

2021-09-03 Thread Kumaran Ramasubramanian
Hi Michael, Thanks for the response. Based on my understanding, we can use the expressions module in Lucene to reorder search results using custom score calculations based on expressions over stored fields. But I am not sure how to do the same for Lucene document hits (doc hits matching 2 USD with

Re: currency based search using query time calculated field match with expression

2021-09-02 Thread Michael Sokolov
Have you looked at the expressions module? It provides support for user-defined computation using values from the index based on a simple expression language. It might prove useful to you if the exchange rate needs to be tracked very dynamically. On Thu, Sep 2, 2021 at 2:15 PM Kumaran Ramasubraman

currency based search using query time calculated field match with expression

2021-09-02 Thread Kumaran Ramasubramanian
I have a use case regarding currency-based search and would welcome any suggestions or pointers. For example, assume 1 USD = 75 INR and 1 USD = 42190 IRR; similarly, we support 100 currencies as of now. Record1 created with PRICE 150 INR & EXCHANGE_RATE 75 for USD Record2 created with PRI

Re: Range query with Lucene7.7.1 on old indexes.

2021-09-01 Thread Uwe Schindler
030") >doc.add(LongField('xdate', xdate, Field.Store.YES)) # stored and not >analyzed > >Query: > >query = NumericRangeQuery.newLongRange("xdate", long("2019010100"), >long("20190101115959"), True, True) > >I am getting the re

Range query with Lucene7.7.1 on old indexes.

2021-09-01 Thread Antony Joseph
Hi all, Using: python 2.7.14, pylucene 4.10.0 Index: xdate = long("20190101183030") doc.add(LongField('xdate', xdate, Field.Store.YES)) # stored and not analyzed Query: query = NumericRangeQuery.newLongRange("xdate", long("2019010100"), long("

Query parser automatic prefix

2021-08-24 Thread Gauthier Roebroeck
Hello, I am using Apache Lucene 8.9.0 to parse queries that are entered by humans. I am using the `org.apache.lucene.queryparser.classic.MultiFieldQueryParser` which works very well so far. However I would like to automatically use the prefix notation (`*`) for all terms in the query, instead of
