FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer)

2025-07-08 Thread Singh, Divya
From: Singh, Divya Sent: 04 July 2025 14:40 To: d...@lucene.apache.org Cc: Birajdar, Sharad (DI SW PLM LCS APPS ALM R&D7) Subject: FW: Challenges with Chinese Query Matching and Wildcard Search in Lucene (StandardAnalyzer / CJKAnalyzer) From: Thakare, Monika (ext) (DI SW PLM LCS APPS A

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread Uwe Schindler
/apache/lucene/pull/12582>) using Lucene99ScalarQuantizedVectorsFormat < https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. -
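
The snippet above refers to scalar quantization driven by a confidence interval. As a rough illustration of the general idea only (this is not Lucene's implementation; the class, the method names, and the 0..127 target range are assumptions made for this sketch), float components can be clipped to a lower and upper bound and mapped linearly onto small integers:

```java
import java.util.Arrays;

// Hypothetical sketch of linear scalar quantization; not taken from Lucene.
class ScalarQuantSketch {

    // Nearest-rank quantile of the data, with q in [0, 1].
    static float quantile(float[] values, double q) {
        float[] sorted = values.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.round(q * (sorted.length - 1));
        return sorted[idx];
    }

    // Clips each component to [lower, upper] and maps it linearly onto 0..127.
    static byte[] quantize(float[] vector, float lower, float upper) {
        byte[] out = new byte[vector.length];
        float scale = 127f / (upper - lower);
        for (int i = 0; i < vector.length; i++) {
            float clipped = Math.max(lower, Math.min(upper, vector[i]));
            out[i] = (byte) Math.round((clipped - lower) * scale);
        }
        return out;
    }
}
```

Under this reading, a confidence interval of 90 would mean taking `lower` and `upper` from the 5% and 95% quantiles of the observed component values, so that outliers are clipped rather than stretching the range. How Lucene derives its bounds is described in the linked documentation and blog posts.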

Re: Does Lucene Vector Search support int8 and / or even binary?

2025-04-14 Thread John Dale (DB2DOM)
> Lucene99ScalarQuantizedVectorsFormat <https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html> that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works i

Re: fuzzy search and distance tilde

2024-08-20 Thread Uwe Schindler
lly that's how full-text search works. Take the user-entered text, tokenize/analyze it in the same way as you do at indexing time, and then find token matches in the index for the query tokens. Uwe On 19/08/2024 12:32, Uwe Schindler wrote: Hi, Basically, my only recommendation is to NOT use the

Re: fuzzy search and distance tilde

2024-08-20 Thread Greg Huber
it is basically written for the use case "user enters some search terms without syntax knowledge". If you want to apply hardcoded filters, never ever construct plain string queries like that; they are always vulnerable to "SQL injection" issues. Pass the user-entered querie

Re: fuzzy search and distance tilde

2024-08-19 Thread Uwe Schindler
oes not allow users to pass field names, so it is basically written for the use case "user enters some search terms without syntax knowledge". If you want to apply hardcoded filters, never ever construct plain string queries like that; they are always vulnerable to "S

Re: fuzzy search and distance tilde

2024-08-14 Thread Greg Huber
OK thanks, I do catch the exception and give a response. I do a stopword check, but the fuzzy search syntax seems way more complex as it does not like query statements SELECT AND etc. if (!EnglishAnalyzer.ENGLISH_STOP_WORDS_SET.contains(term) || terms.length == 1

Re: fuzzy search and distance tilde

2024-08-13 Thread Mikhail Khludnev
On Sun, Aug 11, 2024 at 11:38 AM Greg Huber wrote: > Is there a > way to escape these or configure Lucene just to return no results rather > than an exception. > I don't think Lucene can handle it since the query parser and index searcher are separate components, which are wired by some code. I

fuzzy search and distance tilde

2024-08-11 Thread Greg Huber
Looking through my httpd logs, I see lots of searches such as /devbox/search?q=%29%20AND%203318%3D4385%20AND%20%287778%3D7778 i.e.: ) AND 3318=4385 AND (7778=7778. I guess they might be fishing for something. For the fuzzy search I use different distance values, and the default is ~0.6 String
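
The probe string quoted above breaks a query parser because unbalanced parentheses are syntax. One way to avoid the exception is to backslash-escape syntax characters before parsing. The following is a hand-rolled sketch of that idea; the class name and the exact character list are assumptions for illustration, and Lucene's classic query parser ships its own static `QueryParser.escape(String)`, which is the better choice in real code:

```java
// Hypothetical helper: backslash-escapes characters that query syntax treats as operators.
class QueryEscapeSketch {

    // Characters assumed to be special for this illustration.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Note that escaping characters does not neutralize keywords such as AND or OR, and it also escapes the `~` of a fuzzy term, so the distance suffix would have to be appended after escaping the user's text.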

Re: Zipcode radius search outside certain miles of a zipcode

2024-05-08 Thread Dawid Weiss
You need to subtract the matching documents from everything else in the negative part, effectively: *:* AND NOT (zips-within area) D. On Wed, May 8, 2024 at 8:27 PM Siraj Haider wrote: > Hello there, > We are using Lucene v6.4.1 and are looking to implement geopoint searching > within or outsi
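
The reply above rests on the fact that a purely negative clause has nothing to subtract from, so "outside" must be expressed as "everything minus within". A small set-based sketch of that logic, using doc ids in a `BitSet` (the class and method names are hypothetical and no Lucene API is involved):

```java
import java.util.BitSet;

// Hypothetical illustration of "*:* AND NOT (within)" as plain set arithmetic.
class OutsideRadiusSketch {

    // Returns every doc id in [0, maxDoc) that is not set in 'within'.
    static BitSet outside(BitSet within, int maxDoc) {
        BitSet all = new BitSet(maxDoc);
        all.set(0, maxDoc);   // the "*:*" part: every document
        all.andNot(within);   // the "AND NOT (zips-within area)" part
        return all;
    }
}
```

In Lucene itself the same shape would be a boolean query with a match-all clause plus the radius query as a prohibited clause.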

Zipcode radius search outside certain miles of a zipcode

2024-05-08 Thread Siraj Haider
Hello there, We are using Lucene v6.4.1 and are looking to implement geopoint searching within or outside a certain zipcode. The within part works well, but the outside part does not. Can somebody please check the code below and give some suggestions? Thanks in advance! query //this has some ot

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Thanks Adrien! On Fri, Apr 12, 2024 at 9:49 AM Adrien Grand wrote: > You are correct, query rewriting is not affected by the use of search vs. > searchAfter. > > On Fri, Apr 12, 2024 at 3:37 PM Puneeth Bikkumanla > wrote: > > > Hello, > > Sorry I should

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Adrien Grand
You are correct, query rewriting is not affected by the use of search vs. searchAfter. On Fri, Apr 12, 2024 at 3:37 PM Puneeth Bikkumanla wrote: > Hello, > Sorry I should have clarified what I meant by “optimized”. I am familiar > with the collector/comparators using the “after” doc

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Puneeth Bikkumanla
Hello, Sorry I should have clarified what I meant by “optimized”. I am familiar with the collector/comparators using the “after” doc to filter out documents but I specifically was talking about the query rewriting phase. Is the query rewritten differently in search vs searchAfter? Looking at the

Re: Query Optimization in search/searchAfter

2024-04-12 Thread Adrien Grand
: > Hello, > I was wondering if a user-defined Query is optimized the same way in both > search/searchAfter provided the index stays the same (no CRUD takes place). > > In searchAfter we pass in an "after" doc so I was wondering if that changes > how a query is optimized at all.

Query Optimization in search/searchAfter

2024-04-11 Thread Puneeth Bikkumanla
Hello, I was wondering if a user-defined Query is optimized the same way in both search/searchAfter provided the index stays the same (no CRUD takes place). In searchAfter we pass in an "after" doc so I was wondering if that changes how a query is optimized at all. By looking at the

Community Over Code NA 2024 Search track, CFP closing soon

2024-04-08 Thread Anshum Gupta
Hi folks, The CFP for *“Community Over Code 2024” *(previously known as ApacheCon) is currently open until *15th Apr 2024* for folks who’re interested in submitting talks. Like the previous years we have the *'Search' track *for folks who want to talk about their Search stories. Ple

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-29 Thread Michael Wechner
_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. - https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene - https

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-28 Thread Shubham Chaudhary
t; > that > > expects a confidence interval from 90-100. Here is a nice blog(s) that > > talks about how it works in Lucene. > > > > - > > > https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene > > - > https://www.elastic.co/sea

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-23 Thread Michael Wechner
ore/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html> that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. - https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene - https://www

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-22 Thread Michael Wechner
ucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html> that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. - https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene - https://www.elastic.co/search-labs/blog/articles/scalar-quant

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner
l/12582>) using Lucene99ScalarQuantizedVectorsFormat <https://lucene.apache.org/core/9_9_1/core/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html> that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. - https://www.ela

Re: Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Shubham Chaudhary
ucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsFormat.html> that expects a confidence interval from 90-100. Here is a nice blog(s) that talks about how it works in Lucene. - https://www.elastic.co/search-labs/blog/articles/scalar-quantization-in-lucene - https://www.elastic.co/search-labs/blog/articles/scalar-quantization-10

Does Lucene Vector Search support int8 and / or even binary?

2024-03-19 Thread Michael Wechner
Hi, Cohere recently announced their "compressed" embeddings https://twitter.com/Nils_Reimers/status/1769809006762037368 https://www.linkedin.com/posts/bhavsarpratik_rag-genai-search-activity-7175850704928989187-Ki1N/?utm_source=share&utm_medium=member_desktop Does Lucene Vector

Re: hnsw parameters for vector search

2024-02-01 Thread Michael Sokolov
Reimers > explains quite nicely why this might happen, see for example > > https://www.youtube.com/watch?v=Abh3YCahyqU > > HTH > > Michael > > > > On 30.01.24 at 15:48, Moll, Dr. Andreas wrote: > > Hi, > > > > the hnsw documentation for the Lucene H

Re: hnsw parameters for vector search

2024-01-30 Thread Michael Wechner
the Lucene HnswGraph and the Solr vector search is not very verbose, especially with regard to the parameters hnswMaxConn and hnswBeamWidth. I find it hard to come up with sensible values for these parameters by reading the paper from 2018. Does anyone have experience with the influence of the paramet

hnsw parameters for vector search

2024-01-30 Thread Moll, Dr. Andreas
Hi, the hnsw documentation for the Lucene HnswGraph and the Solr vector search is not very verbose, especially with regard to the parameters hnswMaxConn and hnswBeamWidth. I find it hard to come up with sensible values for these parameters by reading the paper from 2018. Does anyone have

Katie released as Open Source under the Apache License 2.0 using Lucene for full text and vector search by default

2024-01-24 Thread Michael Wechner
Hi all, yesterday Katie was released as Open Source under the Apache License 2.0 using Lucene for full text and vector search by default. You can find the code on GitHub https://github.com/wyona/katie-backend A very big thank you to everyone working on Lucene, to make this great search

Re: Azure AI Search uses Apache Lucene for full text search

2024-01-24 Thread Michael Wechner
time on it during the next couple of days and keep you posted once I have gained more experience. Thanks Michael On 22.01.24 at 09:06, Ali Akhtar wrote: Sure, please share On Mon, Jan 22, 2024 at 1:33 AM Michael Wechner wrote: Hi I recently noticed, that Azure AI Search uses

Re: Azure AI Search uses Apache Lucene for full text search

2024-01-22 Thread Ali Akhtar
Sure, please share On Mon, Jan 22, 2024 at 1:33 AM Michael Wechner wrote: > Hi > > I recently noticed, that Azure AI Search uses Apache Lucene > <https://lucene.apache.org/> for full text search > > > https://learn.microsoft.com/en-us/azure/search/search-lucene-que

Azure AI Search uses Apache Lucene for full text search

2024-01-21 Thread Michael Wechner
Hi, I recently noticed that Azure AI Search uses Apache Lucene <https://lucene.apache.org/> for full-text search https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture which I did not know until now, but I think it is very cool that Microsoft is using Lucene.

Difference in search result for Luke and my code.

2024-01-10 Thread Saha, Rajib
Hi Experts, I am working on migrating Lucene from 2.4.1 to 8.11.2 for our product. I am seeing some differences between the search results of my own code and the Luke tool's results on the same index files. Can anybody please explain what the reason behind this could be? I explain in more detail below

[REMINDER] CFP Open for Search Track at Community Over Code EU (Formerly ApacheCon)

2024-01-05 Thread Anshum Gupta
Hello everyone, The CFP for *Community Over Code Europe* (Formerly ApacheCon) is open until 12 Jan 2024 and wanted to remind you that we have a *Search track*. Please submit your talks and share your stories here: https://sessionize.com/coceu-2024/ A bit about the Search track: Search is at the

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-09-01 Thread Mikhail Khludnev
tor/semantic and text) is valuable, even with > high quality embeddings, and helps when the searcher's intent is to search > for specific words or phrases (such as a name, or exact concepts) which get > blurred-out by semantics. I discussed blended searching using Lucene in

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Kent Fitch
My testing shows Lucene's HNSW in a very positive light. The ability to perform blended searches (vector/semantic and text) is valuable, even with high quality embeddings, and helps when the searcher's intent is to search for specific words or phrases (such as a name, or exact concepts)

Re: Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael McCandless
Thanks Michael, very interesting! I of course agree that Lucene is all you need, heh ;) Jimmy Lin also tweeted about the strength of Lucene's HNSW: https://twitter.com/lintool/status/1681333664431460353?s=20 Mike McCandless http://blog.mikemccandless.com On Thu, Aug 31, 2023 at 3:31 AM Michae

Vector Search with OpenAI Embeddings: Lucene Is All You Need

2023-08-31 Thread Michael Wechner
Hi all, you might be interested in this paper / article https://arxiv.org/abs/2308.14963 Thanks Michael

AW: retrieving search matches with their frequency and positions

2023-07-17 Thread nedyalko.zhe...@freelance.de.INVALID
Hi Mikhail, I've finally implemented it this way. Sorry for the delayed answer. TopDocs topDocs = this.searcher.search(query, maxResults); Weight weight = query.rewrite(this.searcher.getIndexReader()).createWeight(this.searcher, ScoreMode.TOP_DOCS, 1.0f); for (ScoreDoc scoreDoc : topDocs.sc

Fwd: How to retain % sign against numbers in lucene indexing/ search

2023-07-13 Thread Amitesh Kumar
*Warm Regards,* *Amitesh K* -- Forwarded message - From: Amitesh Kumar Date: Wed, Jul 12, 2023 at 7:03 AM Subject: How to retain % sign against numbers in lucene indexing/ search To: Hi Group, I am facing a requirement change to get % sign retained in searches. e.g Sample

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
OK https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/search/Weight.html#matches-org.apache.lucene.index.LeafReaderContext-int- On Mon, Jul 10, 2023 at 2:08 PM nedyalko.zhe...@freelance.de.INVALID wrote: > Hi Mikhail, > > I don't see the matches `searcher.matches(topDo

AW: retrieving search matches with their frequency and positions

2023-07-10 Thread nedyalko.zhe...@freelance.de.INVALID
ing to? Thanks. Ned From: Mikhail Khludnev Sent: Monday, 10 July 2023 11:53 To: java-user@lucene.apache.org Subject: Re: retrieving search matches with their frequency and positions Hi Ned. It's about TopDocs topDocs = searcher.search(query, 10); for (int i = 0; i <

Re: retrieving search matches with their frequency and positions

2023-07-10 Thread Mikhail Khludnev
. This is (almost) how highlighters (like https://lucene.apache.org/core/9_0_0/highlighter/org/apache/lucene/search/uhighlight/UnifiedHighlighter.html) work. In some sort you can get https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.sear

AW: retrieving search matches with their frequency and positions

2023-07-10 Thread nedyalko.zhe...@freelance.de.INVALID
Hello Mikhail, Great, thanks for the very fast response! The link that you provided is very useful and informative. However, there is something I don't understand. After searching for a search term, I always get TopDocs that represent the matching documents. As I understand it, there is no relation

Re: retrieving search matches with their frequency and positions

2023-07-09 Thread Mikhail Khludnev
> Good Morning everyone! > > I'm new to Lucene and I currently use version 8.11.2. > I'm doing a simple boolean query. After I've executed the search() method > and got results, I'd like to get information about how often a term from > the query has been matched. In

retrieving search matches with their frequency and positions

2023-07-09 Thread nedyalko.zhe...@freelance.de.INVALID
Good Morning everyone! I'm new to Lucene and I currently use version 8.11.2. I'm doing a simple boolean query. After I've executed the search() method and got results, I'd like to get information about how often a term from the query has been matched. In other words, I'

[REMINDER] CFP Open for Search Track at Community Over Code (Formerly ApacheCon)

2023-06-12 Thread Anshum Gupta
Hello everyone, The CFP for *Community Over Code *(Formerly ApacheCon) is open until Thu, *13 July 2023 *23:59:59 GMT and wanted to remind you that we have a *Search track.* Please submit your talks and share your stories here: https://communityovercode.org/call-for-presentations/ A bit about

Re: Question about index segment search order

2023-05-13 Thread Uwe Schindler
single query is not multithreaded. Solr works on shards and parallelizes them, but it does not parallelize search on a single index. * If you want to control the order of segments when searching, there's an easy way with pure Lucene; Solr would need to be patched: o don't

Re: Question about index segment search order

2023-05-11 Thread Wei
he/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java > > , > > which triggers EarlyTerminatingCollectorException in SolrIndexSearcher > > > https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f

Re: Question about index segment search order

2023-05-11 Thread Michael Sokolov
atingCollector > https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java > , > which triggers EarlyTerminatingCollectorException in SolrIndexSearcher > https://github.com/apache/solr/blob/d9ddba

Re: Question about index segment search order

2023-05-09 Thread Wei
Hi Michael, I am applying early termination with Solr's EarlyTerminatingCollector https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java , which triggers EarlyTerminatingCollectorExcepti

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
ided, > > are > > > > the > > > > > segments traversed in the order of creation time, i.e. the oldest > > > segment > > > > > is always visited first? > > > > > > > > > > Wei > > > > > > > >

Re: Question about index segment search order

2023-05-04 Thread Patrick Zhai
> > > > > Wei > > > > > > > > On Tue, May 2, 2023 at 7:22 PM Patrick Zhai > > wrote: > > > > > > > > > Hi Wei, > > > > > Lucene in general iterate through the index in the order of what is > > > > > recorded in the

Re: Question about index segment search order

2023-05-04 Thread Wei
wrote: > > > > > > > Hi Wei, > > > > Lucene in general iterate through the index in the order of what is > > > > recorded in the SegmentInfos > > > > < > > > > > > > > > > https://github.com/apache/lucene/blob/main/lucene/core/sr

Re: Question about index segment search order

2023-05-04 Thread Michael Sokolov
> Wei > > > > On Tue, May 2, 2023 at 7:22 PM Patrick Zhai wrote: > > > > > Hi Wei, > > > Lucene in general iterate through the index in the order of what is > > > recorded in the SegmentInfos > > > < > > > > > > https://gi

Re: Question about index segment search order

2023-05-03 Thread Patrick Zhai
che/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140 > > > > > And at search time, you can specify the order using LeafSorter > > < > > > https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.

Re: Question about index segment search order

2023-05-03 Thread Wei
in the order of what is > recorded in the SegmentInfos > < > https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140 > > > And at search time, you can specify the order using LeafSorter > < > https://github.com/apa

Re: Question about index segment search order

2023-05-02 Thread Patrick Zhai
Hi Wei, Lucene in general iterates through the index in the order recorded in the SegmentInfos <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140>. And at search time, you can specify the order using LeafSorter

Question about index segment search order

2023-05-02 Thread Wei
Hello, We have an index that has multiple segments generated with continuous updates. Does Lucene have a specific order when iterating through the segments (assuming a single query thread)? Can the order be customized so that the most recently generated segments are searched first? Thanks, Wei

Re: Vector Search on Lucene

2023-03-16 Thread Michael McCandless
Note that Lucene's demo package (IndexFiles.java, SearchFiles.java) also shows examples of how to index and search KNN vectors. Mike McCandless http://blog.mikemccandless.com On Thu, Mar 2, 2023 at 4:46 AM Michael Wechner wrote: > Hi Marcos > > The indexing looks kind of

Re: Vector Search on Lucene

2023-03-02 Thread Michael Wechner
b marcos rebelo: Hi all, I'm willing to use Vector Search with Lucene. I have vectors created for queries and documents outside Lucene. I would like to upload the document vectors to a Lucene index, Then use Lucene to filter the documents (like classical search) and rank the remaining produc

Vector Search on Lucene

2023-03-02 Thread marcos rebelo
Hi all, I'd like to use Vector Search with Lucene. I have vectors created for queries and documents outside Lucene. I would like to upload the document vectors to a Lucene index, then use Lucene to filter the documents (like classical search) and rank the remaining products with the Ve

Re: Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-11 Thread Michael Wechner
erankField (sorry for the bad name, maybe FastVectorField would be amusing too), that just stores vectors as docvalues (no HNSW) and has a newRescorer() method that implements org.apache.lucene.search.Rescorer. Then it's easy to do as that document describes: pull top 500 hits with BM25 and rerank th

Re: Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-10 Thread Robert Muir
document describes, pull top 500 hits with BM25 and rerank them with your vectors, very fast, only 500 calculations required, no HNSW or anything needed. Of course you could use a vector search instead of a BM25 search as the initial search to pull the top 500 hits too. So it could meet both use
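
The two-stage idea described above (a cheap first pass to collect a few hundred candidates, then an exact vector comparison over just those) can be sketched without any Lucene API. Everything here is hypothetical illustration: the class name, the use of a dot product as the similarity, and the `vectors[id]` lookup standing in for per-document stored vectors:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical second-stage reranker over a small candidate list.
class RerankSketch {

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Reorders first-pass candidate ids by descending similarity to the query vector.
    static int[] rerank(int[] candidateIds, float[][] vectors, float[] query) {
        Integer[] boxed = Arrays.stream(candidateIds).boxed().toArray(Integer[]::new);
        Arrays.sort(boxed, Comparator.comparingDouble((Integer id) -> -dot(vectors[id], query)));
        return Arrays.stream(boxed).mapToInt(Integer::intValue).toArray();
    }
}
```

With 500 candidates this costs 500 similarity computations per query regardless of index size, which is why no graph structure is needed for the second stage.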

Re-ranking using cross-encoder after vector search (bi-encoder)

2023-02-10 Thread Michael Wechner
Hi, I use Lucene's vector search with embeddings that I get from SentenceBERT, for example. According to https://www.sbert.net/examples/applications/retrieve_rerank/README.html, re-ranking with a cross-encoder after the vector search (bi-encoder) can improve the ranking. Would it

Re: Prioritising certain documents in the search results

2023-02-01 Thread Robert Muir
023 at 12:03 PM Trevor Nicholls wrote: > > Hi > > > > I'm currently using Lucene 8.6.3, and indexing a few thousand documents. > Some of these documents need to be prioritised in the search results, but > not by too much; e.g. an exact phrase match in a normal document

Prioritising certain documents in the search results

2023-02-01 Thread Trevor Nicholls
Hi, I'm currently using Lucene 8.6.3, and indexing a few thousand documents. Some of these documents need to be prioritised in the search results, but not by too much; e.g. an exact phrase match in a normal document still needs to top the rankings ahead of a priority document that just ma

Re: Best practice - preparing search term for Lucene

2022-09-24 Thread Hrvoje Lončar
words. Replacing letters or group of letters by another > approaching one. > > In french e é è ê ai ei sound a bit the same, and for someone who write > mistakes having to use the right letters is very frustrating. So I > transformed all of them into e... > > Hope it helps > >

Re: Best practice - preparing search term for Lucene

2022-09-24 Thread Hrvoje Lončar
hen specific product is to be checked and few other things like "Tag_329" which gives me fast search by specific tag through the products. On Fri, 23 Sept 2022, 19:26 Stephane Passignat, wrote: > Hi > > I would don't store the original value. That's "just&q

Re: Best practice - preparing search term for Lucene

2022-09-23 Thread Hrvoje Lončar
Good point! For now I'll leave it normalized. Every search term coming from frontend is stored and also its counter updated which will help me after some time to see trends and to decide to change the logic or not. P.S. Here is the funny part: in Croatian "pišanje" means peeing

Re: Best practice - preparing search term for Lucene

2022-09-23 Thread Stephane Passignat
e Lončar" <horv...@gmail.com> wrote: Hi! I'm using Hibernate Search / Lucene to index my entities in a Spring Boot application. One thing I'm not sure about is how to handle Croatian-specific letters. Croatian language h

Re: Best practice - preparing search term for Lucene

2022-09-23 Thread Michael Sokolov
I think it depends how precise you want to make the search. If you want to enable diacritic-sensitive search in order to avoid confusions when users actually are able to enter the diacritics, you can index both ways (ascii-folded and not folded) and not normalize the query terms. Or you can just
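
The folded variant mentioned above can be produced by stripping diacritics. A minimal sketch using only the JDK follows; the class name is hypothetical, and mapping the letter at U+0111 / U+0110 to a plain d / D is an assumption made here because that letter has no Unicode decomposition. In a Lucene analysis chain, a token filter such as `ASCIIFoldingFilter` would normally do this job:

```java
import java.text.Normalizer;

// Hypothetical diacritic folding using only the JDK.
class FoldSketch {

    static String fold(String s) {
        // Assumption: map d-with-stroke (U+0111, U+0110) to plain d / D,
        // since NFD does not decompose these letters.
        String mapped = s.replace('\u0111', 'd').replace('\u0110', 'D');
        // Decompose accented letters into base letter + combining mark...
        String decomposed = Normalizer.normalize(mapped, Normalizer.Form.NFD);
        // ...then drop the combining marks.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

To support both modes, the folded form and the original form would each go into their own field, with the query applied unfolded against the original field when the user typed diacritics.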

Re: Best practice - preparing search term for Lucene

2022-09-23 Thread Hrvoje Lončar
r Android<https://bluemail.me> > On 22 Sept 2022, at 16:37, "Hrvoje Lončar" <horv...@gmail.com> wrote: > > Hi! > > I'm using Hibernate Search / Lucene to index my entities in Spring Boot > application. > > One thing I'm not sure is how to han

Re: Best practice - preparing search term for Lucene

2022-09-22 Thread Stephane Passignat
right letters is very frustrating. So I transformed all of them into e... Hope it helps On 22 Sept 2022, at 16:37, "Hrvoje Lončar" <horv...@gmail.com> wrote: Hi! I'm using Hibernate Search / Lucene to

Best practice - preparing search term for Lucene

2022-09-22 Thread Hrvoje Lončar
Hi! I'm using Hibernate Search / Lucene to index my entities in a Spring Boot application. One thing I'm not sure about is how to handle Croatian-specific letters. The Croatian language has a few additional letters "*č* *Č* *ć* *Ć* *đ* *Đ* *š* *Š* *ž* *Ž*". Letters "*đ* *Đ*" a

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-23 Thread Michael Wechner
-- thanks for sharing! Julie On Sun, May 22, 2022 at 3:29 PM Matt Davis wrote: Thanks Julie. I was able to implement vector search in Zulia with your pointers. The pull request might be helpful to others: https://github.com/zuliaio/zuliasearch/pull/70 Thanks, Matt On Fri, May 20, 2022 at 9:23

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-23 Thread Julie Tibshirani
was able to implement vector search in Zulia with your > pointers. The pull request might be helpful to others: > https://github.com/zuliaio/zuliasearch/pull/70 > > Thanks, > Matt > > On Fri, May 20, 2022 at 9:23 AM Michael Wechner > > wrote: > > > Hi Julie > &

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-22 Thread Matt Davis
Thanks Julie. I was able to implement vector search in Zulia with your pointers. The pull request might be helpful to others: https://github.com/zuliaio/zuliasearch/pull/70 Thanks, Matt On Fri, May 20, 2022 at 9:23 AM Michael Wechner wrote: > Hi Julie > > I got it running and it

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-20 Thread Michael Wechner
Query preFilterQuery = new TermQuery(new Term(TOPIC_FIELD, "general")); if (filter != null) { log.info("Filter applied before the vector search: " + preFilterQuery); } Query query = new KnnVectorQuery(VECTOR_FIELD, queryVector, k, preFilterQuery); TopDocs topDocs = searcher.sea

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Michael Wechner
would be nice to add! For now maybe looking at the unit tests could give a sense of how to use it. Here's an example: https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/search/TestKnnVectorQuery.java#L115-L127. The idea is that KnnVectorQuery optionally accepts a

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-10 Thread Julie Tibshirani
che/lucene/search/TestKnnVectorQuery.java#L115-L127. The idea is that KnnVectorQuery optionally accepts a Query as a filter, and returns the k nearest vectors that also match the filter. Many people refer to this as "kNN with prefiltering" (as opposed to "postfiltering", where the fil
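
The difference between the two filtering styles named above can be shown with a brute-force sketch. This is not how Lucene's graph search works internally; the class, the dot-product similarity, and the `IntPredicate` standing in for a filter query are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical brute-force comparison of prefiltered and postfiltered kNN.
class KnnFilterSketch {

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns up to k ids from 'ids', most similar to the query first.
    static List<Integer> topK(List<Integer> ids, float[][] vectors, float[] query, int k) {
        List<Integer> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparingDouble((Integer id) -> -dot(vectors[id], query)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // Prefiltering: restrict to matching docs first, then take the k nearest.
    static List<Integer> preFilter(float[][] vectors, float[] query, int k, IntPredicate filter) {
        List<Integer> allowed = new ArrayList<>();
        for (int id = 0; id < vectors.length; id++) {
            if (filter.test(id)) {
                allowed.add(id);
            }
        }
        return topK(allowed, vectors, query, k);
    }

    // Postfiltering: take the k nearest overall, then drop non-matching docs.
    static List<Integer> postFilter(float[][] vectors, float[] query, int k, IntPredicate filter) {
        List<Integer> all = new ArrayList<>();
        for (int id = 0; id < vectors.length; id++) {
            all.add(id);
        }
        List<Integer> result = new ArrayList<>();
        for (int id : topK(all, vectors, query, k)) {
            if (filter.test(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

Postfiltering can return fewer than k hits, or none, when the nearest neighbours happen not to match the filter, whereas prefiltering returns k hits whenever at least k documents match.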

Re: Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
-summary.html which I was not aware of, but I have disabled the tracking now and hope it will be OK. Thanks Michael On 09.05.22 at 15:12, Michael Wechner wrote: Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great :-) I have found http://url7093.wyona.com/ls/click?upn=JOH5Fjdv9AA9sbvUyiP84WWONyl36e4Tdd3VZFG-2B7pcYPJTPhVT3xqtcUDjPgQX5jI0WYWlJZX8h9NDC6okDRg-3D-3DHvvY_UMWFA

Example / Demo re support for filtering in nearest-neighbor vector search (Lucene 9.1.0)

2022-05-09 Thread Michael Wechner
Hi I noticed that Lucene 9.1.0 supports filtering in nearest-neighbor vector search, which is great I have found https://issues.apache.org/jira/browse/SOLR-15947 https://issues.apache.org/jira/browse/LUCENE-10382 and https://lucene.apache.org/core/9_1_0/demo/org/apache/lucene/demo/knn

Using Lucene to search partial source code

2021-12-23 Thread Yuxin Liu
Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes. I have a set of source code snippets and I want to use part of the source code snippet as a query and obtain the document with its source

Re: Question about using Lucene to search source code

2021-12-20 Thread Michael Wechner
Hi Yuxin, can you provide a concrete example of a query and a document/code snippet? Thanks Michael On 20.12.21 at 03:06, Yuxin Liu wrote: Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code in

Question about using Lucene to search source code

2021-12-20 Thread Yuxin Liu
Dear development community of Lucene: Hi from student research assistant Yuxin Liu. I'm using Lucene to build an index search for source code indexes using TF-IDF similarity. I have a set of source code snippets and I want to use part of the source code snippet as a query and obtain the doc

Re: How to change sorting *after* getting search results

2021-11-30 Thread Luís Filipe Nassif
tting the results. We collect all >> of them, this is important for our use case, disabling scoring if the >> result size is too large to make the search faster. Currently we have our >> own multi-thread sorting code using DocValues (one instance per thread) to >> do this aft

Re: How to change sorting *after* getting search results

2021-11-30 Thread Michael Sokolov
ery heavy searches and they are able to change the > sorting criteria multiple times after getting the results. We collect all > of them, this is important for our use case, disabling scoring if the > result size is too large to make the search faster. Currently we have our > own multi-thre

How to change sorting *after* getting search results

2021-11-30 Thread Luís Filipe Nassif
Hi Lucene community, Our users could do very heavy searches and they are able to change the sorting criteria multiple times after getting the results. We collect all of them, this is important for our use case, disabling scoring if the result size is too large to make the search faster. Currently

Re: Search while typing (incremental search)

2021-10-27 Thread Michael Wechner
I have added a QnA https://cwiki.apache.org/confluence/display/LUCENE/LuceneFAQ#LuceneFAQ-DoesLucenesupportauto-suggest/autocomplete? I will also try to provide an example, for example https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36 https

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Wechner
? - "Does Lucene support incremental search?" - "Does Lucene support auto completion suggestions?" Or would other terms or another wording make more sense? Thanks Michael On 07.10.21 at 01:14, Robert Muir wrote: TLDR: use the lucene suggest/ package. Start wit

Re: Search while typing (incremental search)

2021-10-08 Thread Michael Sokolov
r your feedback! > > I will try it :-) > > As I wrote I would like to add a summary to the Lucene FAQ > (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) > > Would the following questions make sense? > > - "Does Lucene support incremental search?"

Re: Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Thanks very much for your feedback! I will try it :-) As I wrote I would like to add a summary to the Lucene FAQ (https://cwiki.apache.org/confluence/display/lucene/lucenefaq) Would the following questions make sense? - "Does Lucene support incremental search?" - "

Re: Search while typing (incremental search)

2021-10-06 Thread Robert Muir
integration :) Run that suggester on the user input, retrieving say, the top 5-10 matches of relevant query suggestions. return those in the UI (typical autosuggest-type field), but also run a search on the first one. The user gets the instant-search experience, but when they type 'tes', you
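
The flow described in this reply (run a suggester on the typed input, show the top few suggestions, and also run a search on the first one) can be sketched as follows. This is a minimal, library-free illustration and not Lucene's suggest/ API: the class `InstantSearchSketch`, its methods, and the integer weights are all hypothetical, and a sorted map stands in for a real suggester.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: a sorted map plays the role of the suggester,
// and a plain token scan plays the role of the search index.
public class InstantSearchSketch {
    // suggestion text -> weight (assumed here to be e.g. a popularity count)
    private final TreeMap<String, Integer> suggestions = new TreeMap<>();
    // document id -> lower-cased text
    private final Map<Integer, String> docs = new LinkedHashMap<>();

    void addSuggestion(String query, int weight) {
        suggestions.put(query.toLowerCase(Locale.ROOT), weight);
    }

    void addDocument(int id, String text) {
        docs.put(id, text.toLowerCase(Locale.ROOT));
    }

    /** Top-N suggestions that start with the typed text, highest weight first. */
    List<String> suggest(String typed, int topN) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        // All keys in [prefix, prefix + '\uffff') share the typed prefix.
        return suggestions.subMap(prefix, prefix + Character.MAX_VALUE)
                .entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(topN)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Whole-token search: ids of documents containing the query as a token. */
    List<Integer> search(String query) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String token : doc.getValue().split("\\s+")) {
                if (token.equals(query)) {
                    hits.add(doc.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    /** Suggest on the typed text, then search with the best suggestion. */
    List<Integer> instantSearch(String typed) {
        List<String> top = suggest(typed, 5);
        return top.isEmpty() ? List.of() : search(top.get(0));
    }

    public static void main(String[] args) {
        InstantSearchSketch s = new InstantSearchSketch();
        s.addSuggestion("test", 10);
        s.addSuggestion("tesla", 3);
        s.addSuggestion("text", 7);
        s.addDocument(1, "unit test notes");
        s.addDocument(2, "tesla charging");
        s.addDocument(3, "plain text");
        System.out.println(s.suggest("tes", 5));
        System.out.println(s.instantSearch("tes"));
    }
}
```

The point of the design is that the search itself stays a normal whole-term query; only the suggestion lookup deals with partial input.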

Search while typing (incremental search)

2021-10-06 Thread Michael Wechner
Hi I am trying to implement a search with Lucene similar to what for example various "Note Apps" (e.g. "Google Keep" or "Samsung Notes") are offering, that with every new letter typed a new search is being executed. For example when I type "tes"

Search Track at ApacheCon 2021, Sep 21-23

2021-09-15 Thread Anshum Gupta
Hi Everyone, ApacheCon 2021 is scheduled to begin next week. Like last year, the event is *100% virtual and free* to register. Also, just like last year, we have a dedicated Search track that has a lot of interesting talks and panel discussion about Apache Lucene and Solr. With 2 full days of

Re: currency based search using query time calculated field match with expression

2021-09-05 Thread Kumaran Ramasubramanian
gt; > > Based on my understanding, we can use the expressions module in lucene to > > reorder search results using custom score calculations based on > expression > > using stored fields. > > > > But i am not sure how to do the same for lucene document hits(doc hits > >

Re: currency based search using query time calculated field match with expression

2021-09-03 Thread Michael Sokolov
> > Hi Michael, Thanks for the response. > > Based on my understanding, we can use the expressions module in lucene to > reorder search results using custom score calculations based on expression > using stored fields. > > But i am not sure how to do the same for lucene docume

Re: Automatic prefix search in query parser

2021-09-03 Thread Gauthier Roebroeck
Thanks a lot Erik, I didn't think about changing the index, only about the query. I will explore that route. On Fri, 3 Sep 2021, 22:53 Erik Hatcher, wrote: > A comparable alternative would be to use the edge ngram filter to index > prefixes instead. > > Erik > > > > On Sep 3, 2021, at 1

Re: Automatic prefix search in query parser

2021-09-03 Thread Erik Hatcher
A comparable alternative would be to use the edge ngram filter to index prefixes instead. Erik > On Sep 3, 2021, at 10:49 AM, Gauthier Roebroeck > wrote: > > Hello, > > I am using Apache Lucene 8.9.0 to parse queries that are entered by humans. > I am using the > `org.apache.lucen
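
To illustrate the idea behind indexing prefixes with edge n-grams: each token is expanded at index time into its leading substrings, so a partially typed term matches with an exact lookup and no wildcard query is needed. The sketch below is library-free and hypothetical (the class `EdgeNGramSketch`, its toy postings map, and the gram sizes 1..10 are assumptions); in Lucene itself this expansion would be done by an edge n-gram token filter inside the index-time analyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of prefix matching via index-time edge n-grams.
public class EdgeNGramSketch {
    /** Prefixes of term with lengths minGram..maxGram, capped at the term length. */
    static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int limit = Math.min(maxGram, term.length());
        for (int len = minGram; len <= limit; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    // Toy inverted index: prefix -> ids of documents containing a token with that prefix.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index time: expand every token into its edge n-grams (sizes are assumed values). */
    void index(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            for (String gram : edgeNGrams(token, 1, 10)) {
                postings.computeIfAbsent(gram, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Query time: the typed text is looked up as-is, with no expansion. */
    Set<Integer> search(String typed) {
        return postings.getOrDefault(typed.toLowerCase(Locale.ROOT), Set.of());
    }

    public static void main(String[] args) {
        EdgeNGramSketch idx = new EdgeNGramSketch();
        idx.index(1, "Testing Lucene");
        idx.index(2, "text search");
        System.out.println(edgeNGrams("test", 1, 3));
        System.out.println(idx.search("tes"));
        System.out.println(idx.search("te"));
    }
}
```

The trade-off is a larger index in exchange for cheap queries, and the query side must not be expanded into n-grams, otherwise every short prefix of the typed text would also match.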
