Re: Synonyms and searching

2025-04-21 Thread Anh Dũng Bùi
[act as web server] generates: [work] [act] [like] [as] [internet] [web] [host] [server] and the input: [act_as_web_server] generates: [work] [act] [act_] [like] [as] [as_] [internet] [web] [web_

RE: Synonyms and searching

2025-03-10 Thread Trevor Nicholls
phrases. But maybe my expectations are too high, or maybe I'm just doing it wrong. Cheers T -----Original Message----- From: Uwe Schindler Sent: Monday, 10 March 2025 23:38 To: java-user@lucene.apache.org Subject: Re: Synonyms and searching Hi, Another way to do this is using Word Delimite

Re: Synonyms and searching

2025-03-10 Thread Uwe Schindler
Hi, another way to do this is to use the Word Delimiter Filter with its "catenate" options. Be aware that you need special text tokenization (do not use the standard tokenizer, but instead WhitespaceTokenizer). This approach is common for product numbers. To not break your "normal" analysis, it is often a
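
A minimal sketch of this suggestion, with the analyzer wiring and flag choices assumed by me rather than taken from the message: WhitespaceTokenizer keeps "http_proxy_server" as one token, and WordDelimiterGraphFilter both splits it into parts and catenates them back together.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.miscellaneous.WordDelimiterGraphFilter;

    public class DelimiterAwareAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            // Keep whole whitespace-separated chunks; let the filter do the splitting.
            Tokenizer source = new WhitespaceTokenizer();
            int flags = WordDelimiterGraphFilter.GENERATE_WORD_PARTS
                      | WordDelimiterGraphFilter.CATENATE_ALL
                      | WordDelimiterGraphFilter.PRESERVE_ORIGINAL;
            TokenStream sink = new WordDelimiterGraphFilter(source, flags, null); // no protected words
            return new TokenStreamComponents(source, sink);
        }
    }

At index time a graph-producing filter like this is usually followed by FlattenGraphFilter; at query time it can be used as-is.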

Re: Synonyms and searching

2025-03-05 Thread Michael Sokolov
One thing to check is whether the synonyms are configured as bidirectional, or which direction they go (e.g. is "a b" being expanded to "ab" but "ab" not being expanded to "a b"?) On Wed, Mar 5, 2025 at 2:20 PM Mikhail Khludnev wrote: Hello Trevor. Maintaining such a synonym map is too
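
To illustrate the directionality point, here is a hedged sketch using SynonymMap.Builder (the entries "a b" and "ab" are just placeholders): each add() call maps one way only, so the reverse mapping has to be added explicitly if both directions are wanted.

    import java.io.IOException;
    import org.apache.lucene.analysis.synonym.SynonymMap;
    import org.apache.lucene.util.CharsRef;
    import org.apache.lucene.util.CharsRefBuilder;

    public class SynonymDirectionSketch {
        static SynonymMap bidirectional() throws IOException {
            SynonymMap.Builder builder = new SynonymMap.Builder(true); // dedup entries
            // Multi-word input "a b" joined with the internal word separator.
            CharsRef ab = SynonymMap.Builder.join(new String[] {"a", "b"}, new CharsRefBuilder());
            builder.add(ab, new CharsRef("ab"), true);  // "a b" -> "ab" only
            builder.add(new CharsRef("ab"), ab, true);  // add the reverse to get both directions
            return builder.build();
        }
    }

The same distinction exists in the file-based synonym format: a comma-separated line such as "a b, ab" is symmetric, while "a b => ab" maps only one way.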

Re: Synonyms and searching

2025-03-05 Thread Mikhail Khludnev
Hello Trevor. Maintaining such a synonym map is too much of a burden. One idea: stick words together with an "" (empty) separator using https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html Another idea, the opposite: break the user's words apart via a dictionary htt
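
A minimal sketch of the first idea, with the analyzer wiring assumed by me rather than taken from the message: ShingleFilter glues neighbouring tokens together with an empty separator, so "http proxy server" also produces terms like "httpproxy" and "proxyserver".

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;

    public class GluedShingleAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new WhitespaceTokenizer();
            ShingleFilter shingles = new ShingleFilter(source, 2, 3); // 2- and 3-token shingles
            shingles.setTokenSeparator("");  // stick the words together with no separator
            return new TokenStreamComponents(source, shingles);
        }
    }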

Synonyms and searching

2025-03-05 Thread Trevor Nicholls
I don't know if I have completely the wrong idea or not; hopefully somebody can point out where I have got this wrong. I am indexing technical documentation; the content contains strings like "http_proxy_server". When building the index my analyzer breaks this into the tokens "http", "proxy" and
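
For reference, a small self-contained snippet (not from the original message; StandardAnalyzer is only a stand-in for whatever analyzer is actually configured) that prints the tokens an analyzer emits for "http_proxy_server", which is a quick way to verify what really lands in the index:

    import java.io.IOException;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class DumpTokens {
        public static void main(String[] args) throws IOException {
            Analyzer analyzer = new StandardAnalyzer(); // substitute the analyzer used at index time
            try (TokenStream ts = analyzer.tokenStream("content", "http_proxy_server")) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    System.out.println(term.toString()); // one line per emitted token
                }
                ts.end();
            }
        }
    }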