Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Amitesh Kumar
Thank you! I will give it a try and share my findings with you all. Regards, Amitesh On Thu, Sep 21, 2023 at 08:18 Uwe Schindler wrote: > The problem with WhitespaceTokenizer is that it splits only on > whitespace. If you have text like "This is, was some test." then you get > tokens like "is," a

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Uwe Schindler
The problem with WhitespaceTokenizer is that it splits only on whitespace. If you have text like "This is, was some test." then you get tokens like "is," and "test." including the punctuation. This is the reason why StandardTokenizer is normally used for human-readable text. WhitespaceTokeniz
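
To make the point above concrete, here is a minimal sketch in plain Python. It does not call Lucene. `text.split()` only approximates what a whitespace-only tokenizer does, and the `\w+` regex is a rough stand-in for a word-oriented tokenizer (Lucene's StandardTokenizer follows Unicode text segmentation rules, which are more involved). The variable names are illustrative only.

```python
import re

text = "This is, was some test."

# Whitespace-only splitting: punctuation stays glued to the neighbouring word.
whitespace_tokens = text.split()

# Word-oriented splitting (an approximation, not Lucene's actual rules):
# keep runs of word characters and drop the punctuation.
word_tokens = re.findall(r"\w+", text)

print(whitespace_tokens)
print(word_tokens)
```

With whitespace-only splitting, a query for "is" would not match the token "is," unless a later filter strips the punctuation.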

Re: How to retain % sign next to number during tokenization

2023-09-21 Thread Mikhail Khludnev
Hello, I'm surprised and doubt that this can happen. Would you mind uploading a short test reproducing it? On Wed, Sep 20, 2023 at 11:44 PM Amitesh Kumar wrote: > Thanks Mikhail! > > I have tried all other tokenizers from Lucene 4.4. In case of > WhitespaceTokenizer, it loses romanizing of special ch

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Amitesh Kumar
Thanks Mikhail! I have tried all other tokenizers from Lucene 4.4. In the case of WhitespaceTokenizer, it loses romanizing of special chars like - etc. On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev wrote: > Hello, > Check the whitespace tokenizer. > > On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Mikhail Khludnev
Hello, Check the whitespace tokenizer. On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar wrote: > Hi, > > I am facing a requirement change to get % sign retained in searches. e.g. > > Sample search docs: > 1. Number of boys 50 > 2. My score was 50% > 3. 40-50% for pass score > > Search query: 50% >

How to retain % sign next to number during tokenization

2023-09-20 Thread Amitesh Kumar
Hi, I am facing a requirement change to get the % sign retained in searches, e.g. Sample search docs: 1. Number of boys 50 2. My score was 50% 3. 40-50% for pass score Search query: 50% Expected results: Doc-2, Doc-3, i.e. 1. My score was 50% 2. 40-50% for pass score Actual result: All 3 documents (
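
The expected behaviour in the question can be sketched in plain Python, as an illustration of the matching logic only and not of any Lucene API. The assumption here is a hypothetical token pattern that keeps a trailing % attached to a number and otherwise splits on non-word characters, so that "40-50%" yields the tokens "40" and "50%". The names `DOCS`, `TOKEN`, `tokenize`, and `search` are invented for this sketch.

```python
import re

# The three sample documents from the question, keyed by document number.
DOCS = {
    1: "Number of boys 50",
    2: "My score was 50%",
    3: "40-50% for pass score",
}

# Hypothetical token pattern: a number followed by %, or a plain run of
# word characters. The first alternative is tried first, so "50%" stays whole.
TOKEN = re.compile(r"\d+%|\w+")

def tokenize(text):
    """Return lowercased tokens, keeping '%' attached to a preceding number."""
    return [t.lower() for t in TOKEN.findall(text)]

def search(query, docs):
    """Return ids of documents that contain every query token."""
    query_tokens = tokenize(query)
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(t in tokenize(text) for t in query_tokens)
    ]

matches = search("50%", DOCS)
print(matches)
```

Because "50" and "50%" are distinct tokens under this pattern, the query 50% no longer matches document 1. In Lucene the same idea would need a tokenizer or filter chain that treats % as part of the token at both index and query time.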

Re: How to retain % sign next to number during tokenization

2023-07-18 Thread Amitesh Kumar
Sorry for duplicating the question. On Tue, Jul 18, 2023 at 19:09 Amitesh Kumar wrote: > I am facing a requirement change to get % sign retained in searches. e.g. > > Sample search docs: > 1. Number of boys 50 > 2. My score was 50% > 3. 40-50% for pass score > > Search query: 50% > Expected resu

How to retain % sign next to number during tokenization

2023-07-18 Thread Amitesh Kumar
I am facing a requirement change to get the % sign retained in searches, e.g. Sample search docs: 1. Number of boys 50 2. My score was 50% 3. 40-50% for pass score Search query: 50% Expected results: Doc-2, Doc-3, i.e. 1. My score was 50% 2. 40-50% for pass score Actual result: All 3 documents (becau