Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Amitesh Kumar
Thanks Mikhail! I have tried all the other tokenizers from Lucene 4.4. In the case of WhitespaceTokenizer, it loses romanizing of special chars like - etc.
On Wed, Sep 20, 2023 at 16:39 Mikhail Khludnev wrote:
> Hello,
> Check the whitespace tokenizer.
>
> On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar

Re: How to retain % sign next to number during tokenization

2023-09-20 Thread Mikhail Khludnev
Hello,
Check the whitespace tokenizer.
On Wed, Sep 20, 2023 at 7:46 PM Amitesh Kumar wrote:
> Hi,
>
> I am facing a requirement change to get % sign retained in searches. e.g.
>
> Sample search docs:
> 1. Number of boys 50
> 2. My score was 50%
> 3. 40-50% for pass score
>
> Search query: 50%
>

How to retain % sign next to number during tokenization

2023-09-20 Thread Amitesh Kumar
Hi,
I am facing a requirement change to get the % sign retained in searches, e.g.
Sample search docs:
1. Number of boys 50
2. My score was 50%
3. 40-50% for pass score
Search query: 50%
Expected results: Doc-2, Doc-3, i.e.
1. My score was 50%
2. 40-50% for pass score
Actual result: All 3 documents (
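To make the requirement concrete, here is a minimal sketch of the token rule the question seems to ask for: a number keeps a directly attached "%", while a hyphen still splits tokens. It is plain Python, not Lucene code. The names TOKEN_RE, tokenize and search are hypothetical and exist only for this illustration. In Lucene the same rule would have to live in the analyzer used at both index and query time.

```python
import re

# Hypothetical token rule (not a Lucene API): a number keeps a trailing "%",
# while every other run of letters or digits becomes its own token.
TOKEN_RE = re.compile(r"\d+%|[A-Za-z]+|\d+")


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping '50%' whole."""
    return TOKEN_RE.findall(text.lower())


def search(docs, query):
    """Return the 1-based ids of the docs that contain every query token."""
    query_tokens = tokenize(query)
    return [
        doc_id
        for doc_id, text in enumerate(docs, start=1)
        if all(tok in tokenize(text) for tok in query_tokens)
    ]


# The three sample documents from the question.
DOCS = [
    "Number of boys 50",
    "My score was 50%",
    "40-50% for pass score",
]
```

With this rule, search(DOCS, "50%") matches only the second and third documents, because "50" and "50%" are distinct tokens. The same distinction means a query for plain "50" no longer matches the documents that contain "50%", which may or may not be wanted.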

best practices for generating queries from users questions?

2023-09-20 Thread qrdl kaggle
Given a knowledge base indexed by Lucene, users often pose searches as questions. Is there a good reference (code, paper, or doc) on how to translate those natural language questions into an effective and accurate Lucene query?