LowerCaseFilter is part of Lucene, as are any number of other filters. The basic
idea is just that *after* tokenization, there may be further
transformations you want to do on each token, such as lower-casing
it, stemming it, skipping it,
But watch out a bit, there are token Filters and search
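
To make the "tokenize first, then transform each token" idea concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene classes: the class name TokenPipelineSketch, its method names, and the two-word stop list are all made up for illustration, and a real Lucene analyzer would chain Tokenizer and TokenFilter objects instead of lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; these are not Lucene classes.
class TokenPipelineSketch {

    // Step 1: tokenization. Split on runs of whitespace and nothing else.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: a "filter" that lower-cases every token it is given.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Step 3: a "filter" that skips tokens found in the stop set.
    static List<String> skipStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // The "analyzer": one tokenizer followed by a chain of filters.
    static List<String> analyze(String text) {
        // Assumed stop list, chosen only for this example.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a"));
        return skipStopWords(lowerCase(whitespaceTokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick  brown Fox"));
    }
}
```

The order matters here: the stop-word step only matches "The" because the lower-casing step ran before it.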
Thank you Erick.
As of now I'm using WhitespaceAnalyzer with no stemming and no stop word
removal. Now I feel writing a simple analyzer won't be that difficult after
going through your mail. I'll give it a try. I don't have any idea on filters
but I'm pretty sure it must be simple and will definitely go thr
It's fairly easy to construct your own analyzer by stringing together some
filters and tokenizers. LIA (1st ed)
had a SynonymAnalyzer. You probably want something like
(WARNING, example only, I'm not even sure it compiles!! Ripped
off from the WIKI)
public class MyAnalyzer extends Analyzer
{
p
Thank you @Muir.
I was earlier using SimpleAnalyzer for all purposes, but as you recommended
the whitespace one, I tried that analyzer, and the good thing is that
I'm able to index/search non-English text as well as support hit
highlighting for these non-English texts. Thank you very much.
B
As mentioned previously, I don't think your text is being analyzed the way
you want.
SimpleAnalyzer will break your word \u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE
(பரிணாம) into 3 tokens:
\u0BAA\u0BB0
\u0BA3
\u0BAE
Not only does it incorrectly split your word into three words, but it
completely drops t
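
The split described above can be reproduced without Lucene. The sketch below assumes that SimpleAnalyzer's tokenizer keeps a character only when Character.isLetter returns true for it, which is my understanding of how the letter-based tokenizer behaves; the class LetterSplitDemo and its method are hypothetical names used only for this check. The Tamil vowel signs \u0BBF and \u0BBE are combining marks rather than letters, so they act as token boundaries and are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a letter-based tokenizer; not a Lucene class.
class LetterSplitDemo {

    // Collects maximal runs of characters for which Character.isLetter is true.
    // Every other character ends the current token and is discarded.
    static List<String> splitOnNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "\u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE";
        for (String token : splitOnNonLetters(word)) {
            // Print each token as escaped code points so the output is terminal-safe.
            StringBuilder escaped = new StringBuilder();
            for (char c : token.toCharArray()) {
                escaped.append(String.format("\\u%04X", (int) c));
            }
            System.out.println(escaped);
        }
    }
}
```

A whitespace-only tokenizer does not look at character categories at all, which is why the same word survives intact with WhitespaceAnalyzer.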
Could you boil down this example to a smaller test case that fails?
E.g., make a RAMDir, index one document (that should show highlighting),
search it, run highlight and show that it's not working?
Mike
On Mon, May 25, 2009 at 10:02 AM, KK wrote:
> Hi,
> I'm trying to index some non-english texts. I
Hi,
I'm trying to index some non-English texts. Indexing and searching are
working fine. From the command line I'm able to provide the UTF-8 encoded text
as input like this,
\u0BAA\u0BB0\u0BBF\u0BA3\u0BBE\u0BAE
and able to get the search results.
Then I tried to add hit highlighting for the same. So I