Lucene Analyzer that can handle C++ vs C#

2009-12-11 Thread maxSchlein
Can someone please point me in the right direction? We are creating an application that needs to be able to search on C++ and get back docs that have C++ in them. The StandardAnalyzer does not seem to index the "+", so a search for "C++" will bring back docs that contain C++, C, C#, etc. The
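
A minimal sketch of the tokenization behaviour being asked for, in plain Java rather than against the Lucene API: split on whitespace and strip surrounding punctuation, but keep trailing "+" and "#" so that "C++", "C#" and "C" remain distinct terms. The class name LanguageTokenizer and its tokenize method are hypothetical, not part of Lucene; inside Lucene the same rule would presumably live in a custom tokenizer or filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Lucene class: illustrates keeping '+' and '#'.
public class LanguageTokenizer {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw
                    // drop leading characters that are not letters or digits
                    .replaceAll("^[^\\p{L}\\p{N}]+", "")
                    // drop trailing punctuation, but keep '+' and '#'
                    .replaceAll("[^\\p{L}\\p{N}+#]+$", "")
                    .toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("We use C++, C# and C."));
    }
}
```

The same tokenization has to be applied to the query text as well, otherwise a query for "C++" would still be reduced to "c" at search time.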

Re: Lucene Analyzer that can handle C++ vs C#

2009-12-24 Thread maxSchlein
(nextToken != null) { nextToken.setTermText(nextToken.termText().replaceAll(":|,|\\(|\\)|“|~|;|&|\\.","")); } return nextToken; } } maxSchlein wrote: > > Can someone please point me in the right direction. > > We are creating

help customfilter with incrementToken() and AttributeSource APIs

2009-12-24 Thread maxSchlein
In the current version of Lucene, 3.0, the following methods are no longer available: TokenStream.next(), TokenStream.next(Token), Token.setTermText(), Token.termText(). The newer version says to use incrementToken() and the AttributeSource APIs. But I cannot find much hel

Re: Lucene Analyzer that can handle C++ vs C#

2009-12-24 Thread maxSchlein
That is awesome, just one thing, and forgive me if I sound ignorant. What is "FastZemberek zemberek"? Ahmet Arslan wrote: > > >> public class CustomFilter extends TokenFilter >> { >>     protected CustomFilter(TokenStream >> tokenStream) >>     { >>         super(tokenStream); >>     } >>    

Text extraction from ms word doc

2010-01-11 Thread maxSchlein
I was looking for an option for Text extraction from a word doc. Currently I am using POI; however, when there is a table in the doc, for each column POI brings back a . The whitespace analyzer is not filtering out this character. So whatever word or phrase that is the last word or phrase wi

Controlling what is indexed / normalizing our index

2010-02-15 Thread maxSchlein
We have a list of keywords with aliases (example: keyword = "ms access"; aliases = "microsoft access", "msaccess", "m.s. access"). We would like to intercept the aliases prior to them being indexed, and have the keyword indexed instead. We can do this with a CustomFilter for single-word aliases
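
A minimal sketch of the alias-to-keyword normalization described above, in plain Java rather than as a Lucene TokenFilter. It rewrites the raw text before it reaches the analyzer, which is one way to cover aliases that span several words. The class name AliasNormalizer and its methods are hypothetical, and lowercasing the whole text is an assumption made here for simplicity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-indexing step, not part of Lucene.
public class AliasNormalizer {

    private final Map<String, String> aliasToKeyword = new LinkedHashMap<>();

    public void addKeyword(String keyword, String... aliases) {
        for (String alias : aliases) {
            aliasToKeyword.put(alias.toLowerCase(Locale.ROOT), keyword);
        }
    }

    public String normalize(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (aliasToKeyword.isEmpty()) {
            return lower;
        }
        // Longest aliases first, so a long alias wins over a shorter one
        // that starts at the same position.
        List<String> aliases = new ArrayList<>(aliasToKeyword.keySet());
        aliases.sort(Comparator.comparingInt(String::length).reversed());
        StringBuilder alternation = new StringBuilder();
        for (String alias : aliases) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(alias));
        }
        // Match an alias only when it is not glued to a letter or digit.
        Pattern pattern = Pattern.compile(
                "(?<![\\p{L}\\p{N}])(?:" + alternation + ")(?![\\p{L}\\p{N}])");
        Matcher matcher = pattern.matcher(lower);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String keyword = aliasToKeyword.get(matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(keyword));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        AliasNormalizer normalizer = new AliasNormalizer();
        normalizer.addKeyword("ms access", "microsoft access", "msaccess", "m.s. access");
        System.out.println(normalizer.normalize("Experience with Microsoft Access and M.S. Access"));
    }
}
```

Doing the replacement in a single pass means text that has already been rewritten is never matched again, so one keyword cannot be turned into another by accident.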