Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 7:25 PM, Chris Hostetter wrote:
> Hmmm... I'm not sure I understand how any declared CharFilter/Tokenizer
> combo will be able to deal with this any better, but I'll take your word
> for it.
You can see this behavior in SolrAnalyzer's reusableTokenStream method, it re-use
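A minimal, self-contained Java sketch of why a declared CharFilter/Tokenizer combination survives reuse: the analyzer applies the char-filter chain to every incoming Reader, both when it first creates the tokenizer and when it resets a cached one. This only mirrors the general idea of an analyzer-level reuse path; ReusingAnalyzer, SimpleTokenizer, stripTags and slurp are hypothetical names, not Lucene or Solr API, and the regex in stripTags is a naive stand-in for HtmlStripCharFilter (it also reads the whole input eagerly, which a real char filter does not).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

// Hypothetical tokenizer: holds a Reader and exposes its text.
class SimpleTokenizer {
    private Reader input;

    SimpleTokenizer(Reader input) { this.input = input; }

    // Reuse path: accepts whatever Reader the analyzer hands it.
    void reset(Reader input) { this.input = input; }

    String readAll() { return ReusingAnalyzer.slurp(input); }
}

// Hypothetical analyzer that caches one tokenizer and re-applies the
// char-filter chain to every incoming Reader, on creation and on reuse.
class ReusingAnalyzer {
    private final UnaryOperator<Reader> charFilters;
    private SimpleTokenizer cached;

    ReusingAnalyzer(UnaryOperator<Reader> charFilters) { this.charFilters = charFilters; }

    SimpleTokenizer reusableTokenStream(Reader raw) {
        Reader filtered = charFilters.apply(raw);     // wrap on every call
        if (cached == null) {
            cached = new SimpleTokenizer(filtered);
        } else {
            cached.reset(filtered);                   // reuse still gets the wrapped Reader
        }
        return cached;
    }

    // Naive stand-in for an HTML-stripping char filter (regex, not a real parser).
    static Reader stripTags(Reader raw) {
        return new StringReader(slurp(raw).replaceAll("<[^>]*>", ""));
    }

    // Reads a Reader to the end and returns its contents.
    static String slurp(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ReusingAnalyzer a = new ReusingAnalyzer(ReusingAnalyzer::stripTags);
        SimpleTokenizer t1 = a.reusableTokenStream(new StringReader("<b>first</b>"));
        System.out.println(t1.readAll());             // first
        SimpleTokenizer t2 = a.reusableTokenStream(new StringReader("<i>second</i>"));
        System.out.println(t2.readAll());             // second
        System.out.println(t1 == t2);                 // true: the same tokenizer was reused
    }
}
```

Because the wrapping lives in the analyzer rather than in the tokenizer's constructor, the tokenizer never needs to know which char filters apply to it.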

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Chris Hostetter
: They would not be able to re-use if you did this, because when you
: call reset(Reader) on them, the Reader would not be wrapped.
Hmmm... I'm not sure I understand how any declared CharFilter/Tokenizer combo will be able to deal with this any better, but I'll take your word for it. Kill it t

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 7:18 PM, Chris Hostetter wrote:
>
> In the case of these factories: can't we eliminate the Html*Tokenizers
> themselves, but make the *factories* return the necessary *Tokenizer
> wrapped in an HtmlStripCharFilter?
They would not be able to re-use if you did this, becaus
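A minimal, self-contained Java sketch of the reuse problem with "factory returns a wrapped tokenizer": the Reader is wrapped once, at construction, so when the tokenizer is later reset with a new Reader for the next document, that Reader arrives unwrapped and the markup leaks through. ToyTokenizer, WrapOnConstructTokenizer, TagDroppingReader and ReusePitfallDemo are hypothetical names, not Lucene or Solr API, and TagDroppingReader is a naive tag remover, not HtmlStripCharFilter (no entities, comments, scripts or offset correction).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a char filter: naively drops everything between '<' and '>'.
class TagDroppingReader extends Reader {
    private final Reader in;
    private boolean inTag = false;

    TagDroppingReader(Reader in) { this.in = in; }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = in.read();
            if (c == -1) break;                       // end of underlying input
            if (c == '<') { inTag = true; continue; }
            if (c == '>' && inTag) { inTag = false; continue; }
            if (!inTag) buf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;          // -1 only once input is exhausted
    }

    @Override
    public void close() throws IOException { in.close(); }
}

// Hypothetical tokenizer that just exposes its current input as a string.
class ToyTokenizer {
    protected Reader input;

    ToyTokenizer(Reader input) { this.input = input; }

    // Reuse path: the new Reader is stored as given; nothing re-wraps it.
    void reset(Reader newInput) { this.input = newInput; }

    String readAll() {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}

// The "factory wraps the tokenizer" idea: wrapping happens only in the constructor.
class WrapOnConstructTokenizer extends ToyTokenizer {
    WrapOnConstructTokenizer(Reader raw) { super(new TagDroppingReader(raw)); }
}

class ReusePitfallDemo {
    // Returns the text seen for the first document and for the reused second one.
    static String[] run() {
        ToyTokenizer t = new WrapOnConstructTokenizer(new StringReader("<b>first</b>"));
        String first = t.readAll();
        t.reset(new StringReader("<i>second</i>"));   // simulate analyzer reuse
        String second = t.readAll();
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        String[] out = run();
        System.out.println(out[0]);                   // first
        System.out.println(out[1]);                   // <i>second</i>
    }
}
```

The first document is stripped correctly; the second, fed in through reset(Reader), is not, which is the failure mode described in the reply above.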

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Chris Hostetter
: Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories?
I'm not averse to gutting *internal* deprecated classes on just about any release (requiring plugin writers to deal with the deprecation) but if it's possible to keep things working for users with no Java knowl

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
On Mon, Mar 15, 2010 at 5:30 PM, Shalin Shekhar Mangar wrote:
> Is there a way we can fix LUCENE-2098 too?
I think this is good to fix, yet removing the deprecations is unrelated to this slowdown. The deprecated functionality (HtmlStrip*Tokenizer) is implemented in terms of the slower CharFil

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Shalin Shekhar Mangar
On Tue, Mar 16, 2010 at 2:09 AM, Robert Muir wrote:
> Hello,
>
> Is there any concern with removing the deprecated HtmlStrip*Tokenizer
> factories?
>
> These can be done with CharFilter instead and they have some problems
> with Lucene's trunk.
>
> If no one objects, I'd like to remove these in t

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Mark Miller
On 03/15/2010 05:24 PM, Paul Borgermans wrote:
> On Mon, Mar 15, 2010 at 9:39 PM, Robert Muir wrote:
>> Hello,
>>
>> Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories?
>
> Maybe a communication issue, you need to read the source code or javadocs to know it is depreca

Re: removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Paul Borgermans
On Mon, Mar 15, 2010 at 9:39 PM, Robert Muir wrote:
> Hello,
>
> Is there any concern with removing the deprecated HtmlStrip*Tokenizer
> factories?
Maybe a communication issue, you need to read the source code or javadocs to know it is deprecated.
> These can be done with CharFilter instead an

removal of deprecated HtmlStrip*Tokenizer factories

2010-03-15 Thread Robert Muir
Hello,
Is there any concern with removing the deprecated HtmlStrip*Tokenizer factories?
These can be done with CharFilter instead and they have some problems with Lucene's trunk.
If no one objects, I'd like to remove these in the branch. Otherwise, Uwe tells me there is some way to make them wor