Hi Wouter,

See point 8 in the Backwards Compatibility section of CHANGES.txt. The reasons are explained in several issues (not all of them are listed there); one example is the Unicode 4 changes, where a non-final WhitespaceTokenizer would have needed reflection-based backwards-compatibility hacks (like in 2.9, when we changed to incrementToken).

The tokenization API in Lucene is decorator-based, so to extend or modify it you can:
a) extend one of the abstract base classes, or
b) simply wrap a TokenFilter on top that does the modification you intend.

In your case the simplest thing is not to extend WhitespaceTokenizer but its abstract base class CharTokenizer, and override isTokenChar(int) and/or normalize(int) there. These methods are very simple in WhitespaceTokenizer, so just add your own rules to them.
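For example, something like this should do it (untested sketch typed into the mail; the extra comma rule and the lowercasing in normalize() are only placeholders for whatever rules your subclass was adding):

import java.io.Reader;

import org.apache.lucene.analysis.CharTokenizer;
import org.apache.lucene.util.Version;

/** Behaves like WhitespaceTokenizer, with room for your own rules. */
public final class MyWhitespaceTokenizer extends CharTokenizer {

  public MyWhitespaceTokenizer(Version matchVersion, Reader in) {
    super(matchVersion, in);
  }

  @Override
  protected boolean isTokenChar(int c) {
    // WhitespaceTokenizer simply returns !Character.isWhitespace(c);
    // here we additionally break tokens on ',' as an example
    return !Character.isWhitespace(c) && c != ',';
  }

  @Override
  protected int normalize(int c) {
    // example normalization: lowercase each code point;
    // return c unchanged if you don't need this
    return Character.toLowerCase(c);
  }
}

You can then plug this class into your Analyzer exactly where your WhitespaceTokenizer subclass was used before.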
Extending WhitespaceTokenizer itself is not needed and leads to the problems mentioned above. Lucene 3.1's test suite now checks that *every* TokenStream component is final (when assertions are enabled in your code).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Wouter Heijke [mailto:whei...@xs4all.nl]
> Sent: Friday, April 01, 2011 9:40 AM
> To: java-user@lucene.apache.org
> Subject: 3.1 upgrade problem
>
>
> I'm doing the upgrade to Lucene 3.1.0.
> The upgrade failed on WhitespaceTokenizer being final in this version.
> I don't understand why anyone would make this tokenizer final; I was
> happily extending it for many Lucene versions!
>
> Wouter