Hi,

For some crazy reason, some users manage to replace a perfectly normal space with a non-breaking space (U+00A0). URL-encoded as UTF-8, this becomes %C2%A0, and depending on the encoding you view it in, you probably see Â followed by a space.
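To make the byte-level story concrete, here is a minimal sketch using only the JDK. It shows that U+00A0 is encoded in UTF-8 as the two bytes C2 A0 and URL-encodes to %C2%A0. It also shows that a viewer misreading those bytes as ISO-8859-1 displays U+00C2 ("Â") followed by a non-breaking space. The class and method names are hypothetical, chosen for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NbspEncoding {
    static final String NBSP = "\u00A0";

    /** Hex of the UTF-8 bytes of s, space separated (e.g. "C2 A0" for U+00A0). */
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Percent-encodes s as UTF-8 (Java 10+ overload, no checked exception). */
    static String urlEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Decodes the UTF-8 bytes of s as ISO-8859-1, as a misconfigured viewer would. */
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex(NBSP));   // C2 A0
        System.out.println(urlEncode(NBSP)); // %C2%A0
        String garbled = misreadAsLatin1(NBSP);
        // Two chars: U+00C2 (the "Â") followed by U+00A0 again
        System.out.println(garbled.length());               // 2
        System.out.println(garbled.charAt(0) == '\u00C2');  // true
    }
}
```

In other words, %C2%A0 is not a corrupted byte sequence. It is the ordinary UTF-8 form of a non-breaking space, and the "Â" only appears when the bytes are displayed with the wrong charset.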
Because U+00A0 (the character that %C2%A0 encodes) is not considered whitespace by Java's Character.isWhitespace, which deliberately excludes non-breaking spaces, the WhitespaceTokenizer won't split on it. The WordDelimiterFilter still does split on it, which somewhat mitigates the problem. With HTMLStripCharFilter (HTMLSCF), WhitespaceTokenizer (WT) and WordDelimiterFilter (WDF), the analysis output becomes:

HTMLSCF: een abonnement
WT: een abonnement (a single token)
WDF: een, eenabonnement, abonnement

Shouldn't the WhitespaceTokenizer handle this edge case?

Cheers,
Markus
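The tokenizer behaviour described above can be reproduced outside Lucene with a small sketch. It assumes that WhitespaceTokenizer splits exactly where Character.isWhitespace returns true; treat that as an assumption about its internals. The tokenize method and both predicates are hypothetical helpers, not Lucene API. The second predicate additionally breaks on Unicode space separators (Character.isSpaceChar), which covers U+00A0, U+2007 and U+202F.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class NbspTokenize {
    // Assumed WhitespaceTokenizer rule: break only where Character.isWhitespace is true
    static final IntPredicate JAVA_WHITESPACE = Character::isWhitespace;

    // Hypothetical alternative: also break on Unicode space separators (category Zs etc.),
    // which includes the non-breaking spaces that isWhitespace excludes
    static final IntPredicate WHITESPACE_OR_SPACE_CHAR =
            cp -> Character.isWhitespace(cp) || Character.isSpaceChar(cp);

    /** Splits text into tokens at every code point matching isSeparator. */
    static List<String> tokenize(String text, IntPredicate isSeparator) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isSeparator.test(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        });
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        String input = "een\u00A0abonnement";
        System.out.println(Character.isWhitespace('\u00A0'));                   // false
        System.out.println(tokenize(input, JAVA_WHITESPACE).size());           // 1
        System.out.println(tokenize(input, WHITESPACE_OR_SPACE_CHAR));         // [een, abonnement]
    }
}
```

If changing the tokenizer is not an option, one possible workaround is to normalize the input before tokenization, for example with a mapping char filter (such as Lucene's MappingCharFilter) that maps U+00A0 to a regular space.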