Re: Leaving certain tokens intact during indexing and search

Erick Erickson Wed, 30 Nov 2011 06:53:26 -0800

Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work,
since you also need to split on slashes outside the protected pattern,
although it did fit your initial description.


You could consider PatternTokenizerFactory, but take a look at the
link I provided, and follow it to the javadocs to see if there are
better matches.
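
As a rough illustration of the pattern-based idea (a sketch only, not
tested against Solr): rather than describing where to split, a regex can
describe what a token looks like, trying the protected form
[A-Z]+/[0-9]+/[0-9]+ first and falling back to any run of characters
that is neither whitespace nor a slash. The Python below only checks the
regex itself; the names TOKEN_PATTERN and tokenize are made up for this
example. In Solr, the assumption is that a similar pattern would be given
to PatternTokenizerFactory with group="0" so that matches, not the gaps
between them, become the tokens. Verify that against the javadocs.

```python
import re

# Alternation order matters: the protected identifier form is tried
# first, so its slashes stay inside the token. Anything else falls
# through to the second branch, which stops at whitespace and slashes.
TOKEN_PATTERN = re.compile(r"[A-Z]+/[0-9]+/[0-9]+|[^\s/]+")


def tokenize(text):
    """Return every match of TOKEN_PATTERN in text, in order."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("see ABC/12/34 and foo/bar"))
    # prints ['see', 'ABC/12/34', 'and', 'foo', 'bar']
```

Note that an incomplete identifier such as AB/12 does not match the
protected form, so it is split at the slash like any other text.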

Best
Erick

On Wed, Nov 30, 2011 at 9:41 AM, Marian Steinbach
<marian.steinb...@gmail.com> wrote:
> Thanks for the quick response!
>
> Are you saying that I should extend WhitespaceTokenizerFactory to create my
> own? Or should I simply use it?
>
> Because, I guess tokenizing on spaces wouldn't be enough. I would need
> tokenizing on slashes in other positions, just not within strings matching
> ([A-Z]+/[0-9]+/[0-9]+).
>
> Marian
>
>
> 2011/11/30 Erick Erickson <erickerick...@gmail.com>
>
>> There are about a zillion tokenizers; for what you're describing,
>> WhitespaceTokenizerFactory is a good candidate.
>>
>> See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>> for a partial list, and it has links to the authoritative docs.
>>
>> Best
>> Erick
>>
>>
