Got me right when Solr reported the error on restart :) Thanks!
2011/11/30 Steven A Rowe
> Note that my example does not actually use PatternReplaceCharFilterFactory
> twice - the second one is actually a PatternReplaceFilterFactory - note
> that "Char" isn't present in the second one.
>
> CharF
enizers.
Steve
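
To make the charFilter-versus-filter distinction concrete, here is a minimal Python sketch of the two stages. It is not Solr configuration and not Steve's actual example (which is not shown in this part of the thread). It only illustrates that a char filter rewrites the raw text before tokenizing, while a token filter rewrites each token afterwards. The pattern CODE, the PLACEHOLDER marker, and all function names are assumptions made up for illustration; the pattern is modeled on the sample token "AB/1234/5678" from the original question.

```python
import re

# Hypothetical pattern, modeled on the sample token "AB/1234/5678" in this thread.
CODE = re.compile(r"\b[A-Z]+/[0-9]+/[0-9]+\b")
# Arbitrary marker (an assumption): it must not occur in real text.
PLACEHOLDER = "_SLASH_"


def char_filter(text):
    """Runs on the raw text BEFORE tokenizing: hide slashes inside codes."""
    return CODE.sub(lambda m: m.group(0).replace("/", PLACEHOLDER), text)


def tokenize(text):
    """Split on whitespace and on any slashes that are still visible."""
    return [t for t in re.split(r"[\s/]+", text) if t]


def token_filter(tokens):
    """Runs on each token AFTER tokenizing: put the hidden slashes back."""
    return [t.replace(PLACEHOLDER, "/") for t in tokens]


def analyze(text):
    """Chain the three stages in analysis order."""
    return token_filter(tokenize(char_filter(text)))


print(analyze("blah AB/1234/5678 and/or blah"))
# ['blah', 'AB/1234/5678', 'and', 'or', 'blah']
```

The ordering is the point: the first replacement has to happen before the tokenizer sees the slashes, and the second can only happen once individual tokens exist.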
> -Original Message-
> From: Marian Steinbach [mailto:marian.steinb...@gmail.com]
> Sent: Wednesday, November 30, 2011 10:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Leaving certain tokens intact during indexing and search
>
> That's pretty h
That's pretty helpful, thanks! Especially since I hadn't understood until now
that I could use a filter like PatternReplaceCharFilterFactory both as a
charFilter and as a filter.
In the meantime I had figured out an alternative involving
WordDelimiterFilterFactory. But I had to
use WhitespaceTo
arian.steinb...@gmail.com]
> Sent: Wednesday, November 30, 2011 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Leaving certain tokens intact during indexing and search
>
> Thanks for the quick response!
>
> Are you saying that I should extend WhitespaceTokenizerFacto
Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work,
although it did satisfy your initial statement.
You could consider PatternTokenizerFactory, but take a look at the
link I provided, and follow it to the javadocs to see if there are
better matches.
Best
Erick
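
As a rough illustration of the pattern-based tokenizing idea, the Python sketch below uses a single regular expression that matches whole tokens: it tries the special code shape first and falls back to ordinary word characters otherwise. This is only a sketch of the concept, not Solr configuration. The pattern is a hypothetical one modeled on the sample token "AB/1234/5678" from the original question, and the names TOKEN and pattern_tokenize are made up for this example.

```python
import re

# Hypothetical token pattern: a code shaped like "AB/1234/5678" is tried first;
# anything else is tokenized as a plain run of word characters, so slashes
# outside such codes act as separators.
TOKEN = re.compile(r"[A-Z]+/[0-9]+/[0-9]+|\w+")


def pattern_tokenize(text):
    """Return every match of TOKEN, in order, as the token list."""
    return TOKEN.findall(text)


print(pattern_tokenize("blah AB/1234/5678 and/or blah"))
# ['blah', 'AB/1234/5678', 'and', 'or', 'blah']
```

Because the alternatives are tried left to right, the code shape wins wherever it applies, and a slash anywhere else is simply skipped between matches.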
On Wed, Nov
Thanks for the quick response!
Are you saying that I should extend WhitespaceTokenizerFactory to create my
own? Or should I simply use it?
Because, I guess tokenizing on spaces wouldn't be enough. I would need
tokenizing on slashes in other positions, just not within strings matching
([A-Z]+/[0-9
There are about a zillion tokenizers. For what you're describing,
WhitespaceTokenizerFactory is a good candidate.
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for a partial list, and it has links to the authoritative docs.
Best
Erick
On Wed, Nov 30, 2011 at 9:23 AM, Marian Stein
I have documents containing tokens of a certain format in arbitrary
positions, like this:
... blah blahblah AB/1234/5678 blah blah blahblah ...
I would like to enable "usual" keyword searching within these documents. In
addition, I'd also like to enable users to find "AB/1234/5678", ideally
w