Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Got me right when Solr reported the error on restart :) Thanks!

2011/11/30 Steven A Rowe
> Note that my example does not actually use PatternReplaceCharFilterFactory
> twice - the second one is actually a PatternReplaceFilterFactory - note
> that "Char" isn't present in the second one.
>
> CharF
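The distinction Steve draws (a char filter rewrites the raw text before tokenization, while a token filter rewrites each token afterwards) can be illustrated outside Solr. The following is only a minimal Python sketch of that ordering, not the schema configuration from the thread, which is not shown in this excerpt. The token format (modeled on "AB/1234/5678"), the placeholder string, and all function names are assumptions made for illustration.

```python
import re

# Assumed token format, modeled on the example "AB/1234/5678"; the real format may differ.
TOKEN = re.compile(r"([A-Z]+)/([0-9]+)/([0-9]+)")
# Hypothetical placeholder made only of word characters, so the tokenizer keeps it inside one token.
PLACEHOLDER = "_SLASH_"

def char_filter(text):
    """Runs on the raw text BEFORE tokenization: hide slashes inside matching tokens."""
    return TOKEN.sub(lambda m: PLACEHOLDER.join(m.groups()), text)

def tokenizer(text):
    """Splits on anything that is not a word character, including '/'."""
    return re.findall(r"\w+", text)

def token_filter(token):
    """Runs on each token AFTER tokenization: restore the hidden slashes."""
    return token.replace(PLACEHOLDER, "/")

def analyze(text):
    return [token_filter(t) for t in tokenizer(char_filter(text))]

print(analyze("see AB/1234/5678 and/or more"))
```

Slashes elsewhere in the text (as in "and/or") still split tokens, while the protected token survives as a single unit. Input that already contains the placeholder string would be altered, so a real setup would need a placeholder that cannot occur in the data.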

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
enizers.

Steve

> -----Original Message-----
> From: Marian Steinbach [mailto:marian.steinb...@gmail.com]
> Sent: Wednesday, November 30, 2011 10:44 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Leaving certain tokens intact during indexing and search
>
> That's pretty h

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
That's pretty helpful, thanks! Especially since I hadn't understood until now that I could use a filter like PatternReplaceCharFilterFactory both as a charFilter and as a filter.

In the meantime I had figured out another alternative, involving WordDelimiterFilterFactory. But I had to use WhitespaceTo

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
arian.steinb...@gmail.com]
> Sent: Wednesday, November 30, 2011 9:41 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Leaving certain tokens intact during indexing and search
>
> Thanks for the quick response!
>
> Are you saying that I should extend WhitespaceTokenizerFacto

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work, although it did satisfy your initial statement. You could consider PatternTokenizerFactory, but take a look at the link I provided, and follow it to the javadocs to see if there are better matches.

Best
Erick

On Wed, Nov

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Thanks for the quick response!

Are you saying that I should extend WhitespaceTokenizerFactory to create my own? Or should I simply use it? I ask because I guess tokenizing on spaces wouldn't be enough. I would need tokenizing on slashes in other positions, just not within strings matching ([A-Z]+/[0-9

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
There are about a zillion tokenizers; for what you're describing, WhitespaceTokenizerFactory is a good candidate. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a partial list; it has links to the authoritative docs.

Best
Erick

On Wed, Nov 30, 2011 at 9:23 AM, Marian Stein

Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
I have documents containing tokens of a certain format in arbitrary positions, like this:

... blah blahblah AB/1234/5678 blah blah blahblah ...

I would like to enable "usual" keyword searching within these documents. In addition, I'd also like to enable users to find "AB/1234/5678", ideally w
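To make the requirement concrete, here is a minimal Python sketch of the desired tokenization behaviour: ordinary words are split and lowercased for keyword search, while tokens of the special format are emitted unchanged. This is an illustration of the goal only, not a Solr analyzer. The pattern is an assumption based on the single example "AB/1234/5678", and the names PROTECTED, WORD and tokenize are hypothetical.

```python
import re

# Assumed format, inferred from the example "AB/1234/5678"; adjust to the real token format.
PROTECTED = re.compile(r"[A-Z]+/[0-9]+/[0-9]+")
# Ordinary keyword tokens: runs of letters and digits.
WORD = re.compile(r"[A-Za-z0-9]+")

def tokenize(text):
    """Split text into lowercase keywords, leaving PROTECTED matches intact."""
    tokens = []
    pos = 0
    for m in PROTECTED.finditer(text):
        # Tokenize the ordinary text that precedes the protected token.
        tokens.extend(w.lower() for w in WORD.findall(text[pos:m.start()]))
        # Emit the protected token as-is, slashes included.
        tokens.append(m.group(0))
        pos = m.end()
    # Tokenize whatever follows the last protected token.
    tokens.extend(w.lower() for w in WORD.findall(text[pos:]))
    return tokens

print(tokenize("blah blahblah AB/1234/5678 blah and/or"))
```

Applying the same function to both indexed text and query text would let a query for "AB/1234/5678" match the stored token exactly, while "and/or" is still split into two keywords.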