Hi Erik. Yes, something like what you describe would do the trick. I
did find this:

http://lucene.472066.n3.nabble.com/Concatenate-multiple-tokens-into-one-td1879611.html

I might try the pattern replace filter with the stopwords baked into a
regex, even though that feels kinda clunky.
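
Roughly what I have in mind for the index analyzer (untested; the inlined
stopword list and the filter ordering are just guesses on my part):

  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- strip stopwords out of the single keyword token -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\b(a|an|and|the|of|for)\b" replacement="" replace="all"/>
    <!-- collapse whatever whitespace the removals leave behind -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement=" " replace="all"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
  </analyzer>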

Matt

On Wed, Jun 8, 2011 at 11:04 AM, Erik Hatcher <erik.hatc...@gmail.com> wrote:
> This seems like it deserves some kind of "collecting" TokenFilter(Factory)
> that would slurp up all incoming tokens and glue them back together with a
> space (and allow the separator to be configurable). Hmmm... I'm surprised
> one of those doesn't already exist. With something like that you could have
> a standard tokenization chain and put it all back together at the end.
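>
> Something along these lines, maybe (a completely untested sketch; the
> class name is made up):
>
> import java.io.IOException;
> import org.apache.lucene.analysis.TokenFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
>
> // Hypothetical "collecting" filter: reads every token from the wrapped
> // stream and emits a single token with the pieces joined by a separator.
> public final class ConcatenatingTokenFilter extends TokenFilter {
>   private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
>   private final String separator;
>   private boolean done = false;
>
>   public ConcatenatingTokenFilter(TokenStream input, String separator) {
>     super(input);
>     this.separator = separator;
>   }
>
>   @Override
>   public boolean incrementToken() throws IOException {
>     if (done) return false;
>     done = true;
>     StringBuilder sb = new StringBuilder();
>     while (input.incrementToken()) {
>       if (sb.length() > 0) sb.append(separator);
>       sb.append(termAtt.buffer(), 0, termAtt.length());
>     }
>     if (sb.length() == 0) return false;  // nothing came through
>     clearAttributes();
>     termAtt.setEmpty().append(sb);       // emit the glued-together token
>     return true;
>   }
>
>   @Override
>   public void reset() throws IOException {
>     super.reset();
>     done = false;
>   }
> }
>
> Wrap that in a small TokenFilterFactory that reads a "separator" attribute
> and it could sit at the end of any analyzer chain.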
>
>        Erik
>
> On Jun 8, 2011, at 10:59 , Matt Mitchell wrote:
>
>> Hi,
>>
>> I have an "autocomplete" fieldType that works really well, but because
>> KeywordTokenizerFactory (if I understand correctly) emits the whole field
>> value as a single token, the stopword filter never finds any stopwords to
>> remove. Does anyone know of a way to strip out stopwords when using
>> KeywordTokenizerFactory? I did try the regex replace filter, but I'm not
>> sure I want to maintain a separate regex for every stopword.
>>
>> Thanks,
>> Matt
>>
>> Here's the fieldType definition:
>>
>> <fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.TrimFilterFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>>     <filter class="solr.TrimFilterFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.ASCIIFoldingFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>
>
