Re: How to handle words that stem to stop words

2014-07-10 Thread Arjen van der Meijden

I'm reluctant to apply either solution: emitting both tokens will likely still give the user a very long result list. Even though the results containing 'vans' are likely to be ranked at the top, it's still not very user-friendly because of the overwhelmingly large number of results (nor

Re: How to handle words that stem to stop words

2014-07-10 Thread Sujit Pal

Hi Arjen, This is kind of a spin on your last observation that your list of stop words doesn't change frequently. You could have a custom filter that attempts to stem the incoming token and, only if the stem matches a stopword, sets the keyword attribute on the original token. That way
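A minimal sketch of the stem-and-check idea described above, written in plain Python rather than against the Lucene API. Everything here is an assumption for illustration: `toy_stem` is a hypothetical stand-in for a real stemmer, the stop list is made up (with 'van' assumed to be a stop word, as in the thread's 'vans' example), and `Token.keyword` only mimics the keyword attribute.

```python
from dataclasses import dataclass

# Assumed stop list, for illustration only.
STOP_WORDS = {"van", "de", "het"}


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


@dataclass
class Token:
    text: str
    keyword: bool = False  # mimics the keyword attribute


def mark_stopword_collisions(tokens):
    """Flag a token as keyword only if stemming would turn it into a stop word."""
    for tok in tokens:
        stemmed = toy_stem(tok.text)
        if stemmed != tok.text and stemmed in STOP_WORDS:
            tok.keyword = True
        yield tok


def stem_unless_keyword(tokens):
    """Stem every token except those flagged as keyword."""
    for tok in tokens:
        if not tok.keyword:
            tok.text = toy_stem(tok.text)
        yield tok


def analyze(text: str) -> list[str]:
    """Mark collisions, stem, then drop stop words."""
    tokens = (Token(word) for word in text.lower().split())
    tokens = stem_unless_keyword(mark_stopword_collisions(tokens))
    return [tok.text for tok in tokens if tok.text not in STOP_WORDS]
```

With these assumptions, `analyze("vans van de laptops")` keeps 'vans' unstemmed, drops the real stop words, and still stems 'laptops'. No separate exception list has to be maintained, because the marking is derived from the stop list itself.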

Re: How to handle words that stem to stop words

2014-07-10 Thread Arjen van der Meijden

Hi Sujit, Thanks. I was thinking along those lines myself. And conversely, the same list of stopwords could be used to mark the stopwords themselves as keywords, to prevent them from collapsing with rare words. Best regards, Arjen On 10-7-2014 22:30 Sujit Pal wrote: Hi Arjen, This is kind of

Re: How to handle words that stem to stop words

2014-07-07 Thread Tri Cao

I think emitting two tokens for vans is the right (potentially only) way to do it. You could also control the dictionary of terms that require this special treatment. Is there any reason you're not happy with this approach? On Jul 06, 2014, at 11:48 AM, Arjen van der Meijden acmmail...@tweakers.net
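As a rough illustration of the two-token approach, the sketch below emits both the original and the stemmed form at the same position, but only for terms listed in a dictionary. It is plain Python, not Lucene code; `SPECIAL_TERMS`, `toy_stem`, and the (position, term) output format are hypothetical choices made for this example.

```python
# Hypothetical dictionary of terms that get the two-token treatment.
SPECIAL_TERMS = {"vans"}


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


def analyze(text: str) -> list[tuple[int, str]]:
    """Return (position, term) pairs; special terms yield two tokens per position."""
    out = []
    for pos, word in enumerate(text.lower().split()):
        if word in SPECIAL_TERMS:
            out.append((pos, word))  # keep the original surface form
        out.append((pos, toy_stem(word)))  # always emit the stemmed form
    return out
```

Under these assumptions, 'vans' is indexed as both 'vans' and 'van' at position 0, so an exact match on 'vans' can rank higher while the stemmed form still matches.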

Re: How to handle words that stem to stop words

2014-07-07 Thread Jack Krupansky

Some of these anomalous cases are best handled by simply suppressing stemming, using PatternKeywordMarkerFilter and SetKeywordMarkerFilter to set the keyword attribute on matching tokens, so that most stemmers will not change them. You can create a list of words to ignore, like plurals of
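The effect of suppressing stemming through a word list or a pattern can be sketched as follows. This is a plain-Python imitation of the behavior, not the Lucene classes named above; the protected word list, the regular expression, and `toy_stem` are all assumptions made for the example.

```python
import re

# Hypothetical exception list (set-based marking).
PROTECTED_WORDS = {"vans"}
# Hypothetical pattern (pattern-based marking): words ending in 'ss'.
PROTECTED_PATTERN = re.compile(r".*ss")


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


def is_keyword(term: str) -> bool:
    """True if the term is in the word list or fully matches the pattern."""
    return term in PROTECTED_WORDS or PROTECTED_PATTERN.fullmatch(term) is not None


def stem_tokens(terms: list[str]) -> list[str]:
    """Stem each term unless it is marked as keyword."""
    return [term if is_keyword(term) else toy_stem(term) for term in terms]
```

Here `stem_tokens(["vans", "laptops", "glass"])` leaves 'vans' and 'glass' untouched and stems only 'laptops'. The trade-off is that the list or pattern has to be maintained by hand.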

Re: How to handle words that stem to stop words

2014-07-07 Thread Sujit Pal

Hi Arjen, You could also mark a token as keyword so the stemmer passes it through unchanged. For example, per the Javadocs for PorterStemFilter: http://lucene.apache.org/core/4_6_0/analyzers-common/org/apache/lucene/analysis/en/PorterStemFilter.html Note: This filter is aware of the

Re: How to handle words that stem to stop words

2014-07-07 Thread David Murgatroyd

Arjen, An approach requiring less list maintenance could be more advanced linguistic processing to distinguish the stop word from the content word, such as lemmatization rather than stemming. A commercial offering, Rosette Search Essentials from Basis http://www.basistech.com/search-essentials/
