Jan,

Looks interesting. I will try this. Thanks!

Darren

On Mon, 2010-06-28 at 19:54 +0200, Jan Høydahl / Cominvent wrote:
> Hi,
>
> You might also want to check out the new Lucene-Hunspell stemmer at
> http://code.google.com/p/lucene-hunspell/
> It uses OpenOffice dictionaries with known stems in combination with a large
> set of language specific rules.
> It handles your example, but it is an early release, so test it thoroughly
> before deploying in production :)
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
>
> On 28. juni 2010, at 17.43, Joe Calderon wrote:
>
> > the general consensus among people who run into the problem you have
> > is to use a plurals only stemmer, a synonyms file or a combination of
> > both (for irregular nouns etc)
> >
> > if you search the archives you can find info on a plurals stemmer
> >
> > On Mon, Jun 28, 2010 at 6:49 AM, <dar...@ontrenet.com> wrote:
> >> Thanks for the tip. Yeah, I think the stemming confounds search results as
> >> it stands (porter stemmer).
> >>
> >> I was also thinking of using my dictionary of 500,000 words with their
> >> complete morphologies and conjugations and create a synonyms.txt to
> >> provide english accurate morphology.
> >>
> >> Is this a good idea?
> >>
> >> Darren
> >>
> >>> Hi Darren,
> >>>
> >>> You might want to look at the KStemmer
> >>> (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem)
> >>> instead of the standard PorterStemmer. It essentially has a 'dictionary'
> >>> of exception words where stemming stops if found, so in your case
> >>> president won't be stemmed any further than president (but presidents will
> >>> be stemmed to president). You will have to integrate it into solr
> >>> yourself, but that's straightforward.
> >>>
> >>> HTH
> >>> Brendan
> >>>
> >>>
> >>> On Jun 28, 2010, at 8:04 AM, Darren Govoni wrote:
> >>>
> >>>> Hi,
> >>>> It seems to me that because the stemming does not produce
> >>>> grammatically correct stems in many of the cases,
> >>>> search anomalies can occur like the one I am seeing where I have a
> >>>> document with "president" in it and it is returned
> >>>> when I search for "preside", a different word entirely.
> >>>>
> >>>> Is this correct or acceptable behavior? Previous discussions here on
> >>>> stemming, I was told its ok as long as all the words reduce
> >>>> to the same stem, but when different words reduce to the same stem it
> >>>> seems to affect search results in a "bad way".
> >>>>
> >>>> Darren
> >>>
> >>>
> >>
> >>
>
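
For anyone wanting to try the synonyms.txt idea from the thread, here is a minimal sketch of generating the file from a morphology dictionary. It assumes the dictionary can be loaded as a mapping from a base form to its inflected forms, and that the synonym filter accepts explicit mappings of the form `a, b => c`. The function name `build_synonym_lines` and the sample entries are hypothetical and only illustrate the shape of the data; the actual 500,000-word dictionary format is not described in the thread.

```python
def build_synonym_lines(morphology):
    """Turn {base form: [inflected forms]} into explicit-mapping lines.

    Each output line maps every inflected form to its base form, e.g.
    "presidents => president". Words from different entries are never
    merged, so "president" and "preside" stay distinct.
    """
    lines = []
    for base, forms in sorted(morphology.items()):
        base = base.lower()
        # Drop duplicates and any form identical to the base form itself.
        variants = sorted({form.lower() for form in forms} - {base})
        if variants:
            lines.append(f"{', '.join(variants)} => {base}")
    return lines


# Hypothetical sample entries; a real dictionary would be read from a file.
SAMPLE = {
    "president": ["presidents"],
    "preside": ["presides", "presided", "presiding"],
}

if __name__ == "__main__":
    for line in build_synonym_lines(SAMPLE):
        print(line)
```

Mapping each inflected form onto one base form, rather than listing all forms as mutual equivalents, keeps the expansion small. With a dictionary of this size the resulting file will still be large, so memory use and indexing time are worth measuring before relying on it.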