Indeed.  CommonGrams.java in Nutch is the place to look.

Otis

----- Original Message ----
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 13, 2006 2:08:51 PM
Subject: Re: Index & search questions; special cases


On Nov 13, 2006, at 1:51 PM, Chris Hostetter wrote:
> That reminds me ... I seem to remember someone saying once that Nutch also
> builds word-based n-grams out of its stop words, so searches on "the"
> or "on" won't match anything because those words are never indexed as
> single tokens, but if a document contains "the dog in the house" it would
> match a search on "in the" because the Analyzer would treat that as a
> single token "in_the".


Yup... we covered this in LIA (Lucene in Action):

    <http://lucenebook.com/search?query=nutch+stop+words>
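For illustration, here is a minimal Java sketch of the analysis behaviour Chris describes above. It is not the code in Nutch's CommonGrams.java: the class name CommonGramsSketch, the small stop-word set, and the whitespace tokenizer are all hypothetical. Matching here ignores token positions, so it is only a rough stand-in for a real phrase query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CommonGramsSketch {
    // Hypothetical stop-word list; not Nutch's actual configuration.
    static final Set<String> COMMON = Set.of("the", "in", "on", "a", "of");

    static boolean isCommon(String w) {
        return COMMON.contains(w);
    }

    // Common words are never emitted on their own; each one is joined with
    // its neighbour into a single gram token such as "in_the".
    static List<String> analyze(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            // Ordinary words are indexed as single tokens.
            if (!isCommon(w)) tokens.add(w);
            // Join this word with the next one whenever either is common.
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (isCommon(w) || isCommon(next)) tokens.add(w + "_" + next);
            }
        }
        return tokens;
    }

    // Simplified match: every query token must appear in the document
    // (positions are ignored). An empty query matches nothing.
    static boolean matches(List<String> docTokens, List<String> queryTokens) {
        return !queryTokens.isEmpty() && docTokens.containsAll(queryTokens);
    }

    public static void main(String[] args) {
        List<String> doc = analyze("the dog in the house");
        System.out.println(doc);
        // prints [the_dog, dog, dog_in, in_the, the_house, house]
        System.out.println(matches(doc, analyze("in the"))); // true
        System.out.println(matches(doc, analyze("the")));    // false
    }
}
```

Because "the" analyzes to no tokens at all, a search on it matches nothing, while "in the" becomes the single token in_the and matches the document.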