Indeed. CommonGrams.java in Nutch is the place to look.

Otis

----- Original Message ----
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 13, 2006 2:08:51 PM
Subject: Re: Index & search questions; special cases

On Nov 13, 2006, at 1:51 PM, Chris Hostetter wrote:
> That reminds me ... I seem to remember someone saying once that Nutch also
> builds word-based n-grams out of its stop words, so searches on "the"
> or "on" won't match anything because those words are never indexed as
> single tokens, but if a document contains "the dog in the house" it would
> match a search on "in the" because the Analyzer would treat that as a
> single token "in_the".

Yup.... we covered this in LIA: <http://lucenebook.com/search?query=nutch+stop+words>
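
For anyone who wants to see the idea without digging through the Nutch source, here is a rough sketch of the behavior Chris describes above. It is plain Java with no Lucene or Nutch dependencies, and it is not the actual CommonGrams.java code. The class name, the stop word list, and the rule of emitting a bigram whenever either word of a pair is a stop word are all assumptions made for illustration; the "_" separator just follows the "in_the" example in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; Nutch's real CommonGrams.java may work differently.
class CommonGramsSketch {

    // Assumed stop word list, chosen for the example.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "in", "on", "a", "of"));

    // Stop words are never emitted alone. A word pair is emitted as a
    // single "w1_w2" token whenever at least one of the two is a stop word.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            boolean stop = STOP_WORDS.contains(words[i]);
            if (!stop) {
                out.add(words[i]); // ordinary words are kept as single tokens
            }
            if (i + 1 < words.length
                    && (stop || STOP_WORDS.contains(words[i + 1]))) {
                out.add(words[i] + "_" + words[i + 1]); // stop word bigram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [the_dog, dog, dog_in, in_the, the_house, house]
        System.out.println(tokenize("the dog in the house"));
        // prints [in_the]
        System.out.println(tokenize("in the"));
        // prints []
        System.out.println(tokenize("the"));
    }
}
```

Running the same method over the query text shows the effect: "in the" becomes the single token "in_the", which is present in the document's tokens, while a query for "the" alone produces no tokens and so matches nothing.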