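Hello Roopesh,

What you are seeing is called 'stemming'. Stemming reduces each token to a language-specific stem (a root form), typically by stripping suffixes. In your schema this is done by solr.EnglishPorterFilterFactory, which is applied at both index time and query time. So, for instance, when you search for 'attach' you also match 'attachment', because both words reduce to the same English stem.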
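
To make the idea concrete, here is a minimal Python sketch of suffix stripping. It is only a toy, not the Porter algorithm that solr.EnglishPorterFilterFactory implements; the rule list and the names SUFFIXES, toy_stem and matches are made up for illustration.

```python
# Toy suffix stripper. This is NOT the Porter algorithm; it only illustrates
# that documents and queries are compared by their stems, not by substrings.
SUFFIXES = ("ments", "ment", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(token: str) -> str:
    """Lowercase the token and strip the first matching suffix, if any."""
    token = token.lower()
    for suffix in SUFFIXES:
        # Require at least three characters of stem so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def matches(query: str, indexed: str) -> bool:
    """A query term matches an indexed term only if their stems are identical."""
    return toy_stem(query) == toy_stem(indexed)


if __name__ == "__main__":
    for query, indexed in [
        ("attach", "attachments"),
        ("attachment", "attach"),
        ("news", "newsletter"),
        ("letter", "newsletter"),
    ]:
        print(query, indexed, matches(query, indexed))
```

With these toy rules, 'attach', 'attachment' and 'attachments' all share the stem 'attach', while 'news' and 'letter' never produce the same stem as 'newsletter', so they do not match it. The real stemmer uses a much larger rule set, but the comparison works the same way.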

Newsletter is an interesting example: you will never get a match when you search for 'letter', because stemming only strips suffixes and matching is done on whole stemmed tokens, not on substrings. The fact that you don't get a match for 'news' is a bit more subtle. The stemmer does not reduce 'newsletter' to 'news'. It applies mechanical suffix rules, and it most likely leaves 'newsletter' as something like 'newslett' while reducing 'news' on its own to 'new'. The two stems differ, so the terms do not match (whereas in the attach/attachment case, both words reduce to 'attach').

I can't find any good Solr-specific stemming links, but check out the Wikipedia page:
http://en.wikipedia.org/wiki/Stemming

Thanks,
Stu

-----Original Message-----
From: Roopesh P Raj <[EMAIL PROTECTED]>
Sent: Wednesday, February 13, 2008 1:43am
To: solr-dev@lucene.apache.org
Subject: Doub't in the way lucene works

Hi,

I am using Solr in my project. My schema is almost the same as the one in the example folder that ships with the Solr download. Most of the fields that I use are of type "text", and the rest are of type "string".

Some of the search results are as follows:

When I search with the query "attach", documents containing "attach", "attachment" and "attachments" are returned.

When the search string is "attachment", documents containing "attach", "attachment" and "attachments" are returned as well.

When I search for "newsletter", documents with the keyword "newsletter" are returned. But when I search for "news", no results appear. When I search for "letter", there are no results either.

Why does this happen? Why is Lucene not returning documents with "newsletter" when the search string is "letter" or "news"?

I am pasting the "text" fieldtype declaration as well. Please help me.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Regards
Roopesh
------------------
DigitalGlue, India