Hi everybody,

I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of our users reported that a query returns documents that are not relevant to the search expression. The collection contains Spanish, Portuguese and English text.

Using the Solr administration interface I found that she was right: if I search for the Spanish term "represion", only the word root is matched, that is, the query returns every document containing the term "repres". Using the debug search in the admin interface I found this:

    <lst name="debug">
      <str name="rawquerystring">description:represion</str>
      <str name="querystring">description:represion</str>
      <str name="parsedquery">description:repres</str>
      <str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was removed by the query parser.

The first question is: I don't know where I should look to correct this, in schema.xml or in solrconfig.xml.

In the schema, description is defined as:

    <field name="description" type="text" indexed="true" multiValued="true" stored="true"/>

and text is:

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldtype>

The only thing that looks suspicious to me is the EnglishPorterFilterFactory. I removed it from the configuration, but nothing changed.
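To make the change concrete, here is a rough sketch of the query analyzer with the stemmer line taken out. It assumes that only the EnglishPorterFilterFactory entry is removed from the query section and that every other filter stays exactly as listed above; the index analyzer is not shown and is assumed to be unchanged.

```xml
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- EnglishPorterFilterFactory removed here (assumption: query side only) -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```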
Should I reindex the collection to see the changes? Should I also remove it from the index section? What will I lose by removing the English Porter filter?

Thanks a lot for the help,
German