I'm developing a recommendation feature in our app using the MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>, and so far it is doing a great job. We're using a user's "competency keywords" as the MLT field list and the user's corresponding document in Solr as the "comparison document". For one user, however, I'm not receiving any recommendations, and I'm not sure why.
Solr: 4.1.0

*relevant schema*:

<field name="competencyKeywords" type="short-mlt-text" indexed="true" stored="true" multiValued="true" termVectors="true"/>

<fieldType name="short-mlt-text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

*user's values*:

<arr name="competencyKeywords">
  <str>Healthcare Cost Trends</str>
</arr>

Is it possible that, among the ~40,000 users in this index (about 500 of whom have the same competency keywords), the words "healthcare", "cost" and "trends" are simply judged by Lucene not to be "significant"? I realize that I may not understand how the MLT Handler is doing things under the covers... I've only been guessing until now based on the (otherwise excellent) results I've been seeing.

Thanks,
Andy Pickler

P.S. For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="interestingTerms"/>
  <lst name="debug">
    <str name="rawquerystring">objectId:user91813</str>
    <str name="querystring">objectId:user91813</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
  </lst>
</response>
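
In case anyone wants to reproduce the request, here is a minimal sketch of how the query from the P.S. can be assembled with its parameters URL-encoded. The base URL (host, port and core name) and the helper name build_mlt_url are placeholders for illustration only, not our actual setup; the parameters themselves are exactly the ones shown in the query.

```python
from urllib.parse import urlencode

# Assumed base URL: host, port and core name are placeholders, not our real deployment.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_mlt_url(object_id, base=SOLR_BASE):
    """Build the /mlt request URL used in the P.S. for a given user document."""
    params = [
        ("q", f"objectId:{object_id}"),        # the "comparison document"
        ("mlt.fl", "competencyKeywords"),      # field used for similarity
        ("mlt.interestingTerms", "details"),   # ask for the terms MLT selected
        ("debugQuery", "true"),
        ("mlt.match.include", "false"),        # leave the matched document out of the results
    ]
    return f"{base}/mlt?{urlencode(params)}"


if __name__ == "__main__":
    print(build_mlt_url("user91813"))
```

Note that urlencode percent-encodes the colon in the q parameter, which Solr decodes back to objectId:user91813.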