MoreLikeThis and term vectors - documentation suggestion

Ken Krugler Mon, 26 Feb 2007 11:50:28 -0800

Hi all,

I was trying out the MoreLikeThis support, and getting some odd results.

I realized that unless the fields being used for the similarity calculation have a stored term vector, the MoreLikeThis code from Lucene will re-analyze the field using the StandardAnalyzer, which in my case is quite different from what I'm using in the Solr schema.

So the first note is just for anybody using MoreLikeThis: make sure you also specify termVectors=true in the Solr schema for any fields being passed to the query as mlt.fl parameters.
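
As a rough sketch, a field definition with term vectors enabled might look like the following. The field name "description" and the type "text" are only placeholders for illustration; substitute whatever your own schema defines:

   <!-- Hypothetical field: name and type are assumptions, not from the example schema -->
   <field name="description" type="text" indexed="true" stored="true"
          termVectors="true"/>

The index would then need to be rebuilt, since term vectors are only written when documents are indexed.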

The second note is that the Wiki page and the example schema might want to include some reference to the termVectors field attribute. For example, the sample schema says:

   <!-- Valid attributes for fields:
     name: mandatory - the name for the field
     type: mandatory - the name of a previously defined type from the
       <types> section

     indexed: true if this field should be indexed (searchable or sortable)
     stored: true if this field should be retrievable
     compressed: [false] if this field should be stored using gzip compression
       (this will only apply if the field type is compressable; among
       the standard field types, only TextField and StrField are)
     multiValued: true if this field may contain multiple values per document
     omitNorms: (expert) set to true to omit the norms associated with
       this field (this disables length normalization and index-time
       boosting for the field, and saves some memory).  Only full-text
       fields or fields that need an index-time boost need norms.

This made me think initially that these were the only valid attributes for fields. Likewise, the wiki page at http://wiki.apache.org/solr/SchemaXml also doesn't make any mention of termVectors, termPositions, or termOffsets. I would edit that page, but there currently isn't a section that talks about all the attributes, only the common ones.
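
Something along these lines, added to the attribute list in the sample schema comment, would probably cover it. The wording is only a suggestion, and the [false] defaults are my assumption:

     termVectors: [false] set to true to store the term vector for this
       field (needed for MoreLikeThis to avoid re-analyzing the field)
     termPositions: [false] set to true to store position information
       with the term vector
     termOffsets: [false] set to true to store offset information with
       the term vector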


Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
