Sorry, wrong list.

mike.schultz wrote:
> 
> For various statistics I collect from an index it's important for me to
> know the length (measured in tokens) of a document field.  I can get that
> information to some degree from the "norms" for the field but a) the
> resolution isn't that great, and b) more importantly, if boosts are used
> it's almost impossible to get lengths from this.
> 
> Here are two ideas I was thinking about that maybe someone can comment on.
> 
> 1) Use copyField to copy the field in question, fieldA, to an additional
> field, fieldALength, which has an extra filter that just counts the tokens
> and only outputs a token representing the length of the field.  This has
> the disadvantage of retokenizing basically the whole document (because the
> field in question is basically the body).  Plus, I would think littering
> the term space with these tokens might be bad for performance, but I'm not
> sure.
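
To make idea 1 concrete, here is a rough sketch in plain Python rather than the Lucene/Solr analysis API. The function name length_token_filter and the choice to emit the count as a bare decimal string are assumptions made only for illustration. It shows the shape of the filter: consume the whole token stream, then emit a single token carrying the count.

```python
from typing import Iterable, Iterator


def length_token_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Hypothetical filter: swallow every input token, emit one count token."""
    count = 0
    for _ in tokens:
        count += 1          # the original tokens are discarded, only counted
    yield str(count)        # the single token that would be indexed in fieldALength


if __name__ == "__main__":
    # Whitespace splitting stands in for whatever tokenizer fieldA really uses.
    print(list(length_token_filter("the quick brown fox".split())))
```

Note that the input still has to be tokenized a second time to feed this filter, which is the retokenizing cost mentioned above.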
> 
> 2) Add a filter to the field in question which again counts the tokens.
> This filter allows the regular tokens to be indexed as usual but somehow
> manages to get the token count into a stored field of the document.  This
> has the advantage of not having to retokenize the field, and instead of
> littering the token space, the count becomes docdata for each doc.  Can
> this be done?  Maybe using a ThreadLocal to temporarily store the count?
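
The data flow of idea 2 could look roughly like the sketch below, again in plain Python and not the Lucene/Solr API. The names counting_filter, analyze_document, and fieldA_terms are hypothetical, and threading.local stands in for a Java ThreadLocal. Whether the real indexing chain lets a filter hand a value to a stored field at this point is exactly the open question; the sketch only illustrates a pass-through filter that leaves its count in thread-local storage for the caller to pick up.

```python
import threading
from typing import Dict, Iterable, Iterator

# Per-thread scratch space, playing the role of a Java ThreadLocal.
_state = threading.local()


def counting_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Hypothetical pass-through filter: yields tokens unchanged while counting."""
    count = 0
    for tok in tokens:
        count += 1
        yield tok
    # Only reached once the stream is exhausted, so the count is final here.
    _state.token_count = count


def analyze_document(text: str) -> Dict[str, object]:
    """Tokenize once, index the tokens as usual, and attach the count as doc data."""
    terms = list(counting_filter(text.split()))   # whitespace split is a stand-in tokenizer
    return {"fieldA_terms": terms, "fieldALength": _state.token_count}


if __name__ == "__main__":
    print(analyze_document("the quick brown fox"))
```

The count is only valid after the stream has been fully consumed, so anything reading it has to run after analysis of that field has finished, and on the same thread.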
> 
> Thanks.
> 

-- 
View this message in context: 
http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297597p25297661.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
