Sorry, wrong list.

mike.schultz wrote:
>
> For various statistics I collect from an index, it's important for me to
> know the length (measured in tokens) of a document field. I can get that
> information to some degree from the "norms" for the field, but a) the
> resolution isn't that great, and b) more importantly, if boosts are used
> it's almost impossible to get lengths from this.
>
> Here are two ideas I was thinking about that maybe someone can comment on.
>
> 1) Use copyField to copy the field in question, fieldA, to an additional
> field, fieldALength, which has an extra filter that just counts the tokens
> and only outputs a token representing the length of the field. This has the
> disadvantage of retokenizing basically the whole document (because the
> field in question is basically the body). Plus, I would think littering
> the term space with these tokens might be bad for performance, though I'm
> not sure.
>
> 2) Add a filter to the field in question which again counts the tokens.
> This filter allows the regular tokens to be indexed as usual but somehow
> manages to get the token count into a stored field of the document. This
> has the advantage of not having to retokenize the field, and instead of
> littering the token space, the count becomes docdata for each doc. Can
> this be done? Maybe using ThreadLocal to temporarily store the count?
>
> Thanks.
>

--
View this message in context: http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297597p25297661.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
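
For reference, idea 2 in the quoted message could be sketched roughly as below. This is not Lucene or Solr code: a plain java.util.Iterator<String> stands in for the token stream, and FieldLengthSketch, CountingFilter, and LAST_COUNT are hypothetical names made up for illustration. It also assumes the analysis and the read-back of the count happen on the same thread. Whether the indexing code can actually read the count at the right moment and add it as a stored field is the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FieldLengthSketch {

    // Per-thread slot where the filter leaves the token count of the
    // field it processed most recently (hypothetical, not a Solr API).
    static final ThreadLocal<Integer> LAST_COUNT = ThreadLocal.withInitial(() -> 0);

    // Pass-through wrapper: every token is handed on unchanged, so the
    // regular tokens can still be indexed, and the wrapper counts them.
    static final class CountingFilter implements Iterator<String> {
        private final Iterator<String> input;
        private int count = 0;

        CountingFilter(Iterator<String> input) {
            this.input = input;
            // Reset so a stale count from a previous field is never read.
            LAST_COUNT.set(0);
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            count++;
            LAST_COUNT.set(count); // publish the running count for this thread
            return token;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the analyzed tokens of fieldA.
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");

        Iterator<String> filtered = new CountingFilter(tokens.iterator());
        List<String> indexed = new ArrayList<>();
        while (filtered.hasNext()) {
            indexed.add(filtered.next()); // tokens flow through as usual
        }

        // After the stream is consumed, the same thread reads the count and
        // would store it as the value of a field such as fieldALength.
        int fieldALength = LAST_COUNT.get();
        System.out.println(indexed + " length=" + fieldALength);
    }
}
```

The count is only meaningful once the stream has been fully consumed, and only on the thread that consumed it, which is the main fragility of the ThreadLocal approach.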