: I think a very simple and generic way to handle this would be to have
: the FieldType hold a LengthNorm class in the same way it currently holds
: the Tokenizer / Analyzers.  We can then provide e.g. a DefaultLengthNorm
: (same as DefaultSimilarity) and NoLengthNorm (lengthnorm = 1), and users
: can create their own subclasses if they want to.

A "no length norm" option is already accounted for by omitNorms (which
is more efficient than just returning a constant from the lengthNorm
function) ... it's the part about making users create their own
subclasses that gets tricky -- if they have to write their own classes
to implement their own functions, there isn't a lot of value added in
doing this per FieldType instead of letting them put the logic in a
custom Similarity class.
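
For reference, a minimal sketch of what the omitNorms option looks
like in schema.xml (the field name and type here are just examples):

    <!-- omitNorms="true" disables length normalization (and
         index-time field boosts) for this field entirely;
         no custom Similarity code needed -->
    <field name="title" type="text" indexed="true" stored="true"
           omitNorms="true"/>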

This would assume we use a specific SolrSimilarity that is aware of
the IndexSchema and knows to check the FieldTypes to find the
LengthNorm providers for each field(type) ... we don't have anything
to do that right now (there would also be some oddity in making it
clear that the FieldType's LengthNorm configs would only apply if the
user has registered this new SolrSimilarity ... but that is an easily
checked situation that could log a warning).
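
Just to make the shape of the problem concrete, here is a purely
hypothetical sketch of such a schema-aware similarity -- nothing like
this exists today, and the LengthNormProvider interface and the
getLengthNormProvider() hook on FieldType are invented names for
illustration only:

    import org.apache.lucene.search.DefaultSimilarity;
    import org.apache.solr.schema.IndexSchema;

    public class SolrSimilarity extends DefaultSimilarity {

      /** invented for this sketch; would hang off FieldType */
      public interface LengthNormProvider {
        float lengthNorm(int numTokens);
      }

      private final IndexSchema schema;

      public SolrSimilarity(IndexSchema schema) {
        this.schema = schema;
      }

      public float lengthNorm(String fieldName, int numTokens) {
        // hypothetical hook: delegate to whatever LengthNorm the
        // FieldType declares, else fall back to 1/sqrt(numTokens)
        LengthNormProvider p =
            schema.getFieldType(fieldName).getLengthNormProvider();
        return (p != null) ? p.lengthNorm(numTokens)
                           : super.lengthNorm(fieldName, numTokens);
      }
    }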

I'm not saying it can't be done: I'm just saying that getting the
API/configuration issues "right" is hard to do, and most solutions
wind up seeming pretty hackish ... so not a lot of effort has been
put into it.

: My specific use case is a product search engine for which I don't want a
: length norm at all on most fields, and where I do want it I want longer
: fields to only get a minimally smaller boost, e.g. 0.8 for a "long"
: value (whatever long is exactly) compared to 1.0 for a "short" value.

Like I said: omitNorms will take care of the first part of your
problem, and for the second part it seems like you don't really need
per-field lengthNorms at all, just a custom Similarity class with a
simple step function for lengthNorm.
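
Something like this minimal sketch, assuming the current Similarity
API where lengthNorm(String, int) is overridable -- the 20 token
threshold and the 0.8f value are just placeholders for whatever
"long" means for your data:

    import org.apache.lucene.search.DefaultSimilarity;

    public class StepLengthNormSimilarity extends DefaultSimilarity {
      private static final int LONG_THRESHOLD = 20; // placeholder

      public float lengthNorm(String fieldName, int numTokens) {
        // short values get the full boost, long values a slightly
        // smaller one, instead of the default 1/sqrt(numTokens)
        return (numTokens <= LONG_THRESHOLD) ? 1.0f : 0.8f;
      }
    }

You'd register it in schema.xml with <similarity class="..."/> using
the fully qualified class name.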

: The main problem I'm running into with this is that I can't override
: encodeNorm/decodeNorm because they're static, and the current encoding
: (while providing a huge range of values) doesn't have much precision
: around any individual value.

that's a broader Lucene issue ... norms are not very granular so that
they can be stored efficiently (a single byte per field per document),
and the encoding/decoding methods are static so that there is no risk
of the values being misinterpreted if another app opens the index.
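
You can see exactly how much precision survives by round-tripping
candidate values through those static methods -- a quick throwaway
check (the values here are arbitrary):

    import org.apache.lucene.search.Similarity;

    public class NormPrecisionDemo {
      public static void main(String[] args) {
        float[] candidates = { 0.8f, 0.9f, 1.0f, 3.2f, 4.0f };
        for (float f : candidates) {
          // encodeNorm packs the float into a single byte;
          // decodeNorm shows what will actually be stored
          byte b = Similarity.encodeNorm(f);
          System.out.println(f + " -> " + Similarity.decodeNorm(b));
        }
      }
    }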

if you only care about having two distinct values for long vs. short,
just pick a common multiplier that gives you new values that are more
distinct when encoded ... since they'll apply globally, the end result
should be roughly the same, although you might need to tweak some of
your query boosts (or change the queryNorm function in your
Similarity) to get the exact same ordering for all queries.
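
In code that's a one-line change to the step function sketched above;
the SCALE value here is illustrative, so use a round-trip check like
the demo earlier to verify that your scaled values really do encode
to distinct bytes:

    public float lengthNorm(String fieldName, int numTokens) {
      final float SCALE = 4.0f; // common multiplier, pick to taste
      // same 1.0 vs 0.8 step scaled by a common multiplier; choose
      // SCALE so both products survive encoding distinctly -- the
      // ratio (and thus the relative ranking) stays the same
      return SCALE * ((numTokens <= LONG_THRESHOLD) ? 1.0f : 0.8f);
    }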


-Hoss
