: I think a very simple and generic way to handle this would be to have
: the FieldType hold a LengthNorm class in the same way it currently holds
: the Tokenizer / Analyzers. We can then provide e.g. a DefaultLengthNorm
: (same as DefaultSimilarity) and NoLengthNorm (lengthnorm = 1), and users
: can create their own subclasses if they want to.
no length norm is already accounted for with omitNorms (which is more efficient than just returning a constant from the lengthNorm function) ... it's the part about making users create their own subclasses that gets tricky -- if they have to write their own classes to implement their own functions, there isn't a lot of value add in doing this per fieldtype instead of letting them put it in a custom Similarity class.

This would assume we use a specific SolrSimilarity that is aware of the IndexSchema and knows to check the FieldTypes to find the LengthNorm providers for each field(type) ... we don't have anything to do that right now (there would also be some oddity in making it clear that the FieldType's LengthNorm configs would only apply if they use a new SolrSimilarity ... but this is an easily checked and warning-loggable situation)

I'm not saying it can't be done: I'm just saying that getting the API/configuration issues "right" is hard to do, and most solutions wind up seeming pretty hackish ... so not a lot of effort has been put into it.

: My specific use case is a product search engine for which I don't want a
: length norm at all on most fields, and where I do want it I want longer
: fields to only get a minimally smaller boost, e.g. 0.8 for a "long"
: value (whatever long is exactly) compared to 1.0 for a "short" value.

Like I said: omitNorms will take care of the first part of your problem, and the second part sounds like you don't really need per-field lengthNorms at all, just a custom Similarity class with a simple step function for lengthNorm.

: The main problem I'm running into with this is that I can't override
: encodeNorm/decodeNorm because they're static, and the current encoding
: (while providing a huge range of values) doesn't have much precision
: around any individual value.

that's a broader lucene issue ...
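The step-function idea above can be sketched as follows. This is a hypothetical illustration, not the actual Solr/Lucene code: the threshold of 10 terms and the 0.8/1.0 values are assumptions, and in a real deployment this logic would go in a lengthNorm override of your custom Similarity subclass (not shown here, since it requires the Lucene jars).

```java
// Hypothetical sketch of a step-function length norm: "long" fields get a
// minimally smaller boost (0.8) while "short" fields get the full 1.0.
public class StepLengthNorm {

    // Assumed cutoff for illustration -- tune to whatever "long" means for you.
    static final int LONG_FIELD_THRESHOLD = 10;

    // In a custom Similarity this body would be the lengthNorm override.
    static float lengthNorm(int numTerms) {
        return numTerms > LONG_FIELD_THRESHOLD ? 0.8f : 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(3));   // short field -> 1.0
        System.out.println(lengthNorm(25));  // long field  -> 0.8
    }
}
```

Because the function only ever returns two values, it sidesteps most of the per-field configuration problem: the same Similarity applies everywhere, and omitNorms handles the fields that should have no norm at all.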
norms are not very granular so that they can be stored efficiently, and the encoding/decoding are static so that there is no risk of being misinterpreted if another app opens the index.

if you only care about having two distinct values for long vs. short, just pick a common multiplier that gives you new values that are more distinct when encoded ... since they'll apply globally the end result should be roughly the same, although you might need to tweak some of your query boosts (or change the queryNorm function in your Similarity) to get the exact same ordering for all queries.

-Hoss
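To illustrate the granularity point: norms are packed into a single byte using a tiny-float encoding (3 mantissa bits, 5 exponent bits, zero-exponent point of 15 -- the scheme behind Similarity.encodeNorm/decodeNorm). The sketch below is a standalone reimplementation of that scheme, shown only so you can see the precision loss around individual values: 1.0 round-trips exactly, but 0.8 collapses to 0.75.

```java
// Standalone sketch of Lucene's 8-bit norm encoding (the SmallFloat
// "3.15" scheme: 3 mantissa bits, 5 exponent bits, zero-exponent point 15).
// Only 256 distinct norm values exist, so nearby floats collapse together.
public class NormEncoding {

    // Encode a float into one byte, truncating the mantissa to 3 bits.
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // underflow: map to zero or the smallest positive code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            // overflow: map to the largest representable code
            return -1;
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back into a float.
    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        System.out.println(byte315ToFloat(floatToByte315(1.0f))); // 1.0
        System.out.println(byte315ToFloat(floatToByte315(0.8f))); // 0.75
    }
}
```

This is why the "common multiplier" trick works: scaling both target values until their encoded bytes differ gives you the two distinct norms you want, without touching the static encode/decode methods.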
