Hi,

Thanks a lot for the info and your time. I think field collapsing will work for us. I looked at https://issues.apache.org/jira/browse/SOLR-236, but which file should I use for the patch? We use Solr 1.3.

Thanks
Bharat Jain

On Fri, Jul 30, 2010 at 12:53 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : 1. There are user records of type A, B, C etc. (userId field in index is
> : common to all records)
> : 2. A user can have any number of A, B, C etc (e.g. think of A being a
> : language then user can know many languages like french, english, german etc)
> : 3. Records are currently stored as a document in index.
> : 4. A given query can match multiple records for the user
> : 5. If for a user more records are matched (e.g. if he knows both french and
> : german) then he is more relevant and should come top in UI. This is the
> : reason I wanted to add lucene scores assuming the greater score means more
> : relevance.
>
> if your goal is to get back "users" from each search, then you should
> probably change your indexing strategy so that each "user" has a single
> document -- fields like "language" can be multivalued, etc...
>
> then a search for "language:en language:fr" will return users who speak
> english or french, and the ones that speak both will score higher.
>
> if you really can't change the index structure, then essentially what you
> are looking for is a "field collapsing" solution on the userId field,
> where you want each collapsed group to get a cumulative score. i don't
> know if the existing field collapsing patches support this -- if you are
> already willing/capable to do it in the client then that may be the
> simplest thing to support moving forward.
>
> Adding the scores is certainly one metric you could use -- it's generally
> suspicious to try and imply too much meaning to scores in lucene/solr but
> that's because people typically try to imply broader absolute meaning. in
> the case of a single query the scores are relative to each other, and adding
> up all the scores for a given userId is approximately what would happen in
> my example above -- except that there is also a "coord" factor that would
> penalize documents that only match one clause ... it's complicated, but
> as an approximation adding the scores might give you what you are looking
> for -- only you can know for sure based on your specific data.
>
> -Hoss
>
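
For reference, the client-side approach discussed above (adding up the scores of all matching records per userId) could look roughly like the sketch below. It is only an illustration under assumptions: the hit shape (a dict with "userId" and "score" keys) and the function name are made up for the example, and it presumes the client has already fetched every matching record along with its score, not just the first page of results.

```python
from collections import defaultdict


def rank_users_by_summed_score(hits):
    """Sum the per-record scores for each userId and rank users by the total.

    `hits` is assumed to be an iterable of dicts with "userId" and "score"
    keys (a hypothetical shape, not a specific Solr client's response type).
    """
    totals = defaultdict(float)
    for hit in hits:
        totals[hit["userId"]] += hit["score"]
    # Highest cumulative score first; ties broken by userId for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Made-up example: user u1 matches two records, user u2 matches one.
hits = [
    {"userId": "u1", "score": 1.5},
    {"userId": "u2", "score": 2.0},
    {"userId": "u1", "score": 1.0},
]
ranked = rank_users_by_summed_score(hits)
print(ranked)
```

With these made-up numbers u1 ranks above u2 even though u2 has the single best-scoring record. As noted in the quoted reply, this only approximates what a single document per user would score, since the "coord" factor is not reproduced here.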