Hello:

I'm working on implementing a requirement where, when a document is returned, 
we want to pithily tell the end user why. That is, if five documents are 
returned, they may have matched for similar or different reasons. These 
"reasons" are the field(s) in which matches occurred. Some are more important 
than others, and I'll have to return just the most relevant one or two reasons 
so as not to overwhelm the user.

This is a separate goal from Solr's scoring of the returned documents. That is, 
index/query time boosting can indicate which fields are more significant in 
computing the overall document score, but I still need to know which fields 
were matched, and with what terms. I do have an application that stands between 
Solr and the end user (RESTful API), so I figured I can rank the "reasons" and 
return more domain-specific names rather than the Solr field names.
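
To make the ranking idea concrete, here is a rough Java sketch of what I have 
in mind for the application layer. The class name ReasonRanker, the priority 
order, and the display labels are all made up for illustration; only the Solr 
field names are real (they appear in my qf list further down). Fields without 
a label are simply ignored.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: turns matched Solr field names into ranked, user-facing reasons. */
final class ReasonRanker {

    // Insertion order doubles as priority: earlier entries are more important.
    // Both the ordering and the labels are illustrative assumptions.
    private static final Map<String, String> LABELS = new LinkedHashMap<String, String>();
    static {
        LABELS.put("n_nameExact", "Name");
        LABELS.put("n_macromolecule_nameExact", "Macromolecule name");
        LABELS.put("n_macromolecule_name", "Macromolecule name");
        LABELS.put("n_top_disease", "Disease");
        LABELS.put("n_synonym", "Synonym");
    }

    /** Returns at most {@code limit} distinct labels, most important first. */
    static List<String> topReasons(Collection<String> matchedFields, int limit) {
        // A LinkedHashSet keeps priority order and collapses fields sharing a label.
        Set<String> reasons = new LinkedHashSet<String>();
        for (Map.Entry<String, String> entry : LABELS.entrySet()) {
            if (reasons.size() >= limit) {
                break;
            }
            if (matchedFields.contains(entry.getKey())) {
                reasons.add(entry.getValue());
            }
        }
        return new ArrayList<String>(reasons);
    }
}
```

Calling ReasonRanker.topReasons(matchedFields, 2) would then give the one or 
two labels to show for a document. Note that the exact and non-exact variants 
of a field collapse into a single reason because they share a label.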

So, I've turned to highlighting, and in the results I can see, for each 
document ID, the fields matched, the text in each field, etc. Great. But to get 
that to work, I have to specifically query individual fields. That is, the 
approach of <copyField>'ing a bunch of fields to a common text field for 
efficiency purposes is no longer an option. And, using the dismax request 
handler, I'm querying a lot of fields:

     <str name="qf">
        n_nameExact^4.0        
        n_macromolecule_nameExact^3.0
        n_macromolecule_name^2.0
        n_macromolecule_id^1.8
        n_pathway_nameExact^1.5
        n_top_regulates
        n_top_regulated_by
        n_top_binds
        n_top_role_in_cell
        n_top_disease
        n_molecular_function
        n_protein_family
        n_subcell_location
        n_pathway_name
        n_cell_component
        n_bio_process
        n_synonym^0.5
        n_macromolecule_summary^0.6
        p_nameExact^4.0 
        p_name^2.0
        p_description^0.6
     </str>

Is that crazy?  Is telling Solr to look at so many individual fields going to 
be a performance problem?  I'm only prototyping at this stage and it works 
great. :)  I've not run anything yet at scale handling lots of requests.

There are two document types in that shared index, distinguished by a field 
named type.  So, when configuring the SolrJ SolrQuery, I call 
addFilterQuery() to select one or the other type.

Anyway, using dismax with all of those query fields along with highlighting, I 
get the information I need to render meaningful results for the end user.  But, 
it has a sort of smell to it. :)   Shall I look for another way, or am I 
worrying about nothing?
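
For what it's worth, this is roughly how I pull the matched fields out of the 
highlighting section. It is only a sketch: the input is a plain map with the 
same shape that SolrJ's QueryResponse.getHighlighting() returns (document ID 
to field name to snippets), and the class name MatchedFields is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: lists, per document, the fields that produced highlight snippets. */
final class MatchedFields {

    /**
     * @param highlighting document ID -> field name -> highlighted snippets
     * @return document ID -> names of fields with at least one snippet
     */
    static Map<String, List<String>> byDocument(
            Map<String, Map<String, List<String>>> highlighting) {
        Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
        for (Map.Entry<String, Map<String, List<String>>> doc : highlighting.entrySet()) {
            List<String> fields = new ArrayList<String>();
            for (Map.Entry<String, List<String>> field : doc.getValue().entrySet()) {
                List<String> snippets = field.getValue();
                // Only count a field as a "reason" if it actually has a snippet.
                if (snippets != null && !snippets.isEmpty()) {
                    fields.add(field.getKey());
                }
            }
            result.put(doc.getKey(), fields);
        }
        return result;
    }
}
```

The per-document field lists are what I would then rank and rename before 
handing anything back to the end user.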

I am currently using Solr 3.1 trunk.

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
