Hi Jonathan:

On Mar 7, 2011, at 8:33 AM, Jonathan Rochkind wrote:

> I use about that many qf's in Solr 1.4.1.   It works. I'm not entirely sure 
> if it has performance implications -- I do have searching that is somewhat 
> slower than I'd like, but I'm not sure if the lengthy qf is a contributing 
> factor, or other things I'm doing (like a dozen different facet.fields too!). 
>   I haven't profiled everything.  But it doesn't grind my Solr to a halt or 
> anything, it works.

Thanks for the feedback on that. I'll learn more on how this performs in the 
coming months, but if the approach is doomed from the start, that would be good 
to know sooner rather than later, so I could consider doing something else (not 
sure what that would be). It is a pretty big customer requirement though, so 
perhaps it can be carried out regardless by using more EC2 instances? :)

> Separately, I've also been thinking of other ways to get highlighting 
> behavior similar to what you describe, giving the 'field' that the match 
> was in as part of the highlight response, but haven't come up with anything 
> great. If your approach works, that's cool.  I've been trying to think of a 
> way to store a single 
> stored field in a structured format (CSV? XML?), and somehow have the 
> highlighter return the complete 'field' that matches, not just the 
> surrounding X words. But haven't gotten anywhere on that, just an idle 
> thought.

That's an interesting idea. There are a number of other highlighting-related 
parameters I've not played with yet, relating to fragment size, snippets, 
max analyzed chars etc.  Could those get you what you need w/o having to 
create a separate structured field?
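
For what it's worth, here is a rough sketch of the kind of request I mean. The parameter names (hl, hl.fl, hl.fragsize, hl.snippets, hl.maxAnalyzedChars) are the standard Solr highlighting parameters; the class name, the field list passed in, and the values chosen are just assumptions for illustration. As I understand it, hl.fragsize=0 asks the highlighter to use the whole field value rather than a fragment, which might be close to what you're after.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the query-string portion of a Solr select request.
final class HighlightParams {

    static String build(String userQuery, String highlightFields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("hl", "true");
        params.put("hl.fl", highlightFields);       // comma-separated fields to highlight
        params.put("hl.fragsize", "0");             // 0 = no fragmenting, whole field value
        params.put("hl.snippets", "1");             // at most one snippet per field
        params.put("hl.maxAnalyzedChars", "100000"); // illustrative value, not a recommendation

        // Join as key=value pairs, URL-encoding each value.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("kinase", "n_synonym,p_description"));
    }
}
```

The same parameters could be set on a SolrJ SolrQuery instead of a hand-built query string; I've kept it to plain Java here so it runs without any Solr jars.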

In my case, most of the fields I'm searching are small in size, and I just 
need to know in what field(s) a match occurred. Often, the actual matched 
characters are less important than the fact that the provided terms matched in 
that field.  
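
To make that concrete, here is a minimal sketch of what my API layer could do with the per-document highlighting map (field name to snippets, which is the shape SolrJ's getHighlighting() hands back for each document ID). The priority order and the user-facing labels below are made up for illustration; only the field names come from my qf list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns the fields that matched into a short list of "reasons".
final class MatchReasons {

    // Assumed priority order (most important first) mapped to assumed display labels.
    static final Map<String, String> PRIORITY = new LinkedHashMap<>();
    static {
        PRIORITY.put("n_nameExact", "Name (exact)");
        PRIORITY.put("n_macromolecule_name", "Macromolecule name");
        PRIORITY.put("n_synonym", "Synonym");
        PRIORITY.put("p_description", "Description");
    }

    // Returns up to 'limit' labels for the fields that produced highlights,
    // in priority order. The snippet text itself is ignored here.
    static List<String> topReasons(Map<String, List<String>> docHighlights, int limit) {
        List<String> reasons = new ArrayList<>();
        for (Map.Entry<String, String> e : PRIORITY.entrySet()) {
            if (reasons.size() >= limit) {
                break;
            }
            if (docHighlights.containsKey(e.getKey())) {
                reasons.add(e.getValue());
            }
        }
        return reasons;
    }
}
```

Fields that matched but are not in the priority map are simply dropped, which is one way to keep the list short for the end user.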

Take it easy,


> Jonathan
> On 3/4/2011 10:09 AM, Jeff Schmidt wrote:
>> Hello:
>> I'm working on implementing a requirement where when a document is returned, 
>> we want to pithily tell the end user why. That is, say, with five documents 
>> returned, they may be so for similar or different reasons. These "reasons" 
>> are the field(s) in which matches occurred.  Some are more important than 
>> others, and I'll have to return just the most relevant one or two reasons to 
>> not overwhelm the user.
>> This is a separate goal from Solr's scoring of the returned documents. That 
>> is, index/query time boosting can indicate which fields are more significant 
>> in computing the overall document score, but then I need to know which fields 
>> were matched, and with which terms. I do have an application that stands 
>> between Solr and the end user (RESTful API), so I figured I can rank the 
>> "reasons" and return more domain-specific names rather than the Solr field 
>> names.
>> So, I've turned to highlighting, and in the results I can see for each 
>> document ID the fields matched, and the text in the field etc. Great. But, 
>> to get that to work, I have to specifically query individual fields. That 
>> is, the approach of <copyField>'ing a bunch of fields to a common text field 
>> for efficiency purposes is no longer an option. And, using the dismax 
>> request handler, I'm querying a lot of fields:
>>      <str name="qf">
>>         n_nameExact^4.0
>>         n_macromolecule_nameExact^3.0
>>         n_macromolecule_name^2.0
>>         n_macromolecule_id^1.8
>>         n_pathway_nameExact^1.5
>>         n_top_regulates
>>         n_top_regulated_by
>>         n_top_binds
>>         n_top_role_in_cell
>>         n_top_disease
>>         n_molecular_function
>>         n_protein_family
>>         n_subcell_location
>>         n_pathway_name
>>         n_cell_component
>>         n_bio_process
>>         n_synonym^0.5
>>         n_macromolecule_summary^0.6
>>         p_nameExact^4.0
>>         p_name^2.0
>>         p_description^0.6
>>      </str>
>> Is that crazy?  Is telling Solr to look at so many individual fields going 
>> to be a performance problem?  I'm only prototyping at this stage and it 
>> works great. :)  I've not run anything yet at scale handling lots of 
>> requests.
>> There are two document types in that shared index, demarcated using a field 
>> named type.  So, when configuring the SolrJ SolrQuery, I do set up 
>> addFilterQuery() to select one or the other type.
>> Anyway, using dismax with all of those query fields along with highlighting, 
>> I get the information I need to render meaningful results for the end user.  
>> But, it has a sort of smell to it. :)   Shall I look for another way, or am 
>> I worrying about nothing?
>> I am currently using Solr 3.1 trunk.
>> Thanks!
>> Jeff
>> --
>> Jeff Schmidt
>> 535 Consulting
>> j...@535consulting.com
>> http://www.535consulting.com

Jeff Schmidt
535 Consulting
