Hello:

Solr version: 3.4.0

I'm trying to figure out whether it's possible to both retrieve and facet on 
only certain values of a multivalued field.  The scenario is a life science 
app comprised of a graph of nodes (genes, chemicals etc.), where each node has 
a "neighborhood" consisting of one or more nodes with which it has 
relationships defined as "processes" ("inhibition", "phosphorylation" etc.).

What I've done is add a number of multivalued fields to each node consisting 
of the neighbor node ID (the neighbor's document ID), the process, and a couple 
of other related items.  A given node will have multiple neighbors, and may 
have multiple processes with a single neighbor.  For example, in schema.xml:

      <field name="id" type="string" indexed="true" stored="true" required="true" />

      <!-- Network neighborhood fields -->
      <field name="n_neighborof_id" type="string" indexed="true" stored="true" multiValued="true" />
      <field name="n_neighborof_name" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
      <field name="n_neighborof_process" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
      <field name="n_neighborof_processExact" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" />
      <field name="n_neighborof_edge_type" type="string" indexed="true" stored="true" multiValued="true" />
      <field name="n_neighborof_is_direct" type="boolean" indexed="true" stored="true" multiValued="true" />
      <field name="n_neighborof_count" type="sint" indexed="false" stored="true" multiValued="true" />

Note that the type text_lc_np simply lowercases and ignores punctuation.

So, when I want the neighbors of a given node, I define a filter query like 
fq=n_neighborof_id:someFocusNodeId and I get all of the neighbors. That's 
exactly what I want in terms of documents. A number of per-document fields are 
returned with the search results, including the actual process information 
defined above. Not surprisingly, I get all of the values for each field. But I 
do not want them all; I only want those that pertain to the specified focus 
node ID.
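
To make the request concrete, here is a minimal sketch of the parameters involved. Python is used purely for illustration, and the function name, the `fl` field list, and the sample ID `gene123` are hypothetical, not taken from the actual app. It uses Solr's field:value filter syntax and quotes the ID as a phrase.

```python
from urllib.parse import urlencode


def neighbor_query_params(focus_node_id):
    """Build Solr request parameters that select the neighbors of one node."""
    # Quote the ID as a phrase so characters such as ':' inside it are not
    # interpreted as query syntax. Backslashes and quotes are escaped first.
    escaped = focus_node_id.replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("q", "*:*"),
        ("fq", 'n_neighborof_id:"%s"' % escaped),
        # Hypothetical field list; a real request would name whatever
        # per-document fields the app needs.
        ("fl", "id,n_neighborof_id,n_neighborof_processExact"),
        ("wt", "json"),
    ]


# URL-encoded query string for a hypothetical focus node.
query_string = urlencode(neighbor_query_params("gene123"))
```

The filter selects whole documents, so every stored value of each multivalued field still comes back with each hit.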

For now, my workaround for the retrieval aspect of this is for my application 
to chuck the irrelevant values.  That is, for each set of related field values, 
if n_neighborof_id != focusNodeId, then out they go. While this gets the job 
done, it is quite wasteful in terms of processing by both Solr and my app, as 
well as bandwidth.
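
A minimal sketch of that client-side trimming, in Python for illustration only. It assumes the neighbor fields are positionally aligned, so the i-th value of every field describes the same edge; that alignment is an assumption about how the documents were indexed. The function name and the dict-shaped document are hypothetical.

```python
NEIGHBOR_FIELDS = [
    "n_neighborof_id",
    "n_neighborof_name",
    "n_neighborof_process",
    "n_neighborof_processExact",
    "n_neighborof_edge_type",
    "n_neighborof_is_direct",
    "n_neighborof_count",
]


def trim_to_focus(doc, focus_node_id):
    """Return a copy of doc keeping only values tied to focus_node_id."""
    ids = doc.get("n_neighborof_id", [])
    # Positions whose neighbor ID is the focus node.
    keep = [i for i, nid in enumerate(ids) if nid == focus_node_id]
    trimmed = dict(doc)
    for field in NEIGHBOR_FIELDS:
        values = doc.get(field)
        if values is None:
            continue  # field not returned for this document
        if len(values) != len(ids):
            # The alignment assumption does not hold; refuse to guess.
            raise ValueError("field %s is not aligned with IDs" % field)
        trimmed[field] = [values[i] for i in keep]
    return trimmed
```

All values still cross the wire before being discarded, which is the waste described above.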

Now I need to facet on a couple of the neighbor fields. Solr returns counts 
relevant to all processes defined within the document result set. Again, that 
is expected, but not what I want.  I'd like Solr to compute facet counts only 
for processes relevant to the specified focus node, much like my filter query 
to get the document results.
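
To illustrate the counts I'm after, here is a sketch that computes them on the client instead of in Solr. It is not a Solr feature, only a statement of the desired result. It assumes n_neighborof_id and n_neighborof_processExact are positionally aligned, and it counts documents per process (as Solr facets do) rather than value occurrences. The function name is hypothetical.

```python
from collections import Counter


def focus_process_counts(docs, focus_node_id):
    """Count documents per process, using only edges to focus_node_id."""
    counts = Counter()
    for doc in docs:
        ids = doc.get("n_neighborof_id", [])
        processes = doc.get("n_neighborof_processExact", [])
        # A set ensures each document contributes at most once per process,
        # even if it has several matching edges with the same process.
        relevant = {
            process
            for nid, process in zip(ids, processes)
            if nid == focus_node_id
        }
        counts.update(relevant)
    return counts
```

This only sees the documents actually fetched, so it would require retrieving every matching row, which is exactly the overhead I'd like Solr to avoid.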

Is this possible?  I've looked at grouping queries, though those are 
document-centric and do not work for multivalued fields. I've also looked into 
implementing my own SearchComponent within the Solr server.  It sounded ideal 
to drop something I have control over right between the standard query and 
facet components. I figured I could eliminate the undesired field values at 
that point, both solving my first problem of having to toss irrelevant 
processes in my app, and having Solr compute facet values using only the 
desired processes.  But there are comments in the Solr source code that 
stipulate a component must not modify the document set.  For example, in 
org.apache.solr.search.DocSet:

/**
 * <code>DocSet</code> represents an unordered set of Lucene Document Ids.
 *
 * <p>
 * WARNING: Any DocSet returned from SolrIndexSearcher should <b>not</b> be
 * modified as it may have been retrieved from a cache and could be shared.
 * </p>
 *
 * @version $Id: DocSet.java 1065312 2011-01-30 16:08:25Z rmuir $
 * @since solr 0.9
 */

Perhaps I cannot use this avenue to accomplish my goals?  Then again, I don't 
need to modify the document set itself (IDs etc.), just trim the field values 
per document. Does that make sense?

I may well have to re-evaluate my data model, but I'd like to get what I need 
with what I have currently defined if possible.

Thanks,

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068








