Hello,

Solr version: 3.4.0
I'm trying to figure out whether it's possible to both return (retrieve) and facet on only certain values of a multi-valued field. The scenario is a life science app composed of a graph of nodes (genes, chemicals, etc.). Each node has a "neighborhood" consisting of one or more nodes with which it has relationships defined as "processes" ("inhibition", "phosphorylation", etc.).

What I've done is add a number of multi-valued fields to each node holding the neighbor node ID (the neighbor's document ID), the process, and a couple of other related items. A given node will have multiple neighbors, as well as multiple processes with a single neighbor. For example, in schema.xml:

  <field name="id" type="string" indexed="true" stored="true" required="true" />

  <!-- Network neighborhood fields -->
  <field name="n_neighborof_id" type="string" indexed="true" stored="true" multiValued="true" />
  <field name="n_neighborof_name" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <field name="n_neighborof_process" type="text_lc_np" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <field name="n_neighborof_processExact" type="string" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <field name="n_neighborof_edge_type" type="string" indexed="true" stored="true" multiValued="true" />
  <field name="n_neighborof_is_direct" type="boolean" indexed="true" stored="true" multiValued="true" />
  <field name="n_neighborof_count" type="sint" indexed="false" stored="true" multiValued="true" />

Note that the type text_lc_np simply lowercases and ignores punctuation.

So, when I want the neighbors of a given node, I define a filter query like fq=n_neighborof_id:someFocusNodeId and I get all of the neighbors. That's exactly what I want in terms of documents. A number of per-document fields are returned with the search results, including the actual process information defined above.
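To make the layout concrete, here is a minimal Java sketch of how one node's neighborhood might be flattened into the multi-valued fields above. It assumes the indexer appends exactly one value per relationship to each n_neighborof_* field, in the same order, so the i-th value of every field describes the same neighbor/process pair; nothing in Solr ties the i-th value of one field to the i-th value of another, so this alignment exists only by convention. The Relationship class and flatten() method are hypothetical, the document is modeled as a plain Map<String, List<Object>> rather than a SolrJ document, and only a subset of the fields is shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NeighborhoodFlattener {

    // Hypothetical in-memory form of one relationship between this node and a neighbor.
    static final class Relationship {
        final String neighborId;
        final String neighborName;
        final String process;
        final boolean isDirect;

        Relationship(String neighborId, String neighborName, String process, boolean isDirect) {
            this.neighborId = neighborId;
            this.neighborName = neighborName;
            this.process = process;
            this.isDirect = isDirect;
        }
    }

    // Multi-valued fields kept index-aligned (a subset of the schema above).
    static final String[] ALIGNED_FIELDS = {
        "n_neighborof_id", "n_neighborof_name", "n_neighborof_process",
        "n_neighborof_processExact", "n_neighborof_is_direct"
    };

    // Builds one document: a single "id" plus one value per relationship appended,
    // in the same order, to every aligned field.
    static Map<String, List<Object>> flatten(String nodeId, List<Relationship> rels) {
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        List<Object> id = new ArrayList<>();
        id.add(nodeId);
        doc.put("id", id);
        for (String field : ALIGNED_FIELDS) {
            doc.put(field, new ArrayList<>());
        }
        for (Relationship r : rels) {
            doc.get("n_neighborof_id").add(r.neighborId);
            doc.get("n_neighborof_name").add(r.neighborName);
            // Same process text in both: the text_lc_np field is tokenized,
            // the string field keeps the exact value (e.g. for faceting).
            doc.get("n_neighborof_process").add(r.process);
            doc.get("n_neighborof_processExact").add(r.process);
            doc.get("n_neighborof_is_direct").add(r.isDirect);
        }
        return doc;
    }
}
```

With two processes for the same neighbor, n_neighborof_id ends up holding that neighbor's ID twice, once per process; that repetition is what keeps the fields aligned.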
Not surprisingly, I get all of the values for each field. But I don't want them all; I only want those that pertain to the specified focus node ID. For now, my workaround for the retrieval side is for my application to discard the irrelevant values. That is, for each set of related field values, if n_neighborof_id != focusNodeId, then out they go. While this gets the job done, it is quite wasteful, both in processing (by Solr and by my app) and in bandwidth.

Now I need to facet on a couple of the neighbor fields. Solr returns counts for all processes defined within the document result set. Again, that is expected, but not what I want. I'd like Solr to compute facet counts only for the processes relevant to the specified focus node, much like my filter query does for the document results. Is this possible?

I've looked at grouping queries, though those are document-centric and do not work on multi-valued fields.

I've also looked into implementing my own SearchComponent within the Solr server. It sounded ideal to drop something I control right between the standard query and facet components. I figured I could eliminate the undesired field values at that point, both solving my first problem (having to toss irrelevant processes in my app) and having Solr compute facet counts using only the desired processes. But there are comments in the Solr source code stipulating that a component must not modify the document set. For example, in org.apache.solr.search.DocSet:

  /**
   * <code>DocSet</code> represents an unordered set of Lucene Document Ids.
   *
   * <p>
   * WARNING: Any DocSet returned from SolrIndexSearcher should <b>not</b> be modified as it may have been retrieved from
   * a cache and could be shared.
   * </p>
   *
   * @version $Id: DocSet.java 1065312 2011-01-30 16:08:25Z rmuir $
   * @since solr 0.9
   */

Perhaps I cannot use this avenue to accomplish my goals? But I don't need to modify the document set itself (IDs etc.), just trim the field values per document.
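For concreteness, here is a minimal Java sketch of the client-side workaround described above, plus the focus-node facet counts I'm after. It again assumes the n_neighborof_* fields are index-aligned, and represents each returned document as a plain Map<String, List<Object>> rather than a SolrJ SolrDocument; the class and method names are hypothetical. processCounts() counts each document at most once per process value, the way a field facet counts documents, but it only sees the documents the client actually receives, so it is not a substitute for Solr computing these counts over the whole result set unless rows covers every match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class FocusNodeTrimmer {

    // Hypothetical list of the index-aligned neighborhood fields to trim.
    static final String[] ALIGNED_FIELDS = {
        "n_neighborof_id", "n_neighborof_name", "n_neighborof_process",
        "n_neighborof_processExact", "n_neighborof_edge_type",
        "n_neighborof_is_direct", "n_neighborof_count"
    };

    // Returns a copy of doc where each aligned field keeps only the values at
    // positions whose n_neighborof_id equals focusId; other fields are copied unchanged.
    static Map<String, List<Object>> trimToFocus(Map<String, List<Object>> doc, String focusId) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : doc.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        List<Object> ids = doc.get("n_neighborof_id");
        if (ids == null) {
            return out;
        }
        for (String field : ALIGNED_FIELDS) {
            List<Object> values = doc.get(field);
            if (values == null) {
                continue;
            }
            List<Object> kept = new ArrayList<>();
            for (int i = 0; i < ids.size() && i < values.size(); i++) {
                if (focusId.equals(ids.get(i))) {
                    kept.add(values.get(i));
                }
            }
            out.put(field, kept);
        }
        return out;
    }

    // For each n_neighborof_processExact value tied to focusId, counts how many of
    // the given documents contain it (each document counts once per value).
    static Map<String, Integer> processCounts(List<Map<String, List<Object>>> docs, String focusId) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<Object>> doc : docs) {
            List<Object> processes = trimToFocus(doc, focusId).get("n_neighborof_processExact");
            if (processes == null) {
                continue;
            }
            Set<String> distinct = new LinkedHashSet<>();
            for (Object p : processes) {
                distinct.add(String.valueOf(p));
            }
            for (String p : distinct) {
                counts.merge(p, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```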
Does that make sense? I may well have to re-evaluate my data model, but I'd like to get what I need with what I have currently defined, if possible.

Thanks,

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068