Hello again:

I'm looking at the newer join functionality (http://wiki.apache.org/solr/Join) to see if that will help me out. While there are signs it can go cross index/core (https://issues.apache.org/jira/browse/SOLR-2272), I doubt I can specify facet.field params for fields in a couple of different indexes. But perhaps with a single combined index it might work.
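As a rough illustration of what such a request might look like, the sketch below only assembles and URL-encodes a join query with facet.field parameters. It does not contact Solr, and it does not settle whether faceting can reach fields in the other core, which is the open question. The base URL, the core name "nodes", the "/select" handler path, and the facet field "productCategory" are all hypothetical; the join parameters, the fq and the n_type field are taken from the request shown later in this message.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumptions for illustration only: base URL, core name and handler path.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "nodes"  # hypothetical core holding the 'node' documents


def build_join_facet_url(term, from_field, to_field, from_index=None, facet_fields=()):
    """Build a select URL whose q parameter uses the {!join} local-params syntax."""
    local_params = f"from={from_field} to={to_field}"
    if from_index:
        local_params += f" fromIndex={from_index}"
    params = [
        ("q", "{!join " + local_params + "}" + term),
        ("fq", "type:node"),
        ("rows", "5"),
    ]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field, as Solr expects.
        params.extend(("facet.field", name) for name in facet_fields)
    # urlencode percent-encodes the braces and turns spaces into '+'.
    return f"{SOLR_BASE}/{CORE}/select?" + urlencode(params)


url = build_join_facet_url(
    "brca1", "conceptId", "id",
    from_index="partner-tmo",
    facet_fields=["n_type", "productCategory"],  # productCategory is hypothetical
)
print(url)
# Decode again to confirm the local params survive encoding intact.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Encoding the whole q value in one step keeps the braces and spaces of the local params from being mangled by hand-built URLs, which is one thing worth ruling out when the join prefix appears to be treated as search terms.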
Anyway, the above Jira item indicates status: resolved, resolution: fixed, and Fix version/s: 4.0. I've been working with 3.5.0, so I checked out 4.0 from svn today:

    [imac:svn/dev/trunk] jas% svn info
    Path: .
    URL: http://svn.apache.org/repos/asf/lucene/dev/trunk
    Repository Root: http://svn.apache.org/repos/asf
    Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
    Revision: 1210126
    ...
    Last Changed Rev: 1210116
    Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011)

When I issue a join query, it looks like the local params syntax is being ignored and treated as part of the search terms. I get zero results, whereas without the join I get 979.

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="fl">id,n_type,n_name</str>
          <str name="q">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
          <str name="qt">partner-tmo</str>
          <str name="fq">type:node</str>
          <str name="rows">5</str>
        </lst>
      </lst>
      <result name="response" numFound="0" start="0"/>
    </response>

I've not fully explored this yet, and I'm not all that familiar with the Solr codebase, but is this functionality in 4.x trunk or not? I can see there is the package org.apache.lucene.search.join. Is this the implementation of SOLR-2272? I can see the commit was made earlier this year, and then it was reverted and things went off the rails. I don't want to open any old wounds, but does the join exist? If not, I'll know not to pursue it any further. If so, is there some solrconfig.xml configuration needed to enable it? I don't see it in the examples.

Thanks,

Jeff

On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote:

> Hello:
>
> I'm trying to relate together two different types of documents. Currently I
> have 'node' documents that reside in one index (core), and 'product mapping'
> documents that are in another index. The product mapping index is used to
> map tenant products to nodes.
> The nodes are canonical content that gets
> updated every quarter, whereas the product mappings can change at any time.
>
> I put them in two indexes because (1) canonical content changes rarely, and I
> don't want product mapping changes to affect it (commit, re-open searchers
> etc.), and (2) I would like to support multiple tenants mapping products to
> the same canonical content to avoid duplication (a few GB).
>
> This arrangement has worked well thus far, but only in the sense that for each
> node result returned, I can query the product mapping index to determine the
> products mapped to the node. I combine this information within my
> application and return it to the client. This works okay in that there are
> only 5-20 results returned per page (start, rows). But now I'm being asked
> to facet the product categories (multi-valued field within a product mapping
> document) along with other facets defined in the canonical content.
>
> Can this be done with Solr 3.5.0? I've been looking into sub-queries,
> function queries etc. Also, I've seen various postings indicating that one
> needs to denormalize more. I don't want to add product information as fields
> to the canonical content. Not only does that defeat my objective (1) above,
> but Solr does not support incremental updates of document fields.
>
> So, one approach is to issue my query to the canonical index and get all of
> the document IDs (could be 1000s), and then issue a filter query to the
> product mapping index with all of these IDs and have Solr facet the product
> categories. Is that efficient? I suppose I could use HTTP POST (via SolrJ)
> to convey that payload of IDs? I could then take the facet results of that
> query and combine them with the canonical index results and return them to
> the client.
>
> That may be do-able, but then let's say the user clicks on a product category
> facet value to narrow the node results to only those mapped to category XYZ.
> This will not affect the query issued against the canonical content index.
> Instead, I think I'd have to go through the canonical results and eliminate
> the nodes that are not associated with product category XYZ. Then, if the
> current page of results is inadequate (rows=10, but 3 nodes were eliminated),
> I'd have to go back to the canonical index to get more rows, eliminate
> some again perhaps, get more etc. That sounds unappealing and low performing.
>
> Is there a Solr way to do this? My Packt "Apache Solr 3 Enterprise Search
> Server" book (page 34) states regarding separate indices:
>
>     "If you do develop separate schemas and if you need to search across
>     your indices in one search then you must perform a distributed search,
>     described in the last chapter. A distributed search is usually a feature
>     employed for a large corpus but it applies here too."
>
> But in the chapter it goes on to talk about dealing with sharding,
> replication etc. to support a large corpus, not necessarily tying together
> two different indexes.
>
> Is it possible to accomplish my goal in a less ugly way than I outlined
> above? Since we only have a single tenant to worry about, I could use a
> combined index at least for a few months (separate fields per document type,
> IDs are unique among them all) if that makes a difference.
>
> Thanks!
>
> Jeff
> --
> Jeff Schmidt
> 535 Consulting
> j...@535consulting.com
> http://www.535consulting.com
> (650) 423-1068

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
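
The two-step approach from the quoted message (collect the node IDs from the canonical index, then have Solr facet the product mapping index filtered by those IDs) could be sketched roughly as below. This only builds the form-encoded POST bodies and merges facet counts; it sends nothing to Solr. The facet field name "productCategory" is hypothetical, "conceptId" is assumed to be the product mapping field that holds the node ID (as in the join parameters above), and the batch size is an assumption chosen to stay under Solr's default maxBooleanClauses of 1024. Summing counts across batches is only right if each product mapping document carries a single conceptId, so that it can match in one batch only.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: keep each filter query below Solr's default maxBooleanClauses (1024).
MAX_IDS_PER_QUERY = 1000


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_facet_bodies(node_ids, id_field="conceptId", facet_field="productCategory"):
    """Return one form-encoded POST body per batch of node IDs."""
    bodies = []
    for batch in chunks(list(node_ids), MAX_IDS_PER_QUERY):
        fq = f"{id_field}:(" + " OR ".join(quote_term(i) for i in batch) + ")"
        bodies.append(urlencode([
            ("q", "*:*"),
            ("fq", fq),
            ("rows", "0"),            # only the facet counts are needed
            ("facet", "true"),
            ("facet.field", facet_field),
            ("facet.mincount", "1"),
        ]))
    return bodies


def merge_facet_counts(per_batch_counts):
    """Sum facet counts (value -> count dicts) returned for each batch."""
    totals = {}
    for counts in per_batch_counts:
        for value, count in counts.items():
            totals[value] = totals.get(value, 0) + count
    return totals


if __name__ == "__main__":
    for body in build_facet_bodies(["node-1", "node-2", "node-3"]):
        print(parse_qs(body)["fq"][0])
```

The merged counts are numbers of product mapping documents per category, not numbers of nodes, so they would still need to be reconciled with the node results on the application side.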