Re: Possible to facet across two indices, or document types in single index?

Jeff Schmidt Sun, 04 Dec 2011 15:13:29 -0800

Hello again:

I'm looking at the newer join functionality (http://wiki.apache.org/solr/Join) 
to see if that will help me out.  While there are signs it can go cross 
index/core (https://issues.apache.org/jira/browse/SOLR-2272), I doubt I can 
specify facet.field params for fields in a couple of different indexes.  But 
perhaps with a single combined index it might work.


Anyway, the above Jira item indicates status: resolved, resolution: fixed, and 
Fix version/s: 4.0.  I've been working with 3.5.0, so I checked out 4.0 from 
svn today:

[imac:svn/dev/trunk] jas% svn info
Path: .
URL: http://svn.apache.org/repos/asf/lucene/dev/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1210126
...
Last Changed Rev: 1210116
Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011)

When I issue a join query, it looks like the local params syntax is being 
ignored and treated as part of the search terms.  I get zero results, whereas 
without the join I get 979.

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="fl">id,n_type,n_name</str>
            <str name="q">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
            <str name="qt">partner-tmo</str>
            <str name="fq">type:node</str>
            <str name="rows">5</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
</response>
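
For reference, the parameters echoed in the response above could be assembled 
as in the minimal Python sketch below. The helper name build_join_params is 
hypothetical. The sketch only shows the local params prefix being placed at 
the very start of q and URL-encoded; it says nothing about whether the join 
parser is actually available in this trunk revision.

```python
from urllib.parse import urlencode, parse_qs


def build_join_params(from_field, to_field, from_index, terms,
                      fq=None, fl=None, rows=10):
    """Return a URL-encoded query string for a Solr join request.

    build_join_params is a hypothetical helper; the field and index
    names are whatever the caller passes in.
    """
    # The {!join ...} local params open the q value, and the search
    # terms follow the closing brace.
    q = "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, terms)
    params = [("q", q), ("rows", str(rows))]
    if fq:
        params.append(("fq", fq))
    if fl:
        params.append(("fl", fl))
    # urlencode escapes the braces, '!' and spaces for use in a URL.
    return urlencode(params)


query_string = build_join_params(
    "conceptId", "id", "partner-tmo", "brca1",
    fq="type:node", fl="id,n_type,n_name", rows=5)

# Decode again to check that q survives the round trip unchanged.
decoded = parse_qs(query_string)
print(query_string)
print(decoded["q"][0])
```

Comparing the decoded q with what Solr echoes back in responseHeader is one 
way to rule out an encoding problem on the client side.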

I've not fully explored this yet, and I'm not all that familiar with the 
Solr codebase, but is this functionality in 4.x trunk or not? I can see there 
is the package org.apache.lucene.search.join. Is this the implementation of 
SOLR-2272?

I can see the commit was made earlier this year, and then it was reverted and 
things went off the rails. I don't want to open any old wounds, but does the 
join exist?  If not, I'll know not to pursue it any further. If so, is there 
some solrconfig.xml configuration needed to enable it?  I don't see it in the 
examples.

Thanks,

Jeff

On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote:

> Hello:
> 
> I'm trying to relate together two different types of documents.  Currently I 
> have 'node' documents that reside in one index (core), and 'product mapping' 
> documents that are in another index.  The product mapping index is used to 
> map tenant products to nodes. The nodes are canonical content that gets 
> updated every quarter, whereas the product mappings can change at any time.
> 
> I put them in two indexes because (1) canonical content changes rarely, and I 
> don't want product mapping changes to affect it (commit, re-open searchers 
> etc.), and (2) I would like to support multiple tenants mapping products to 
> the same canonical content to avoid duplication (a few GB).
> 
> This arrangement has worked well thus far, but only in the sense that for each 
> node result returned, I can query the product mapping index to determine the 
> products mapped to the node.  I combine this information within my 
> application and return it to the client.  This works okay in that there are 
> only 5-20 results returned per page (start, rows).  But now I'm being asked 
> to facet the product categories (multi-valued field within a product mapping 
> document) along with other facets defined in the canonical content.
> 
> Can this be done with Solr 3.5.0?  I've been looking into sub-queries, 
> function queries etc.  Also, I've seen various postings indicating that one 
> needs to denormalize more.  I don't want to add product information as fields 
> to the canonical content. Not only does that defeat my objective (1) above, 
> but Solr does not support incremental updates of document fields.
> 
> So, one approach is to issue my query to the canonical index and get all of 
> the document IDs (could be 1000s), and then issue a filter query to the 
> product mapping index with all of these IDs and have Solr facet the product 
> categories.  Is that efficient?  I suppose I could use HTTP POST (via SolrJ) 
> to convey that payload of IDs?  I could then take the facet results of that 
> query and combine them with the canonical index results and return them to 
> the client.
> 
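
The two-step approach described above could be sketched roughly as follows. 
This is a minimal illustration, not a tested implementation: the helper name 
build_facet_body and the field names conceptId and productCategory are 
assumptions, and the sketch only builds the form-encoded POST body rather 
than sending it. A very long OR list may also run into the maxBooleanClauses 
limit in solrconfig.xml, so the IDs might need to be sent in batches.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_body(node_ids, id_field="conceptId",
                     facet_field="productCategory"):
    """Build a form-encoded POST body that facets product categories
    for the given node IDs. Field names are hypothetical defaults."""
    # Quote each ID so characters such as ':' are not parsed as syntax.
    quoted = ['"%s"' % str(i).replace("\\", "\\\\").replace('"', '\\"')
              for i in node_ids]
    fq = "%s:(%s)" % (id_field, " OR ".join(quoted))
    params = [
        ("q", "*:*"),            # match everything, then filter by ID
        ("fq", fq),
        ("rows", "0"),           # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),
    ]
    return urlencode(params).encode("utf-8")


body = build_facet_body(["n1", "n2", "n3"])
print(parse_qs(body.decode("utf-8"))["fq"][0])
```

The resulting bytes can be sent as an application/x-www-form-urlencoded POST, 
which avoids the URL length limits a GET with thousands of IDs would hit.
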
> That may be do-able, but then let's say the user clicks on a product category 
> facet value to narrow the node results to only those mapped to category XYZ. 
> This will not affect the query issued against the canonical content index.  
> Instead, I think I'd have to go through the canonical results and eliminate 
> the nodes that are not associated with product category XYZ.  Then, if the 
> current page of results is inadequate (rows=10, but 3 nodes were eliminated), 
> I'd have to go back to the canonical index to get more rows, eliminate some 
> again perhaps, get more etc.  That sounds unappealing and low performing.
> 
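
To make the cost of that client-side filtering concrete, the fetch, eliminate 
and re-fetch loop might look like the sketch below. The names fill_page, 
fetch_nodes and is_mapped are hypothetical stand-ins for the canonical index 
query and the product mapping lookup; each pass through the while loop would 
be another round trip to Solr.

```python
def fill_page(fetch_nodes, is_mapped, rows=10, batch=10, start=0):
    """Collect `rows` nodes that pass `is_mapped`, fetching more
    batches from the canonical index as nodes are eliminated.

    fetch_nodes(offset, count) returns a list of node IDs;
    is_mapped(node_id) returns True if the node is in the category.
    Returns the page and the offset to resume from for the next page.
    """
    page = []
    offset = start
    while len(page) < rows:
        chunk = fetch_nodes(offset, batch)
        if not chunk:
            break  # canonical results exhausted
        for node_id in chunk:
            offset += 1  # count every node examined, kept or not
            if is_mapped(node_id):
                page.append(node_id)
                if len(page) == rows:
                    break
    return page, offset


# Stand-in data: 100 nodes, every third one mapped to the category.
nodes = list(range(100))
page, next_offset = fill_page(
    lambda off, n: nodes[off:off + n],
    lambda node_id: node_id % 3 == 0,
    rows=5, batch=4)
print(page, next_offset)
```

With sparse mappings the number of round trips per page grows quickly, which 
is the performance concern raised above.
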
> Is there a Solr way to do this?  My Packt "Apache Solr 3 Enterprise Search 
> Server" book (page 34) states regarding separate indices:
> 
>       "If you do develop separate schemas and if you need to search across 
> your indices in one search then you must perform a distributed search, 
> described in the last chapter. A distributed search is usually a feature 
> employed for a large corpus but it applies here too."
> 
> But in the chapter it goes on to talk about dealing with sharding, 
> replication etc. to support a large corpus, not necessarily tying together 
> two different indexes.
> 
> Is it possible to accomplish my goal in a less ugly way than I outlined 
> above?  Since we only have a single tenant to worry about, I could use a 
> combined index at least for a few months (separate fields per document type, 
> IDs are unique among them all) if that makes a difference.
> 
> Thanks!
> 
> Jeff
> --
> Jeff Schmidt
> 535 Consulting
> j...@535consulting.com
> http://www.535consulting.com
> (650) 423-1068
> 

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
