Re: Possible to facet across two indices, or document types in single index?

Jeff Schmidt Sun, 04 Dec 2011 21:55:31 -0800

Well, the JoinQParserPlugin is definitely there.  Turning on debug reveals why 
I get zero results.  Given the URL:


http://localhost:8091/solr/ing-content/select/?qt=partner-tmo&fq=type:node&q={!join+from=conceptId+to=id+fromIndex=partner-tmo}brca1&debugQuery=true&rows=5&fl=id,n_type,n_name

I get:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="debugQuery">true</str>
            <str name="fl">id,n_type,n_name</str>
            <str name="q">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
            <str name="qt">partner-tmo</str>
            <str name="fq">type:node</str>
            <str name="rows">5</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
    <lst name="debug">
        <str name="rawquerystring">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
        <str name="querystring">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
        <str name="parsedquery">JoinQuery({!join from=conceptId to=id fromIndex=partner-tmo}n_text:brca)</str>
        <str name="parsedquery_toString">{!join from=conceptId to=id fromIndex=partner-tmo}n_text:brca</str>
        <lst name="explain"/>
        <str name="QParser"/>
        <arr name="filter_queries">
            <str>type:node</str>
        </arr>
        <arr name="parsed_filter_queries">
            <str>type:node</str>
        </arr>
        ...
    </lst>
</response>
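
For reading the debug section of a response like this without eyeballing the XML, a minimal sketch follows. It uses only the Python standard library; the sample string is an abbreviated copy of the response above, and the helper names (debug_entries, num_found) are made up for illustration, not part of Solr or SolrJ.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the debugQuery=true response shown above.
SAMPLE_RESPONSE = """<response>
    <result name="response" numFound="0" start="0"/>
    <lst name="debug">
        <str name="rawquerystring">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
        <str name="parsedquery">JoinQuery({!join from=conceptId to=id fromIndex=partner-tmo}n_text:brca)</str>
    </lst>
</response>"""


def debug_entries(xml_text):
    """Return the <str> children of <lst name="debug"> as a name -> text dict."""
    root = ET.fromstring(xml_text)
    debug = root.find("./lst[@name='debug']")
    if debug is None:
        # debugQuery was not enabled, so there is nothing to report.
        return {}
    return {el.get("name"): (el.text or "") for el in debug.findall("str")}


def num_found(xml_text):
    """Return the numFound attribute of the main result element as an int."""
    root = ET.fromstring(xml_text)
    result = root.find("./result[@name='response']")
    return int(result.get("numFound"))


if __name__ == "__main__":
    entries = debug_entries(SAMPLE_RESPONSE)
    print("numFound:", num_found(SAMPLE_RESPONSE))
    print("parsedquery:", entries["parsedquery"])
```

Comparing rawquerystring with parsedquery this way makes it easy to spot which field the user text ended up being parsed against.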

It looks like, despite qt=partner-tmo, the edismax-based search handler is being 
bypassed in favor of the default search handler, and the query runs against the 
n_text field, which is the defaultSearchField for the ing-content core.  I don't 
want to use the default handler.  I want my configured edismax handler, plus 
any specified filter queries, to determine the document set in the ing-content 
core, and then join that set with the partner-tmo core.  [Yes, the edismax 
handler in the ing-content core and the second core are both named partner-tmo.]

Can the JoinQParserPlugin work in conjunction with edismax?
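
One arrangement that might be worth trying is sketched below, unverified against trunk. It leaves q as the bare user text so the handler's configured parser still applies, and moves the join into a filter query. This rests on two assumptions: that a leading {!join ...} in q selects the join parser for the whole q string, and that the query after the local params is evaluated against the fromIndex core. The inner query `*:*` is only a placeholder, and the helper names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Select URL of the ing-content core, copied from the request above.
SELECT_URL = "http://localhost:8091/solr/ing-content/select/"


def join_filter(from_field, to_field, from_index, inner_query="*:*"):
    """Build a join filter string; inner_query is a placeholder assumption."""
    return ("{!join from=" + from_field + " to=" + to_field +
            " fromIndex=" + from_index + "}" + inner_query)


def build_select_url(user_query, extra_filters=()):
    """Build a select URL with the user text in q and the join in fq."""
    params = [
        ("qt", "partner-tmo"),   # the edismax handler named in the thread
        ("q", user_query),       # bare user text, no local params prefix
        ("fq", "type:node"),
        ("fq", join_filter("conceptId", "id", "partner-tmo")),
    ]
    params.extend(("fq", fq) for fq in extra_filters)
    params.extend([
        ("debugQuery", "true"),
        ("rows", "5"),
        ("fl", "id,n_type,n_name"),
    ])
    # urlencode percent-encodes the braces and spaces in the join filter.
    return SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("brca1"))
```

With debugQuery=true still set, the parsedquery entry in the response should show whether q is now handled by the edismax handler rather than parsed against n_text.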

Thanks,

Jeff

On Dec 4, 2011, at 4:12 PM, Jeff Schmidt wrote:

> Hello again:
> 
> I'm looking at the newer join functionality 
> (http://wiki.apache.org/solr/Join) to see if that will help me out.  While 
> there are signs it can go cross index/core 
> (https://issues.apache.org/jira/browse/SOLR-2272), I doubt I can specify 
> facet.field params for fields in a couple of different indexes.  But perhaps 
> with a single combined index it might work.
> 
> Anyway, the above Jira item indicates status: resolved, resolution: fixed, 
> and Fix version/s: 4.0.  I've been working with 3.5.0, so I checked out 4.0 
> from svn today:
> 
> [imac:svn/dev/trunk] jas% svn info
> Path: .
> URL: http://svn.apache.org/repos/asf/lucene/dev/trunk
> Repository Root: http://svn.apache.org/repos/asf
> Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
> Revision: 1210126
> ...
> Last Changed Rev: 1210116
> Last Changed Date: 2011-12-04 07:35:46 -0700 (Sun, 04 Dec 2011)
> 
> When I issue a join query, it looks like the local params syntax is being 
> ignored and treated as part of the search terms.  I get zero results, whereas 
> without the join I get 979.
> 
> <response>
>    <lst name="responseHeader">
>        <int name="status">0</int>
>        <int name="QTime">1</int>
>        <lst name="params">
>            <str name="fl">id,n_type,n_name</str>
>            <str name="q">{!join from=conceptId to=id fromIndex=partner-tmo}brca1</str>
>            <str name="qt">partner-tmo</str>
>            <str name="fq">type:node</str>
>            <str name="rows">5</str>
>        </lst>
>    </lst>
>    <result name="response" numFound="0" start="0"/>
> </response>
> 
> I've not fully explored this yet, and I'm not all that familiar with the 
> Solr codebase, but is this functionality in 4.x trunk or not? I can see there 
> is the package org.apache.lucene.search.join. Is this the implementation of 
> SOLR-2272?
> 
> I can see the commit was made earlier this year, and then it was reverted and 
> things went off the rails. I don't want to open any old wounds, but does the 
> join exist?  If not, I'll know not to pursue it any further. If so, is there 
> some solrconfig.xml configuration needed to enable it?  I don't see it in the 
> examples.
> 
> Thanks,
> 
> Jeff
> 
> On Dec 1, 2011, at 9:47 PM, Jeff Schmidt wrote:
> 
>> Hello:
>> 
>> I'm trying to relate together two different types of documents.  Currently I 
>> have 'node' documents that reside in one index (core), and 'product mapping' 
>> documents that are in another index.  The product mapping index is used to 
>> map tenant products to nodes. The nodes are canonical content that gets 
>> updated every quarter, whereas the product mappings can change at any time.
>> 
>> I put them in two indexes because (1) canonical content changes rarely, and 
>> I don't want product mapping changes to affect it (commit, re-open searchers 
>> etc.), and (2) I would like to support multiple tenants mapping products to 
>> the same canonical content to avoid duplication (a few GB).
>> 
>> This arrangement has worked well thus far, but only in the sense that for each 
>> node result returned, I can query the product mapping index to determine the 
>> products mapped to the node.  I combine this information within my 
>> application and return it to the client.  This works okay in that there are 
>> only 5-20 results returned per page (start, rows).  But now I'm being asked 
>> to facet the product categories (multi-valued field within a product mapping 
>> document) along with other facets defined in the canonical content.
>> 
>> Can this be done with Solr 3.5.0?  I've been looking into sub-queries, 
>> function queries etc.  Also, I've seen various postings indicating that one 
>> needs to denormalize more.  I don't want to add product information as 
>> fields to the canonical content. Not only does that defeat my objective (1) 
>> above, but Solr does not support incremental updates of document fields.
>> 
>> So, one approach is to issue my query to the canonical index and get all of 
>> the document IDs (could be 1000s), and then issue a filter query to the 
>> product mapping index with all of these IDs and have Solr facet the product 
>> categories.  Is that efficient?  I suppose I could use HTTP POST (via SolrJ) 
>> to convey that payload of IDs?  I could then take the facet results of that 
>> query and combine them with the canonical index results and return them to 
>> the client.
>> 
>> That may be do-able, but then let's say the user clicks on a product 
>> category facet value to narrow the node results to only those mapped to 
>> category XYZ. This will not affect the query issued against the canonical 
>> content index.  Instead, I think I'd have to go through the canonical 
>> results and eliminate the nodes that are not associated with product 
>> category XYZ.  Then, if the current page of results is inadequate (rows=10, 
>> but 3 nodes were eliminated), I'd have to go back to the canonical index to 
>> get more rows, eliminate some again perhaps, get more etc.  That sounds 
>> unappealing and low performing.
>> 
>> Is there a Solr way to do this?  My Packt "Apache Solr 3 Enterprise Search 
>> Server" book (page 34) states regarding separate indices:
>> 
>>      "If you do develop separate schemas and if you need to search across 
>> your indices in one search then you must perform a distributed search, 
>> described in the last chapter. A distributed search is usually a feature 
>> employed for a large corpus but it applies here too."
>> 
>> But in the chapter it goes on to talk about dealing with sharding, 
>> replication etc. to support a large corpus, not necessarily tying together 
>> two different indexes.
>> 
>> Is it possible to accomplish my goal in a less ugly way than I outlined 
>> above?  Since we only have a single tenant to worry about, I could use a 
>> combined index at least for a few months (separate fields per document type, 
>> IDs are unique among them all) if that makes a difference.
>> 
>> Thanks!
>> 
>> Jeff
>> --
>> Jeff Schmidt
>> 535 Consulting
>> j...@535consulting.com
>> http://www.535consulting.com
>> (650) 423-1068
> 
> --
> Jeff Schmidt
> 535 Consulting
> j...@535consulting.com
> http://www.535consulting.com
> (650) 423-1068

--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068
