Re: Collating results from multiple indexes

Otis Gospodnetic Thu, 11 Feb 2010 14:02:45 -0800

Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
*think* they are trying to patent this.


 Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Jan Høydahl / Cominvent <jan....@cominvent.com>
> To: solr-user@lucene.apache.org
> Sent: Mon, February 8, 2010 3:33:41 PM
> Subject: Re: Collating results from multiple indexes
> 
> Hi,
> 
> There is no JOIN functionality in Solr. The common solution is either to accept
> the high-volume update churn, or to add client-side code to build a "join"
> layer on top of the two indices. I know that Attivio (www.attivio.com) have
> built some kind of JOIN functionality on top of Solr in their AIE product, but
> I do not know the details or the actual performance.
> 
> Why not open a JIRA issue, if there isn't one already, to request this as a
> feature?
> 
> --
> Jan Høydahl  - search architect
> Cominvent AS - www.cominvent.com
> 
> On Jan 25, 2010, at 22:01, Aaron McKee wrote:
> 
> > 
> > Is there any somewhat convenient way to collate/integrate fields from
> > separate indices during result writing, if the indices use the same unique
> > keys? Basically, some sort of cross-index JOIN?
> > 
> > As a bit of background, I have a rather heavyweight dataset of every US
> > business (~25m records, an on-disk index footprint of ~30 GB, and 5-10 hours
> > to fully index on a decent box). Given the size and relative stability of
> > the dataset, I generally only update it monthly. However, I have separate
> > advertising-related datasets that need to be updated either hourly or daily
> > (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds
> > reference the same keyspace that I use in the main index, but are otherwise
> > significantly lighter weight: importing and indexing them separately only
> > takes a couple of minutes. Given that Solr/Lucene doesn't support updating
> > individual fields without dropping and re-adding the entire document, it
> > doesn't seem practical to integrate this data into the main index (the
> > system would be in a constant state of churn if we did document re-inserts,
> > and the performance impact would probably be debilitating). It would be nice
> > if this data could participate in filtering (e.g. only show advertisers), but
> > it doesn't need to participate in scoring/ranking.
> > 
> > I'm guessing that someone else has had a similar need at some point? I can
> > have our front-end query the smaller indices separately, using the keys
> > returned by the primary index, but would prefer to avoid the extra sequential
> > roundtrips. I'm also hoping to avoid a coding solution, if only to avoid the
> > maintenance overhead as we drop in new builds of Solr, but that's also
> > feasible.
> > 
> > Thank you for your insight,
> > Aaron
> >
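For reference, the client-side "join" layer Jan describes (and the key-based lookup Aaron mentions) might look roughly like the sketch below. It assumes both indices have already been queried and their results parsed into lists of dicts sharing a unique-key field called `id`; the HTTP calls are omitted, and the helper names `build_key_query` and `collate`, the field names, and the sample records are all hypothetical. The idea is to collect the keys from one page of primary results, send them to the smaller advertiser index as a single Lucene-syntax OR query (one extra roundtrip per page rather than one per key), and then merge the returned fields by key without touching the primary ranking.

```python
from typing import Dict, List


def build_key_query(field: str, keys: List[str]) -> str:
    """Build a single Lucene-syntax query matching any of the given keys.

    Each key is wrapped in double quotes, with backslashes and double quotes
    inside it escaped, so keys containing query syntax are matched literally.
    """
    if not keys:
        raise ValueError("need at least one key")

    def quote(value: str) -> str:
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return f"{field}:(" + " OR ".join(quote(k) for k in keys) + ")"


def collate(primary_docs: List[Dict], secondary_docs: List[Dict],
            key: str = "id", advertisers_only: bool = False) -> List[Dict]:
    """Merge secondary fields into the primary docs sharing the same key.

    The order of primary_docs (the main index's ranking) is preserved, so the
    secondary data never influences scoring. Input dicts are not modified.
    """
    by_key = {doc[key]: doc for doc in secondary_docs}
    merged = []
    for doc in primary_docs:
        extra = by_key.get(doc[key])
        if advertisers_only and extra is None:
            # Client-side filter: skip documents with no advertiser record.
            continue
        combined = dict(doc)
        if extra is not None:
            combined.update({f: v for f, v in extra.items() if f != key})
        merged.append(combined)
    return merged


if __name__ == "__main__":
    # Hypothetical sample data standing in for two parsed Solr responses.
    primary = [{"id": "b1", "name": "Acme Hardware"},
               {"id": "b2", "name": "Main St Diner"}]
    print(build_key_query("id", [d["id"] for d in primary]))
    # id:("b1" OR "b2")
    ads = [{"id": "b2", "coupon": "10% off today"}]
    for row in collate(primary, ads):
        print(row)
```

One trade-off of doing the "only show advertisers" filter this way: it runs after a page of primary results has been fetched, so filtered pages can come back short, whereas true filtering inside Solr would need the advertiser data in the main index. Also, each key becomes one boolean clause, so a page of tens of results stays well under Solr's default `maxBooleanClauses` limit of 1024.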
