Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I *think* they are trying to patent this.
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/

----- Original Message ----
> From: Jan Høydahl / Cominvent <jan....@cominvent.com>
> To: solr-user@lucene.apache.org
> Sent: Mon, February 8, 2010 3:33:41 PM
> Subject: Re: Collating results from multiple indexes
>
> Hi,
>
> There is no JOIN functionality in Solr. The common solution is either to
> accept the high volume update churn, or to add client side code to build a
> "join" layer on top of the two indices. I know that Attivio
> (www.attivio.com) have built some kind of JOIN functionality on top of Solr
> in their AIE product, but do not know the details or the actual performance.
>
> Why not open a JIRA issue, if there is no such already, to request this as a
> feature?
>
> --
> Jan Høydahl - search architect
> Cominvent AS - www.cominvent.com
>
> On 25. jan. 2010, at 22.01, Aaron McKee wrote:
>
> > Is there any somewhat convenient way to collate/integrate fields from
> > separate indices during result writing, if the indices use the same
> > unique keys? Basically, some sort of cross-index JOIN?
> >
> > As a bit of background, I have a rather heavyweight dataset of every US
> > business (~25m records, an on-disk index footprint of ~30g, and 5-10
> > hours to fully index on a decent box). Given the size and relative
> > stability of the dataset, I generally only update this monthly. However,
> > I have separate advertising-related datasets that need to be updated
> > either hourly or daily (e.g. today's coupon, click revenue remaining,
> > etc.). These advertiser feeds reference the same keyspace that I use in
> > the main index, but are otherwise significantly lighter weight. Importing
> > and indexing them discretely only takes a couple minutes. Given that
> > Solr/Lucene doesn't support field updating, without having to drop and
> > re-add an entire document, it doesn't seem practical to integrate this
> > data into the main index (the system would be under a constant state of
> > churn, if we did document re-inserts, and the performance impact would
> > probably be debilitating). It may be nice if this data could participate
> > in filtering (e.g. only show advertisers), but it doesn't need to
> > participate in scoring/ranking.
> >
> > I'm guessing that someone else has had a similar need, at some point? I
> > can have our front-end query the smaller indices separately, using the
> > keys returned by the primary index, but would prefer to avoid the extra
> > sequential roundtrips. I'm hoping to also avoid a coding solution, if
> > only to avoid the maintenance overhead as we drop in new builds of Solr,
> > but that's also feasible.
> >
> > Thank you for your insight,
> > Aaron
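
For what it's worth, the client-side "join" layer mentioned above can be fairly small. Here is a rough sketch in Python, not a tested integration. It assumes both indices share a unique key field called `id` (my assumption; use whatever your schema defines) and that the documents from each index have already been fetched as plain dicts. `build_key_query` and `collate` are made-up helper names for illustration, not part of Solr or any client library.

```python
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


def build_key_query(keys: Iterable[str], field: str = "id") -> str:
    """Build one Lucene-syntax query matching any of the given keys.

    Sending this to the small index fetches all matching ad records
    in a single request instead of one request per document.
    """
    # Quote each key as a phrase, escaping backslashes and double quotes.
    quoted = ['"%s"' % k.replace("\\", "\\\\").replace('"', '\\"') for k in keys]
    if not quoted:
        raise ValueError("need at least one key")
    return "%s:(%s)" % (field, " OR ".join(quoted))


def collate(primary: List[Doc], secondary: List[Doc], key: str = "id") -> List[Doc]:
    """Merge secondary fields into primary docs that share the same key.

    The order (ranking) of the primary results is preserved, and primary
    fields win if both documents define the same field name.
    """
    by_key = {doc[key]: doc for doc in secondary}
    merged = []
    for doc in primary:
        combined = dict(by_key.get(doc[key], {}))  # ad fields, if any
        combined.update(doc)                       # primary fields take precedence
        merged.append(combined)
    return merged


if __name__ == "__main__":
    # Hypothetical data standing in for the two Solr responses.
    businesses = [{"id": "b2", "name": "Bob's Diner"}, {"id": "a1", "name": "Acme"}]
    ads = [{"id": "a1", "coupon": "10% off"}]
    print(build_key_query(d["id"] for d in businesses))
    print(collate(businesses, ads))
```

This still costs the second roundtrip Aaron wanted to avoid, but it is one batched lookup per result page rather than one per document, and it lives entirely outside Solr, so new Solr builds should not affect it.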