Is there any somewhat convenient way to collate/integrate fields from separate indices during result writing, if the indices use the same unique keys? Basically, some sort of cross-index JOIN?

As a bit of background, I have a rather heavyweight dataset of every US business (~25M records, an on-disk index footprint of ~30 GB, and 5-10 hours to fully index on a decent box). Given the size and relative stability of the dataset, I generally only update it monthly. However, I have separate advertising-related datasets that need to be updated either hourly or daily (e.g. today's coupon, click revenue remaining, etc.). These advertiser feeds reference the same keyspace that I use in the main index, but are otherwise significantly lighter weight. Importing and indexing them separately takes only a couple of minutes. Given that Solr/Lucene doesn't support updating a field without dropping and re-adding the entire document, it doesn't seem practical to integrate this data into the main index (with constant document re-inserts the system would be in a perpetual state of churn, and the performance impact would probably be debilitating). It would be nice if this data could participate in filtering (e.g. only show advertisers), but it doesn't need to participate in scoring/ranking.

I'm guessing that someone else has had a similar need, at some point? I can have our front-end query the smaller indices separately, using the keys returned by the primary index, but would prefer to avoid the extra sequential roundtrips. I'm hoping to also avoid a coding solution, if only to avoid the maintenance overhead as we drop in new builds of Solr, but that's also feasible.
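
For reference, here is a rough Python sketch of the front-end fallback I mean, batched so that the advertiser lookup costs one extra roundtrip per result page rather than one per document. The field names (id, name, coupon) and both helper functions are made up for illustration, and the actual HTTP calls to Solr are left out; this only shows building the key query and merging the two result lists.

```python
from typing import Dict, Iterable, List


def build_key_query(keys: Iterable[str], key_field: str = "id") -> str:
    """Build one Lucene-syntax query that matches every key in a single request.

    key_field is a hypothetical name for the shared unique key.
    """
    # Quote each key as a phrase, escaping backslashes and double quotes.
    quoted = ['"%s"' % k.replace("\\", "\\\\").replace('"', '\\"') for k in keys]
    return "%s:(%s)" % (key_field, " OR ".join(quoted))


def merge_by_key(
    primary_docs: List[Dict[str, object]],
    ad_docs: List[Dict[str, object]],
    key_field: str = "id",
) -> List[Dict[str, object]]:
    """Attach advertiser fields to primary docs that share the same key.

    Primary result order is preserved, and primary fields win on a name clash.
    """
    ads_by_key = {doc[key_field]: doc for doc in ad_docs}
    merged = []
    for doc in primary_docs:
        combined = dict(doc)  # copy so the caller's docs are not mutated
        for name, value in ads_by_key.get(doc[key_field], {}).items():
            combined.setdefault(name, value)
        merged.append(combined)
    return merged


# Example with made-up documents standing in for the two Solr responses.
primary = [{"id": "b1", "name": "Acme"}, {"id": "b2", "name": "Bolt"}]
ads = [{"id": "b2", "coupon": "10% off"}]
query = build_key_query(doc["id"] for doc in primary)
page = merge_by_key(primary, ads)
```

This keeps the ranking from the main index untouched, but it cannot filter on advertiser fields, which is part of why I'd prefer something inside Solr.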

Thank you for your insight,
Aaron
