Can you solve this problem by adding all documents to the same collection and performing self joins? You could add a field called rec_type to differentiate between the records.
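As a rough sketch of what the single-collection layout could look like from the client side, the snippet below builds a standard Solr join query that joins a collection to itself and uses rec_type to pick the two sides. The field name join_key, the record types "A" and "B", the base URL, and the collection name are all hypothetical, and the compositeId-style id is only one possible way to keep records with the same key on the same shard. This uses the ordinary {!join} parser, not the optimized self join.

```python
from urllib.parse import urlencode


def composite_id(join_key, doc_id):
    """Build a compositeId-style id so docs sharing join_key hash to the same shard.

    Assumes the collection uses the compositeId router (hypothetical setup).
    """
    return f"{join_key}!{doc_id}"


def build_self_join_params(join_field, from_type, to_type, type_field="rec_type"):
    """Return Solr query params for a self join within one collection.

    The inner query selects records of from_type; the join returns docs whose
    join_field matches, and the filter query keeps only to_type records.
    """
    return {
        "q": f"{{!join from={join_field} to={join_field}}}{type_field}:{from_type}",
        "fq": f"{type_field}:{to_type}",
    }


def select_url(base_url, collection, params):
    """Assemble a /select URL; base_url and collection are placeholders."""
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    params = build_self_join_params("join_key", "B", "A")
    print(composite_id("order42", "A-1"))
    print(params["q"])
    print(select_url("http://localhost:8983/solr", "combined", params))
```

The query returns rec_type:A documents that share a join_key value with at least one rec_type:B document. In a sharded collection this only gives complete results if the matching records live on the same shard, which is where routing by the join key comes in.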
There are two good reasons for wanting to do this:

1) It allows you to route by the join key and easily co-locate records.
2) There is an optimized self join, which is extremely fast, that you could take advantage of if you did this.

Let me know if this might be an option for you and we can discuss the optimized self join in more detail.

Joel

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Jul 2, 2021 at 6:28 PM Matt Kuiper <[email protected]> wrote:

> After some research, it appears the following approach may help in this
> situation and relieve the requirement of co-locating indexes for joins. It
> appears one drawback may be the types of fields supported for the join
> field.
>
> https://solr.apache.org/guide/8_8/other-parsers.html#cross-collection-join
>
> Matt
>
> On Wed, Jun 30, 2021 at 11:59 AM Matt Kuiper <[email protected]> wrote:
>
> > Hi Solr Group,
> >
> > I am not sure the following is a viable use case, and I welcome input
> > and any implementation recommendations.
> >
> > I would like to perform joins over two sharded collections, where docs
> > are routed to specific shards based on a date range and the date ranges
> > are the same for the shards in each collection.
> >
> > I understand that this means that the replicas from each collection that
> > hold data to be joined need to be co-located on the same Solr server. I
> > have read solutions that use ADD REPLICA to add a Collection B replica to
> > all Solr servers, assuming Collection B has only one shard. For my use
> > case I need Collection B to have multiple shards.
> >
> > Collection A   Collection B   SolrServer
> > Shard1_2020    Shard1_2020    172.33.0.1:8983_solr
> > Shard2_2021    Shard2_2021    172.33.0.2:8983_solr
> > Shard3_2022    Shard3_2022    172.33.0.3:8983_solr
> >
> > I think my question comes down to: how do I break shards by a date range,
> > and do it in a way that both Collections A and B would be defined by the
> > same date range? If I could reliably break shards by date, and know the
> > date range of each shard, I think I could use the ADD REPLICA API to
> > align them.
> >
> > I am not sure a compositeId routing approach would work, but I think an
> > implicit id may be hard to manage over time.
> >
> > Is an approach like this viable? I am a bit concerned about maintenance.
> > Are there other ideas to support this join?
> >
> > Note: I am considering this within Time series collections...
> >
> > Matt
