Thanks Joel! I will give this a try. That is quite a performance boost.

Matt

On Tue, Jul 13, 2021 at 9:14 AM Joel Bernstein <[email protected]> wrote:

> The optimized join was added in Solr 8.8:
> https://issues.apache.org/jira/browse/SOLR-15049
>
> It kicks in when you use the join qparser plugin in the following scenario:
>
> 1) Do not specify a fromIndex. This is because the to and from index are
> the same.
> 2) The to and from fields are the same.
> 3) The join method is topLevelDV.
>
> {!join to=store_id from=store_id method=topLevelDV}
>
> If you do this with Solr 8.8+ you get the effect of SOLR-15049. It is a
> massive performance improvement. In my testing it was 7000 times faster
> then the standard join parser plugin for larger joins.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jul 12, 2021 at 1:34 PM Matt Kuiper <[email protected]> wrote:
>
> > Hi Joel,
> >
> > I reviewed a few options with my team, and your recommendation is at the
> > top of the list. I believe it will work for our use case.
> >
> > You mentioned that if this approach worked, you would be willing to share
> > more details on an "optimized self join."
> >
> > I would enjoy hearing more.
> >
> > Thanks,
> > Matt
> >
> > On Fri, Jul 9, 2021 at 9:36 AM Joel Bernstein <[email protected]> wrote:
> >
> > > Block join is another option. If that works for you, from an indexing
> > > standpoint, it's the most performant query time join.
> > >
> > > If block indexing doesn't work for you then the optimized self join is
> > > almost as fast.
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Fri, Jul 9, 2021 at 11:31 AM Matt Kuiper <[email protected]> wrote:
> > >
> > > > Thanks Joel!
> > > >
> > > > On my list is to investigate Block Joins and Nested Child docs.
> > > >
> > > > https://solr.apache.org/guide/8_8/other-parsers.html#block-join-query-parsers
> > > >
> > > > https://solr.apache.org/guide/8_8/indexing-nested-documents.html#indexing-nested-documents
> > > >
> > > > However, it looks like you are not suggesting using nested docs, but
> > > > specifying a type field to differentiate between types of docs and then a
> > > > join field. Not having to build nested docs prior to updates would be an
> > > > advantage. And it makes sense that the join field would allow for reliable
> > > > routing to appropriate the shard for both doc types.
> > > >
> > > > I will take a further look and see if this approach will work, and get back
> > > > if more info is needed on the optimized self join.
> > > >
> > > > Thanks again,
> > > > Matt
> > > >
> > > > On Fri, Jul 9, 2021 at 7:01 AM Joel Bernstein <[email protected]> wrote:
> > > >
> > > > > Can you solve this problem by adding all documents into the same collection
> > > > > and performing self joins. You could add a field called rec_type to
> > > > > differentiate between the records.
> > > > >
> > > > > There are two good reasons for wanting to do this.
> > > > >
> > > > > 1) This allows you to route by the join key and easily co-locate records.
> > > > >
> > > > > 2) There is an optimized self join which is extremely fast that you could
> > > > > take advantage of if you did this.
> > > > >
> > > > > Let me know if this might be an option for you and we can discuss the
> > > > > optimized self join in more detail.
> > > > >
> > > > > Joel
> > > > >
> > > > > Joel Bernstein
> > > > > http://joelsolr.blogspot.com/
> > > > >
> > > > > On Fri, Jul 2, 2021 at 6:28 PM Matt Kuiper <[email protected]> wrote:
> > > > >
> > > > > > After some research, it appears the following approach may help in this
> > > > > > situation and relieve the requirement of collocating indexes for Joins. It
> > > > > > appears one drawback maybe the types of fields supported for the JOIN
> > > > > > field.
> > > > > >
> > > > > > https://solr.apache.org/guide/8_8/other-parsers.html#cross-collection-join
> > > > > >
> > > > > > Matt
> > > > > >
> > > > > > On Wed, Jun 30, 2021 at 11:59 AM Matt Kuiper <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Solr Group,
> > > > > > >
> > > > > > > I am not sure the following is a viable use-case, welcoming input and any
> > > > > > > implementation recommendations.
> > > > > > >
> > > > > > > I would like to perform joins over two sharded collections. Where docs
> > > > > > > are routed to specific shards based on a date range and are the same for
> > > > > > > shards in each collection.
> > > > > > >
> > > > > > > I understand that this means that the replicas from each collection that
> > > > > > > hold data to be joined need to be collated on the same Solr Server. I
> > > > > > > have read solutions that use ADD REPLICA to add a Collection B replica to
> > > > > > > all SolrServers assuming Collection B has only one Shard. For my use case
> > > > > > > I need Collection B to have multiple shards.
> > > > > > >
> > > > > > > *Collection A    Collection B    SolrServer*
> > > > > > > Shard1_2020      Shard1_2020     172.33.0.1:8983_solr
> > > > > > > Shard2_2021      Shard2_2021     172.33.0.2:8983_solr
> > > > > > > Shard3_2022      Shard3_2022     172.33.0.3:8983_solr
> > > > > > >
> > > > > > > I think my question comes down to how do I break shards by a date range,
> > > > > > > and do it in a way that both Collections A and B would be defined by the
> > > > > > > same date range? If could reliably break shards by date, and know the date
> > > > > > > range of the shard, I think I could use ADD REPLICA api to align.
> > > > > > >
> > > > > > > Not sure a compositeId routing approach would work, but thinking an
> > > > > > > implicit id may be hard to manage over time.
> > > > > > >
> > > > > > > Is an approach like this viable, concerned a bit about
> > > > > > > maintenance concerns, other ideas to support this join?
> > > > > > >
> > > > > > > Note: I am considering this within Time series collections...
> > > > > > >
> > > > > > > Matt
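
For reference, the single-collection setup discussed in this thread can be
sketched in Python. This is only an illustration under stated assumptions: the
collection is assumed to use Solr's compositeId router (join key before the
'!'), the rec_type values "store" and "item" and the ids are made up, and the
helper names are hypothetical. Only store_id, rec_type, and the
{!join ... method=topLevelDV} syntax come from the messages above. The code
builds ids and request parameters and does not contact a Solr server.

```python
from urllib.parse import urlencode


def routed_id(join_key: str, doc_id: str) -> str:
    """Prefix the document id with the join key.

    Assumes the compositeId router, where the part before '!' picks the
    shard, so records that share a join key land on the same shard.
    """
    return f"{join_key}!{doc_id}"


def self_join_prefix(join_field: str) -> str:
    """Local params for a same-collection join: no fromIndex, identical
    to/from fields, and method=topLevelDV, as listed in the thread."""
    return f"{{!join to={join_field} from={join_field} method=topLevelDV}}"


def build_params(join_field: str, from_query: str, to_filter: str) -> dict:
    """Assemble /select parameters; nothing is sent to a server here."""
    return {
        "q": self_join_prefix(join_field) + from_query,  # "from" side of the join
        "fq": to_filter,                                 # keep only the "to" record type
    }


if __name__ == "__main__":
    # Hypothetical records: both types live in one collection and are
    # told apart by rec_type, as suggested in the thread.
    docs = [
        {"id": routed_id("s1", "store-1"), "rec_type": "store", "store_id": "s1"},
        {"id": routed_id("s1", "item-9"), "rec_type": "item", "store_id": "s1"},
    ]
    params = build_params("store_id", "rec_type:store", "rec_type:item")
    print(docs[1]["id"])
    print(params["q"])
    print(urlencode(params))
```

The fq on rec_type is one possible way to restrict results to the "to" side
record type; whether it is needed depends on the data. The join fields would
also need docValues for the topLevelDV method, per the Solr Reference Guide.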
