> You can certainly replicate the joined collection to every shard. It
> must fit in one shard and a replica of that shard must be co-located
> with every replica of the “to” collection.

Yes, I found this in the documentation, with a clear example, just after
sending this mail. I will test it today. I also read your blog post about
join performance[1], and I suspect the performance impact of joins will be
huge because the joined collection has about 10M documents. Each document
has only two fields, a unique id and an array of longs; the filter is
applied to the array, and the join key is the 10M unique ids.
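
A minimal sketch of what such a cross-collection join request could look
like, assuming the joined collection is replicated next to every shard as
described above. The collection name `joined`, the field names `id`,
`join_id` and `values`, and the filter value are hypothetical placeholders,
not the real schema. The code only builds the request parameters and does
not contact a Solr server.

```python
from urllib.parse import urlencode


def build_join_params(from_index, from_field, to_field, from_query):
    """Build Solr request parameters that filter the main (sharded)
    collection with the join query parser.

    All names passed in are placeholders for illustration only.
    """
    # {!join from=... fromIndex=... to=...}<query run against fromIndex>
    join_fq = "{!join from=%s fromIndex=%s to=%s}%s" % (
        from_field, from_index, to_field, from_query)
    return {"q": "*:*", "fq": join_fq}


# Hypothetical example: keep documents whose join_id matches the id of a
# document in "joined" whose array field "values" contains 42.
params = build_join_params("joined", "id", "join_id", "values:42")
query_string = urlencode(params)
print(params["fq"])
print(query_string)
```

The query string would be appended to the `/select` handler of the sharded
collection; whether this performs acceptably with 10M join keys is exactly
what remains to be tested.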

> Have you looked at streaming and “streaming expressions”? It does not
> have the same problem, although it does have its own limitations.

I have never tested them, and I am not yet very comfortable with how to
test them. Is it possible to mix query parsers and streaming expressions
in the client call via HTTP parameters, or can streaming expressions only
be used programmatically?
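
For reference, a streaming expression is plain text that can be sent in the
`expr` parameter to a collection's `/stream` handler, so it is not limited
to programmatic use. The sketch below builds such a URL for an `innerJoin`
of two `search` streams. The host, the collection names `big` and `joined`,
and the field names are hypothetical, and the code only builds the URL
without calling Solr. It does not settle how far query parsers can be mixed
in; that part is still an open question here.

```python
from urllib.parse import urlencode


def inner_join_expr(left_coll, right_coll, left_key, right_key, right_query):
    """Return an innerJoin streaming expression as text.

    Both search streams are sorted on their join key, which innerJoin
    needs in order to merge them. All names are placeholders.
    """
    left = 'search(%s, q="*:*", fl="id,%s", sort="%s asc", qt="/export")' % (
        left_coll, left_key, left_key)
    right = 'search(%s, q="%s", fl="%s", sort="%s asc", qt="/export")' % (
        right_coll, right_query, right_key, right_key)
    return 'innerJoin(%s, %s, on="%s=%s")' % (left, right, left_key, right_key)


def build_stream_url(base_url, collection, expr):
    """Build the URL that sends expr to the collection's /stream handler."""
    return "%s/%s/stream?%s" % (
        base_url.rstrip("/"), collection, urlencode({"expr": expr}))


expr = inner_join_expr("big", "joined", "join_id", "id", "values:42")
url = build_stream_url("http://localhost:8983/solr", "big", expr)
print(expr)
print(url)
```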

[1] https://lucidworks.com/post/solr-and-joins/

On Tue, Oct 15, 2019 at 07:12:25PM -0400, Erick Erickson wrote:
> You can certainly replicate the joined collection to every shard. It must fit 
> in one shard and a replica of that shard must be co-located with every 
> replica of the “to” collection.
> 
> Have you looked at streaming and “streaming expressions”? It does not have 
> the same problem, although it does have its own limitations.
> 
> Best,
> Erick
> 
> > On Oct 15, 2019, at 6:58 PM, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > 
> > Hi
> > 
> > I have several large collections that cannot fit in a standalone solr
> > instance. They are split over multiple shards in solr-cloud mode.
> > 
> > Those collections are supposed to be joined to another collection to
> > retrieve a subset. Because I am using distributed collections, I am not
> > able to use the solr join feature.
> > 
> > For this reason, I denormalize the information by adding the joined
> > collection to every collection. Naturally, when I want to update
> > the joined collection, I have to update every one of the distributed
> > collections.
> > 
> > In standalone mode, I would only have to update the joined collection.
> > 
> > I wonder if there is a way to overcome this limitation. For example, by
> > replicating the joined collection to every shard, or some other method I
> > am not aware of.
> > 
> > Any thoughts?
> > -- 
> > nicolas
> 

-- 
nicolas
