innerJoin(intersect(innerJoin(collection1, collection2),
                    innerJoin(collection3, collection4)),
          collection5)

Let's focus on:

innerJoin(collection3, collection4)

The first thing to look at is how fast the export from collection4 is. You
can test this with the NullStream, using the following construct:

null(search(collection4))

The null stream will eat all the tuples and report back timing information.
This will isolate the performance of the export from collection4.
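
Written out in full, that test might look like the sketch below. The field
names id and join_key_s are placeholders, not taken from your setup; use
whatever fields the join actually needs. With qt="/export", the fl and sort
fields need docValues.

```
null(search(collection4,
            q="*:*",
            fl="id,join_key_s",
            sort="join_key_s asc",
            qt="/export"))
```

The result is a single tuple with the number of tuples read and the time it
took, rather than the 6 million records themselves.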

Once you have a baseline for how fast you can export from a single node,
you can test with parallel export from a single node:

parallel(null(search(collection4)))

Then you can add replicas for collection4 and increase workers.
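
A fuller form of the parallel test, again only as a sketch: workerCollection,
the zkHost address, the worker count and the field names are all assumptions
you would replace with your own values. The partitionKeys parameter on the
search is what splits the export across the workers.

```
parallel(workerCollection,
         null(search(collection4,
                     q="*:*",
                     fl="id,join_key_s",
                     sort="join_key_s asc",
                     qt="/export",
                     partitionKeys="join_key_s")),
         workers="4",
         zkHost="localhost:9983",
         sort="join_key_s asc")
```

Each worker returns its own tuple with a count and timing, so you can compare
the numbers as you change workers and replicas.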

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 1, 2017 at 11:51 PM, Susmit Shukla <shukla.sus...@gmail.com>
wrote:

> Hi,
>
> Which version of Solr are you on?
> Increasing memory may not be useful, as the streaming API does not keep
> data in memory (except maybe for hash joins).
> Increasing replicas (not sharding) and pushing the join computation onto
> a worker Solr cluster with #workers > 1 would definitely make things faster.
> Are you limiting your results at some cutoff? If yes, then SOLR-10698
> <https://issues.apache.org/jira/browse/SOLR-10698> can be a useful fix. Also,
> the binary response format for streaming would be faster (probably
> available in 6.5).
>
>
>
> On Thu, Jun 1, 2017 at 3:04 PM, thiaga rajan <
> ecethiagu2...@yahoo.co.in.invalid> wrote:
>
> > We are working on a proposal and feel that the streaming API together
> > with the export handler will best fit our use cases. We already have a
> > structure in Solr in which we use graph queries to produce a
> > hierarchical structure. From that structure we now need to join a
> > couple more collections. We have 5 different collections:
> >
> >   Collection 1 - 800k records
> >   Collection 2 - 200k records
> >   Collection 3 - 7k records
> >   Collection 4 - 6 million records
> >   Collection 5 - 150k records
> >
> > We are using the strategy below:
> >
> >   innerJoin(intersect(innerJoin(collection 1, collection 2),
> >                       innerJoin(collection 3, collection 4)),
> >             collection 5)
> >
> > Performance becomes too slow once collection 4 is included. With just
> > collections 1, 2 and 5 the results come back in 2 secs. The moment I
> > include collection 4 in the query I see a performance impact. I believe
> > exporting large results from collection 4 is causing the issue.
> > Currently I am using a single-shard collection with no replicas. My
> > first option is to increase the memory to improve performance, since
> > processing doc values needs more memory. If that does not work, I can
> > try parallel streams / sharding. Kindly advise: is there anything else
> > I am missing?
>
