Re: List all Collections together with number of records

Erick Erickson Sat, 06 Jun 2015 21:24:25 -0700

bq: Yup this information will need to be collected each time the user search
for a query, as we want to show the number of records that matches the
search query in each of the collections.


You're looking at something akin to "federated search". About all you can
do is send out parallel queries to each collection.
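
A rough sketch of the parallel fan-out in plain Java, in case it helps. Here countFor is a hypothetical stand-in for whatever actually runs the query against one collection and returns numFound (a SolrJ call, say); the class name, method name and thread cap are my own assumptions for illustration, not anything Solr provides.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToLongFunction;

class ParallelCollectionCounts {

    // Submits one count request per collection to a bounded thread pool,
    // then waits for all of them and returns the counts in input order.
    // countFor is a hypothetical hook: it should query a single collection
    // and return its numFound for the user's query.
    static Map<String, Long> countAll(List<String> collections,
                                      ToLongFunction<String> countFor,
                                      int maxThreads) {
        int threads = Math.max(1, Math.min(maxThreads, collections.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start every request before waiting on any of them.
            Map<String, CompletableFuture<Long>> pending = new LinkedHashMap<>();
            for (String name : collections) {
                pending.put(name,
                        CompletableFuture.supplyAsync(() -> countFor.applyAsLong(name), pool));
            }
            // join() blocks until that collection's count is available.
            Map<String, Long> counts = new LinkedHashMap<>();
            pending.forEach((name, future) -> counts.put(name, future.join()));
            return counts;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Made-up numbers standing in for real per-collection queries.
        Map<String, Long> fake = Map.of("collection1", 120L, "collection2", 0L, "collection3", 57L);
        Map<String, Long> counts = countAll(
                List.of("collection1", "collection2", "collection3"),
                name -> fake.get(name),
                4);
        counts.forEach((name, count) -> System.out.println(name + ": " + count));
    }
}
```

With this shape the wall-clock time is closer to that of the slowest batch of requests than to the sum of all of them, but the cluster still does the same total work per search, which is why hundreds of collections remain a concern.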

This is an "interesting" requirement, and I really question whether it's a
wise thing to insist on. I'd really think about going back to the design.
For instance, could you consolidate all these collections into a single one,
with perhaps a collection_id field? Then the problem is relatively simple:
use field collapsing (aka "grouping").
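
To make that concrete, here is a small sketch that builds the request parameters for such a grouped query. It assumes a single consolidated collection in which every document carries a collection_id field (a hypothetical field name); the class and method are my own illustration. The parameters themselves (group, group.field, group.limit, rows) are the standard result grouping parameters, and each group in the response carries its own numFound.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GroupedCountQuery {

    // Builds the query string for one grouped search over the consolidated
    // collection. groupField is assumed to hold the original collection name.
    static String buildParams(String userQuery, String groupField, int maxGroups) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("group", "true");            // turn on result grouping
        params.put("group.field", groupField);  // one group per distinct value
        params.put("group.limit", "1");         // only the per-group count is needed
        params.put("rows", String.valueOf(maxGroups)); // with grouping, rows caps the number of groups
        params.put("wt", "json");
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Append the result to the select handler URL of the consolidated collection.
        System.out.println(buildParams("title:solr", "collection_id", 100));
    }
}
```

One request then returns the matching count for every collection_id that has at least one hit. Groups with no matches are simply absent, and the total number of records per collection_id would need a second request with q=*:*.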

Best,
Erick

On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
<edwinye...@gmail.com> wrote:
> Yup this information will need to be collected each time the user search
> for a query, as we want to show the number of records that matches the
> search query in each of the collections.
>
> Currently I only have 6 collections, but it could increase to hundreds of
> collections in the future. So I'm worried that it could slow down the
> system a lot if we have to pass hundreds of queries for each search request.
>
> Regards,
> Edwin
>
>
> On 5 June 2015 at 21:00, Upayavira <u...@odoko.co.uk> wrote:
>
>> I'm not so sure this is as bad as it sounds. When your collection is
>> sharded, no single node knows about the documents in other shards/nodes,
>> so to find the total number, a query will need to go to every node.
>>
>> Trying to work out something to do a single request to every node,
>> combine their collection statistics and aggregate them into a single
>> result sounds very complicated, and likely overkill.
>>
>> Are you needing to collect this information often? Do you have a lot of
>> collections?
>>
>> Upayavira
>>
>>
>> On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
>> > I'm trying to write a SolrJ program in Java to read and consolidate all
>> > the information into a JSON file. The client will just need to call this
>> > SolrJ program and read this JSON file to get the details. But the problem
>> > is we are still querying the Solr once for each collection, just that
>> > this time it is done in the SolrJ program in a for-loop, while previously
>> > it's done on the client side. Not sure will this lead to performance
>> > improvement?
>> >
>> > For your suggestion on spawning a bunch of threads, does it mean the same
>> > thing as I did?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 5 June 2015 at 12:03, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > > Have you considered spawning a bunch of threads, one per collection
>> > > and having them all run in parallel?
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
>> > > <edwinye...@gmail.com> wrote:
>> > > > The reason we wanted to do a single call is to improve on the
>> > > > performance, as our application requires to list the total number of
>> > > > records in each of the collections, and the number of records that
>> > > > matches the query each of the collections.
>> > > >
>> > > > Currently we are querying each collection one by one to retrieve the
>> > > > numFound value and display them, but this can slow down the system
>> > > > significantly when the number of collection grows. So we are thinking
>> > > > of ways to improve the speed in this area.
>> > > >
>> > > > Any other methods which you can suggest that we can do to overcome
>> > > > this speed problem?
>> > > >
>> > > > Regards,
>> > > > Edwin
>> > > > On 5 Jun 2015 00:16, "Erick Erickson" <erickerick...@gmail.com> wrote:
>> > > >
>> > > >> Not in a single call that I know of. These are really orthogonal
>> > > >> concepts. Getting the cluster status merely involves reading the
>> > > >> Zookeeper clusterstate whereas getting the total number of docs for
>> > > >> each would involve querying each collection, i.e. going to the Solr
>> > > >> nodes themselves. I'd guess it's unlikely to be combined.
>> > > >>
>> > > >> Best,
>> > > >> Erick
>> > > >>
>> > > >> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
>> > > >> <edwinye...@gmail.com> wrote:
>> > > >> > Hi,
>> > > >> >
>> > > >> > Would like to check, are we able to use the Collection API or any
>> > > >> > other method to list all the collections in the cluster together
>> > > >> > with the number of records in each of the collections in one output?
>> > > >> >
>> > > >> > Currently, I only know of the List Collections
>> > > >> > /admin/collections?action=LIST. However, this only list the names of
>> > > >> > the collections that are in the cluster, but not the number of
>> > > >> > records.
>> > > >> >
>> > > >> > Is there a way to show the number of records in each of the
>> > > >> > collections as well?
>> > > >> >
>> > > >> > Regards,
>> > > >> > Edwin
>> > > >>
>> > >
>>
