I am trying to understand how Solr co-locates documents that share a prefix 
when the compositeId router is used.

I created a collection with 2 shards using the compositeId router and published 
3 docs: 2 docs with the prefix "tenant1!" in the docId field and 1 doc with the 
prefix "tenant2!" in the docId field.
I then queried the collection with the shards=shard1 and shards=shard2 
parameters.

All 3 documents were placed in shard1, and shard2 has no documents. Is there a 
threshold number of docs that must be present in shard1 before shard2 is 
considered?

According to https://sematext.com/blog/solrcloud-large-tenants-and-routing/ , 
documents with the same first-level prefix are routed to one shard. Is it 
possible to make the documents of one tenant occupy one shard in a collection 
that uses the compositeId router?
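
For reference, here is a minimal sketch of how I understand the compositeId 
router to pick a shard for a two-part id such as "tenant1!doc1". It assumes the 
router hashes each part with MurmurHash3 (x86, 32-bit, seed 0), takes the upper 
16 bits from the prefix hash and the lower 16 bits from the rest of the id, and 
that a 2-shard collection has the default ranges shard1 = 80000000-ffffffff and 
shard2 = 0-7fffffff. The doc ids and function names are made up for 
illustration, and the "prefix/bits!" syntax is not handled; the real ranges 
should be checked in the collection's cluster state.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body = n - (n % 4)
    for i in range(0, body, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[body:]
    if tail:
        # Remaining 1-3 bytes, combined little-endian.
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def composite_hash(doc_id: str) -> int:
    """Assumed two-level scheme: 16 bits from the prefix, 16 from the rest."""
    if "!" not in doc_id:
        return murmur3_x86_32(doc_id.encode("utf-8"))
    prefix, rest = doc_id.split("!", 1)
    hi = murmur3_x86_32(prefix.encode("utf-8")) & 0xFFFF0000
    lo = murmur3_x86_32(rest.encode("utf-8")) & 0x0000FFFF
    return hi | lo


def shard_for(doc_id: str) -> str:
    """Assumed default ranges for a 2-shard collection."""
    return "shard1" if composite_hash(doc_id) >= 0x80000000 else "shard2"


if __name__ == "__main__":
    # Hypothetical ids matching the test described above.
    for doc_id in ("tenant1!doc1", "tenant1!doc2", "tenant2!doc3"):
        print(doc_id, format(composite_hash(doc_id), "08x"), shard_for(doc_id))
```

If this is right, the shard depends only on the hash of the prefix and not on 
how many docs a shard already holds, so with 2 shards two different prefixes 
can easily hash into the same shard's range.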


Thanks,
Rajeswari

On 4/7/21, 2:07 PM, "Natarajan, Rajeswari" <[email protected]> wrote:

    Thanks much for your reply.
    Thanks,
    Rajeswari

    On 4/7/21, 1:16 PM, "Shawn Heisey" <[email protected]> wrote:

        On 4/7/2021 1:41 PM, Natarajan, Rajeswari wrote:
        > If there is any way to get the size of the index of a tenant in a 
        > collection where multiple tenants co-exist with the composite id 
        > router scheme, let me know.
        > We need to somehow track the tenant's index size to see if it grows 
        > too big, and document count is not proportional to index size in our 
        > case.

        There isn't any way to do that.  The way that Lucene's indexes are 
        designed, obtaining that information is currently impossible, and it 
        would likely take a VERY large amount of development effort to make it 
        possible.  I would guess that even if it were possible, obtaining that 
        information would be very expensive in terms of system resources and 
        time.

        The best you can do with current technology is estimate the size based 
        on document count compared to the whole index.  But if each tenant has 
        very different kinds of data in the index, that method would probably 
        give you inaccurate information.

        One thing you could do to have each one be its own collection is set up 
        multiple cloud installs, which can share one zookeeper ensemble by 
        using different chroot values for each one, and only put a few hundred 
        collections in each cloud.  This would probably require a lot of 
        additional hardware, and because of Lucene's economies of scale that 
        Walter was talking about, multiple collections WILL be larger on disk 
        than multiple tenants in one collection.

        Thanks,
        Shawn

