Right. Once you take over routing, making sure the distribution is
even becomes your responsibility.

Your assumption is that the amount of _text_ in each doc is roughly
the same across your three languages. Have you verified this? And are
you doing anything like copyFields that kick in on one shard
but not the others (e.g. if you have text_en fields, you might be
copying them to text_en_all but not copying text_ger to
text_ger_all)? That's totally a shot in the dark, though.
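
One rough way to check the first point is to group a sample of your
docs by the language prefix of the id and compare how much text each
language carries. This is only a sketch: the "id" and "text" field
names and the sample docs are assumptions for illustration, so swap in
whatever your schema and export actually use.

```python
from collections import defaultdict


def text_stats_by_language(docs):
    """Summarize doc count and text volume per language prefix.

    Assumes each doc is a dict with an "id" of the form
    language!topic!url and a "text" field (both names are hypothetical).
    """
    stats = defaultdict(lambda: {"docs": 0, "chars": 0})
    for doc in docs:
        # The language is the first component of the composite id.
        language = doc["id"].split("!", 1)[0]
        stats[language]["docs"] += 1
        stats[language]["chars"] += len(doc.get("text", ""))
    # Add the average text length per doc for each language.
    return {
        lang: {**s, "avg_chars": s["chars"] / s["docs"]}
        for lang, s in stats.items()
    }


# Made-up sample; in practice, feed in docs exported from your index.
sample_docs = [
    {"id": "ENU!12345!www.testurl.com/enu/doc1", "text": "some english text"},
    {"id": "GER!12345!www.testurl.com/ger/doc2", "text": "etwas Text"},
    {"id": "CHS!67890!www.testurl.com/chs/doc3", "text": "一些文字"},
]

for lang, s in sorted(text_stats_by_language(sample_docs).items()):
    print(lang, s["docs"], s["chars"], round(s["avg_chars"], 1))
```

If the average per doc differs a lot between languages, doc counts
alone won't predict index size per shard.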

Best,
Erick

On Thu, Mar 26, 2015 at 10:26 AM, Shamik Bandopadhyay <sham...@gmail.com> wrote:
> Hi,
>
>    I'm using a three level composite router in a solr cloud environment,
> primarily for multi-tenant and field collapsing. The format is as follows.
>
> *language!topic!url*.
>
> An example would be :
>
> ENU!12345!www.testurl.com/enu/doc1
> GER!12345!www.testurl.com/ger/doc2
> CHS!67890!www.testurl.com/chs/doc3
>
> The Solr Cloud cluster contains 2 shards, each having 3 replicas. After
> indexing around 10 million documents, I'm observing that the index size in
> shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is
> getting indexed in shard 1. Since 60% of the documents are English, I expect
> the index size to be higher on one shard, but the difference seems a little
> too high.
>
> The idea is to make sure that all ENU!12345 documents are routed to one
> shard so that distributed field collapsing works. Is there something I can
> do differently here to make a better distribution ?
>
> Any pointers will be appreciated.
>
> Regards,
> Shamik
