Hi, I'm using a three-level composite ID router in a SolrCloud environment, primarily for multi-tenancy and field collapsing. The format is as follows.
*language!topic!url*. Some examples:

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The SolrCloud cluster contains 2 shards, each with 3 replicas. After indexing around 10 million documents, I'm observing that the index size on shard 1 is around 60 GB while shard 2 is 15 GB, so the bulk of the data is getting indexed on shard 1. Since 60% of the documents are English, I expect the index to be larger on one shard, but the difference seems a little too high.

The idea is to make sure that all ENU!12345 documents are routed to one shard so that distributed field collapsing works. Is there something I can do differently here to get a better distribution?

Any pointers will be appreciated.

Regards,
Shamik
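
P.S. In case it helps to see the ID scheme spelled out, here is a minimal Python sketch of how the three levels are joined and how documents can be tallied by their language!topic prefix. It is only an illustration of the format described above, not the actual indexing code; the helper names make_route_id and collapse_prefix are hypothetical, and nothing here calls Solr.

```python
from collections import Counter


def make_route_id(language: str, topic: str, url: str) -> str:
    """Join the three routing levels with '!' (hypothetical helper)."""
    for part in (language, topic):
        # A '!' inside a prefix level would be read as an extra separator.
        if "!" in part:
            raise ValueError(f"routing prefix must not contain '!': {part!r}")
    return f"{language}!{topic}!{url}"


def collapse_prefix(doc_id: str) -> str:
    """Return the 'language!topic' part that should stay on one shard."""
    language, topic, _url = doc_id.split("!", 2)
    return f"{language}!{topic}"


# The three example documents from the post.
doc_ids = [
    make_route_id("ENU", "12345", "www.testurl.com/enu/doc1"),
    make_route_id("GER", "12345", "www.testurl.com/ger/doc2"),
    make_route_id("CHS", "67890", "www.testurl.com/chs/doc3"),
]

# Count documents per language!topic prefix to see which prefixes dominate.
prefix_counts = Counter(collapse_prefix(doc_id) for doc_id in doc_ids)
```

Running the same tally over the real ID list would show how many documents each language!topic prefix contributes, which may help in judging whether the size difference comes from the data itself or from the routing.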