Thanks for your reply, Eric. In my case, I have 14 languages, and about 50% of the documents are in English. German and CHS will probably make up another 25%. I'm not using copyField; rather, each language has its own dedicated fields, such as title_enu, text_enu, title_ger, text_ger, etc. Since I know the language before index time, this works for me.
I've added one more sample key to the example:

ENU!12345!www.testurl.com/enu/doc1
ENU!12345!www.testurl.com/enu/doc10
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

As you can see, there are two English documents with the same topic id (12345). I added the topic id to the key so that documents sharing a topic reside on the same shard, which field collapsing on topic id requires. I could perhaps drop the topic id from the composite key and keep only the language and URL, something like:

ENU!www.testurl.com/enu/doc1

But that probably won't solve the distribution issue.

You mentioned "when you take over routing, making sure the distribution is even is now your responsibility." I'm wondering, what's the best practice to make that happen? I could move away from the composite router and manually assign a group of languages to a dedicated shard, at both index and query time. But I'm not sure keeping a map is an efficient way of dealing with it.
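
To make the manual-map idea concrete, here is a minimal sketch in Python of how such a language-to-shard map could be built, assuming the per-language document shares are known up front. It is not Solr code: the function name assign_languages, the shard count of 3, the extra language codes (FRA, JPN, ESP) and the exact percentages are all hypothetical, chosen only to match the rough 50% / 25% / 25% split described above with fewer than the real 14 languages. It uses a greedy largest-first assignment, which is one simple heuristic among several.

```python
import heapq


def assign_languages(shares, num_shards):
    """Greedy largest-first assignment of whole languages to shards.

    shares: dict mapping language code -> estimated fraction of documents.
    Returns (mapping of language -> shard index, list of per-shard loads).
    """
    # Min-heap of (current load, shard index); the lightest shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)

    mapping = {}
    # Place the biggest languages first; break ties by language code.
    for lang, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
        load, shard = heapq.heappop(heap)
        mapping[lang] = shard
        heapq.heappush(heap, (load + share, shard))

    # Recompute the per-shard load from the final mapping.
    loads = [0.0] * num_shards
    for lang, shard in mapping.items():
        loads[shard] += shares[lang]
    return mapping, loads


# Hypothetical shares: English 50%, German + CHS 25%, everything else 25%.
shares = {"ENU": 0.50, "GER": 0.15, "CHS": 0.10,
          "FRA": 0.10, "JPN": 0.08, "ESP": 0.07}
mapping, loads = assign_languages(shares, 3)

for lang in sorted(mapping):
    print(lang, "-> shard", mapping[lang])
print("per-shard load:", [round(x, 2) for x in loads])
```

The map itself is tiny (one entry per language), so the lookup cost is negligible at both index and query time. The real limitation is that assigning whole languages can never bring the heaviest shard below the largest language's share: with English at roughly 50%, one shard still ends up with half the index unless English itself is split further, for example by topic id.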