Thanks for your reply, Eric.

In my case, I have 14 languages, and about 50% of the documents are in
English. German and CHS will probably account for another 25%. I'm not using
copyField; instead, each language has its own dedicated fields, such as
title_enu, text_enu, title_ger, text_ger, etc. Since I know the language
before indexing, this works for me.

I've added one more sample key to the example.

ENU!12345!www.testurl.com/enu/doc1 
ENU!12345!www.testurl.com/enu/doc10 
GER!12345!www.testurl.com/ger/doc2 
CHS!67890!www.testurl.com/chs/doc3 

As you can see, there are 2 English documents with the same topic id
(12345). I added the topic id to the key to make sure they reside in the
same shard, so that field collapsing on topic id works. I could perhaps drop
the topic id from the composite key and keep only the language and URL,
something like:

ENU!www.testurl.com/enu/doc1
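To see why both key formats pile up on one shard, here is a rough Python model of compositeId hashing. It assumes, as I understand Solr's router, that each key part is hashed with MurmurHash3 (x86_32, seed 0) over its UTF-8 bytes, that a two-part key takes 16 bits from each part while a three-part key takes 8 + 8 + 16 bits, and that the shards split the signed 32-bit hash range into equal slices. The four-shard collection is hypothetical, and the `/bits` key syntax is not modeled.

```python
from collections import Counter

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Plain MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    if tail:
        # Up to 3 trailing bytes, combined little-endian as in the reference code.
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


# Assumed bit split per number of "!"-separated parts (top bits come from the first part).
MASKS = {
    1: [MASK32],
    2: [0xFFFF0000, 0x0000FFFF],
    3: [0xFF000000, 0x00FF0000, 0x0000FFFF],
}


def composite_hash(doc_id: str) -> int:
    parts = doc_id.split("!")
    masks = MASKS[len(parts)]  # KeyError for more than 3 parts, not modeled here
    h = 0
    for part, mask in zip(parts, masks):
        h |= murmur3_x86_32(part.encode("utf-8")) & mask
    return h


def shard_for(h: int, num_shards: int) -> int:
    # Treat the hash as signed and split [-2^31, 2^31) into equal slices;
    # flipping the sign bit maps signed order onto unsigned order.
    return ((h ^ 0x80000000) * num_shards) >> 32


def distribution(ids, num_shards: int) -> Counter:
    return Counter(shard_for(composite_hash(i), num_shards) for i in ids)


if __name__ == "__main__":
    num_shards = 4  # hypothetical collection size
    three_part = [f"ENU!{t}!www.testurl.com/enu/doc{d}" for t in range(100) for d in range(10)]
    two_part = [f"ENU!www.testurl.com/enu/doc{d}" for d in range(1000)]
    print("ENU!topic!url:", dict(distribution(three_part, num_shards)))
    print("ENU!url:      ", dict(distribution(two_part, num_shards)))
```

Under this model the language prefix alone fixes the top 8 bits (three-part key) or top 16 bits (two-part key) of every English hash, so all English documents land in one narrow slice of the range and therefore on a single shard; both runs report one shard holding all 1,000 IDs.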

But that probably won't solve the distribution issue. You mentioned that "when
you take over routing, making sure the distribution is even is now your
responsibility." I'm wondering what the best practice is to make that happen.
I could move away from the composite router and manually assign groups of
languages to dedicated shards, at both index and query time, but I'm not sure
maintaining such a map is an efficient way of dealing with it.
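If languages are mapped to shards by hand, the map itself is tiny; the harder part is choosing it. A minimal sketch is a greedy largest-first assignment that always places the next language on the currently lightest shard. The shares below are assumptions: English at 50%, German and CHS assumed to split the stated 25% evenly, and eleven hypothetical placeholder languages (`LANG00` to `LANG10`) splitting the remaining 25%. The four shards are also hypothetical.

```python
import heapq


def example_shares() -> dict[str, float]:
    # Hypothetical shares: ENU is 50%; GER and CHS are assumed to split the stated 25%
    # evenly; the 11 remaining languages are placeholders splitting the last 25%.
    shares = {"ENU": 0.50, "GER": 0.125, "CHS": 0.125}
    for i in range(11):
        shares[f"LANG{i:02d}"] = 0.25 / 11
    return shares


def assign_languages(shares: dict[str, float], num_shards: int) -> dict[str, int]:
    """Greedy largest-first assignment: each language goes to the lightest shard so far."""
    heap = [(0.0, shard) for shard in range(num_shards)]
    heapq.heapify(heap)
    mapping: dict[str, int] = {}
    for lang, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        load, shard = heapq.heappop(heap)
        mapping[lang] = shard
        heapq.heappush(heap, (load + share, shard))
    return mapping


def shard_loads(shares: dict[str, float], mapping: dict[str, int], num_shards: int) -> list[float]:
    loads = [0.0] * num_shards
    for lang, shard in mapping.items():
        loads[shard] += shares[lang]
    return loads


if __name__ == "__main__":
    num_shards = 4  # hypothetical
    shares = example_shares()
    mapping = assign_languages(shares, num_shards)
    for shard, load in enumerate(shard_loads(shares, mapping, num_shards)):
        langs = sorted(lang for lang, s in mapping.items() if s == shard)
        print(f"shard{shard}: {load:.1%} {langs}")
```

Whatever the number of shards, the shard that receives English ends up with at least half of the documents, so a language-to-shard map alone cannot balance this collection unless English itself is spread over several shards. The same `mapping` would still have to be consulted at both index and query time.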



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Uneven-index-distribution-using-composite-router-tp4195569p4195591.html
Sent from the Solr - User mailing list archive at Nabble.com.
