Yes - I'm using two-level composite IDs, and that is what has caused the imbalance on some shards. It's car data, and the composite IDs are of the form year+make!model plus a couple of other specifications, e.g. 2013Ford!Edge!123456. The problem is that there are simply far too many 2013 or 2011 Ford cars, and they all land on the same shards. This was done deliberately, because co-location of these documents is required for a few of the search use cases - it avoids queries hitting all shards all the time. All queries always specify the year and make combination, so it's easy to work out the target shard for each query.
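To make the skew concrete, here is a toy sketch of prefix-based routing. It is purely illustrative: it uses Python's zlib.crc32 as a stand-in hash, not Solr's actual MurmurHash3-based compositeId router, and the document counts are made up. The point is only that when routing is driven by the part of the key before the "!", every document sharing a hot prefix (e.g. "2013Ford") lands on one shard:

```python
# Toy illustration of prefix-based composite-ID routing and the skew it
# can cause. NOTE: zlib.crc32 is a stand-in hash for demonstration only;
# Solr's compositeId router actually uses MurmurHash3 and combines bits
# from each part of the key.
import zlib
from collections import Counter

NUM_SHARDS = 4

def route(doc_id: str) -> int:
    """Route on the part before '!' so docs sharing a prefix co-locate,
    mimicking (not reproducing) Solr's composite-ID behaviour."""
    prefix = doc_id.split("!", 1)[0]   # e.g. "2013Ford"
    return zlib.crc32(prefix.encode()) % NUM_SHARDS

# Hypothetical corpus: lots of 2013/2011 Fords, few of everything else.
docs = ([f"2013Ford!Edge!{n}" for n in range(5000)]
        + [f"2011Ford!Focus!{n}" for n in range(4000)]
        + [f"2013Honda!Civic!{n}" for n in range(500)])

per_shard = Counter(route(d) for d in docs)
print(per_shard)  # one or two shards hold the bulk of the documents
```

Co-location still works - a query routed with the "2013Ford" prefix hits exactly one shard - but that shard carries a disproportionate share of the index.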
Regarding storing the hash against each document and then querying to find the optimal ranges: could it be done so that Solr maintains incremental counters for each hash value in the shard's range, and the Collections API's SPLITSHARD command then uses these internally to propose optimal sub-shard ranges for the split?

--
View this message in context: http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204124.html
Sent from the Solr - User mailing list archive at Nabble.com.
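The counter idea above could be sketched as follows. This is hypothetical - Solr's SPLITSHARD does not currently expose per-hash counters, and the bucket values and counts below are invented - but it shows how a doc-count-weighted split point differs from simply halving the hash range:

```python
# Hypothetical sketch: given incremental doc counts per hash bucket,
# pick a split boundary so each sub-shard holds roughly half the docs,
# instead of blindly splitting the hash range at its midpoint.
# Bucket keys here are made-up 16-bit hash-prefix values.
from itertools import accumulate

def balanced_split(counts: dict) -> int:
    """Return the hash bucket at which to split: the left sub-shard
    covers [range start, returned bucket], chosen so its doc count
    is as close as possible to half the total."""
    buckets = sorted(counts)
    cum = list(accumulate(counts[b] for b in buckets))
    half = cum[-1] / 2
    i = min(range(len(buckets)), key=lambda i: abs(cum[i] - half))
    return buckets[i]

# Skewed distribution: one bucket (say, the hash of "2013Ford") is hot.
counts = {0x1A2B: 4000, 0x2B3C: 3500, 0x9E8F: 1500, 0xE001: 1000}
print(hex(balanced_split(counts)))  # splits after the hot bucket: 0x1a2b
```

One caveat this makes visible: when a single routing prefix dominates, all of its documents share one bucket, so no split boundary can divide them - the counters can only balance *between* distinct keys.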