We have a similar date- and language-based collection setup.
We also ran into the same issue of a huge clusterstate.json file which
took an eternity to load.

In our case the search use cases were language specific, so we moved to
multiple Solr clusters, each with a different ZooKeeper namespace (chroot)
per language. That is something you might look at.
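For illustration, a per-language chroot setup could look roughly like the
sketch below. The hostnames, ports, and chroot names (/solr_en, /solr_fr)
are made up for the example; adjust them to your environment:

```shell
# Hypothetical sketch: one shared ZooKeeper ensemble, with each language's
# Solr cluster isolated under its own chroot. All names here are examples.

# Create the chroot paths once (Solr does not create them automatically):
./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd makepath /solr_en
./server/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181,zk2:2181,zk3:2181 \
  -cmd makepath /solr_fr

# Start each language cluster against its own chroot, so each cluster keeps
# its own (much smaller) clusterstate and config set:
bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181/solr_en -p 8983
bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181/solr_fr -p 7574
```

Each cluster then only sees the collections and configs under its own
chroot, which keeps both clusterstate.json and the loaded configuration
per JVM small.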
On 27 Jul 2015 20:47, "Olivier" <olivau...@gmail.com> wrote:

> Hi,
>
> I have a SolrCloud cluster with 3 nodes: 3 shards per node and a
> replication factor of 3.
> The collections number is around 1000. All the collections use the same
> Zookeeper configuration.
> So when I create a collection, the configuration is pulled from ZK
> and the configuration files are stored in the JVM.
> I thought that since the configuration is the same for every collection,
> the impact on the JVM would be insignificant, because the configuration
> should be loaded only once. But that is not the case: for each collection
> created, the JVM size increases because the configuration is loaded again.
> Am I correct?
>
> With a small configuration folder this is not a problem: the folder is
> less than 500 KB, so 1000 collections x 500 KB is a JVM impact of about
> 500 MB.
> But we manage a lot of languages with dictionaries, so our configuration
> folder is about 6 MB. The JVM impact is now very significant, because it
> can be more than 6 GB (1000 x 6 MB).
>
> So I would like feedback from people who also run a cluster with a large
> number of collections. Do I have to change some settings to handle this
> case better? What can I do to optimize this behaviour?
> For now we have just increased the RAM per node to 16 GB, but we plan to
> increase the number of collections.
>
> Thanks,
>
> Olivier
>
