-----------------------------------------------
Brief overview of the setup:
-----------------------------------------------
5 x SolrCloud (Solr 4.6.1) node instances (separate machines)
The setup is intended to store the last 48 hours of webapp logs (which are pretty intense... ~3 MB/sec)
The "logs" collection has 5 shards (one per node instance)
One log line represents one document in the "logs" collection
-----------------------------------------------

If I keep storing log documents in this "logs" collection, the cores on the shards grow really big, and the CPU graphs show that the instances spend more and more time waiting for disk I/O.

So my idea is to create a new collection every 15 minutes, named like "logs-201402051400", with its shards spread across the 5 instances. Document writers will start writing to the new collection as soon as it is created. After a while I will have a list of collections like this:

...
logs-201402051400
logs-201402051415
logs-201402051430
logs-201402051445
logs-201402051500
...

There will be at most 192 collections (~1000 cores) in the SolrCloud at any given time, so it seems that search performance would degrade drastically.

So I would like to merge the collections that are no longer being written to into one large collection (still sharded across the 5 instances). I have found information on how to merge cores, but how can I merge collections?