-----------------------------------------------
 Brief overview of the setup:
-----------------------------------------------
5 x SolrCloud (Solr 4.6.1) node instances (separate machines)
The setup is intended to store the last 48 hours of webapp logs (a fairly
heavy stream, ~3 MB/sec)

"logs" collection has 5 shards (one per node instance)
One logline represents one document of "logs" collection
-----------------------------------------------
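
For context, each log line becomes a single document posted to the
collection's update handler, roughly like the Python sketch below. The host,
the field names and the commitWithin value are just placeholders, not the
actual schema:

    import json
    from urllib.request import Request, urlopen

    SOLR = "http://localhost:8983/solr"   # any node of the cluster

    def index_log_line(collection, doc):
        # Post one document; commitWithin lets Solr batch commits
        # instead of committing per document.
        url = "%s/%s/update?commitWithin=10000&wt=json" % (SOLR, collection)
        req = Request(url, data=json.dumps([doc]).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
        urlopen(req).read()

    # Field names below are illustrative only.
    index_log_line("logs", {
        "id": "2014-02-05T14:03:27.123Z-node1-42",
        "timestamp": "2014-02-05T14:03:27.123Z",
        "level": "INFO",
        "message": "GET /app/search took 137 ms",
    })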

If I keep storing log documents in this single "logs" collection, the cores
on the shards get really big, and CPU graphs show the instances spending more
and more time waiting on disk I/O.

So my idea is to create a new collection every 15 minutes, named e.g.
"logs-201402051400", with its shards spread across the 5 instances. Document
writers will start writing to the new collection as soon as it is created
(see the sketch after the list below). After a while I will end up with a
list of collections like this:

...
logs-201402051400
logs-201402051415
logs-201402051430
logs-201402051445
logs-201402051500
...
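
Concretely, the writer would call the Collections API roughly like this
before switching over (Python sketch; the host, the "logs" config name in
ZooKeeper and the replication settings are assumptions about my setup):

    from datetime import datetime
    from urllib.parse import urlencode
    from urllib.request import urlopen

    SOLR = "http://localhost:8983/solr"

    def create_timeboxed_collection(now=None):
        # Name the collection after the 15-minute bucket it covers,
        # e.g. "logs-201402051400".
        now = now or datetime.utcnow()
        name = "logs-%s%02d" % (now.strftime("%Y%m%d%H"),
                                now.minute // 15 * 15)
        params = urlencode({
            "action": "CREATE",
            "name": name,
            "numShards": 5,                  # one shard per node
            "replicationFactor": 1,
            "maxShardsPerNode": 1,
            "collection.configName": "logs", # shared config in ZooKeeper
            "wt": "json",
        })
        urlopen("%s/admin/collections?%s" % (SOLR, params)).read()
        return name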

Since there would be at most 192 collections (~1000 cores) in the SolrCloud
at any given time, it seems that search performance would degrade
drastically. So I would like to merge the collections that are no longer
being written to into one large collection (still sharded across the 5
instances). I have found information on how to merge cores, but how can I
merge collections?
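
For reference, the core-level merge I found boils down to the CoreAdmin
"mergeindexes" call sketched below (core names and hosts are placeholders).
What I don't see is how to do the equivalent per shard for whole collections
so that the result shows up as one sharded collection:

    from urllib.parse import urlencode
    from urllib.request import urlopen

    def merge_cores(host, target_core, src_cores):
        # CoreAdmin "mergeindexes": fold the source cores into target_core.
        # target_core must already exist, the sources must not be receiving
        # writes, and a commit on target_core is needed afterwards.
        params = [("action", "mergeindexes"), ("core", target_core),
                  ("wt", "json")]
        params += [("srcCore", c) for c in src_cores]
        urlopen("http://%s/solr/admin/cores?%s"
                % (host, urlencode(params))).read()
        urlopen("http://%s/solr/%s/update?commit=true"
                % (host, target_core)).read()

    # e.g. per node, per shard (core names here are only guesses):
    # merge_cores("node1:8983", "logs-archive_shard1_replica1",
    #             ["logs-201402051400_shard1_replica1",
    #              "logs-201402051415_shard1_replica1"])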




