Mark Miller <markrmil...@gmail.com> wrote on 12.06.2012 19:19:01:

> On Jun 12, 2012, at 3:39 AM, lenz...@gfi.ihk.de wrote:
>
> > Hello,
> >
> > we tested SolrCloud in a setup with one collection, two shards and one
> > replica per shard and it works quite fine with some example data.
> > Now, we plan to set up our own collection and determine into how many
> > shards we should divide it.
> > We can estimate the size of the collection quite exactly, but we don't
> > know what the best approach for sharding is,
> > even if we know the size and the amount of queries and updates.
> > Is there any documentation or a kind of design guidelines for sharding a
> > collection in SolrCloud?
> >
> > Thanks & regards,
> > Norman Lenzner
>
> It's hard to tell - I think you want to start with an idea of how
> many docs you can fit on a single node. This can vary wildly
> depending on many factors. Generally you have to do some testing
> with your particular config and data. You can search the mailing
> lists and perhaps dig up a little info, but there is really no
> replacement for running some tests with real data.
>
> Then you have to plan in your growth rate - resharding is naturally
> a relatively expensive operation. Once you have an idea of how many
> docs per machine you think seems comfortable, figure out how many
> machines you need given your estimated doc growth rate and perhaps
> some padding. You might not get it right, but if you expect the
> possibility of a lot of growth, erring on the more shards side is
> obviously better.
>
> - Mark Miller
> lucidimagination.com

Hello and thanks for your reply,

We will run some tests to determine the size of our collection, but I
think there won't be a need for a second shard at all. The problem is
not the size or the growth of the docs; rather, the update frequency
will be quite high. So, if we have many bulk updates, is it reasonable
to distribute the update load across multiple shards?

Thanks & regards,
Norman Lenzner
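
For reference, the sizing steps Mark describes (docs per node, growth rate, some padding) can be written down as a small calculation. This is only a rough sketch: the function name, the compound annual growth model and all of the numbers are assumptions made for illustration, not anything from Solr itself, and the docs-per-node figure still has to come from tests with real data. It also only covers document count, not update load.

```python
import math


def estimate_shards(current_docs, docs_per_node, annual_growth, years, padding=0.2):
    """Rough shard count from the sizing advice in this thread.

    All parameters are hypothetical inputs you would measure or estimate:
      current_docs   - documents in the collection today
      docs_per_node  - docs one node handles comfortably (found by testing)
      annual_growth  - expected yearly growth as a fraction (0.3 = 30%)
      years          - planning horizon, since resharding is expensive
      padding        - extra headroom as a fraction (0.2 = 20%)
    """
    # Assumption: growth compounds yearly over the planning horizon.
    projected_docs = current_docs * (1 + annual_growth) ** years
    # Add headroom so the estimate errs on the side of more shards.
    padded_docs = projected_docs * (1 + padding)
    # Round up, and never report fewer than one shard.
    return max(1, math.ceil(padded_docs / docs_per_node))


if __name__ == "__main__":
    # Made-up numbers: 20M docs today, 25M docs per node, 30% growth, 3 years.
    print(estimate_shards(20_000_000, 25_000_000, 0.30, 3))
```

With these made-up inputs the projected collection is about 44M docs, about 53M with padding, so the sketch suggests 3 shards even though the collection fits on one node today.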