On 9/29/2011 7:22 AM, Darren Govoni wrote:
> That was kinda my point. The "new" cloud implementation
> is not about replication, nor should it be. But rather about
> horizontal scalability where "nodes" manage different parts
> of a unified index.

It's about many things. You stated one, but there are other goals, one of them being tolerance to node outages. In a cloud, when one of your many nodes fails, you don't want to stop querying and indexing. For this to happen, you need to maintain redundant copies of the same pieces of the index, hence you need to replicate.

> One of the design goals of the "new" cloud
> implementation is for this to happen more or less automatically.

True, but there is a big gap between the goals and the current state. Right now, there is distributed search, but no distributed indexing, auto-sharding, or auto-replication. So if you want to use SolrCloud now (as many of us do), you need to do a number of things yourself, even if SolrCloud might do them automatically in the future.

> To me that means one does not have to manually distributed
> documents or enforce replication as Yurly suggests.
> Replication is different to me than what was being asked.
> And perhaps I misunderstood the original question.
>
> Yurly's response introduced the term "core" where the original
> person was referring to "nodes". For all I know, those are two
> different things in the new cloud design terminology (I believe they are).
>
> I guess understanding "cores" vs. "nodes" vs "shards" is helpful. :)

A shard is a slice of the index. An index is managed/stored in a core. Nodes are Solr instances, usually physical machines. Each node can host multiple shards, and each shard can consist of multiple cores. However, all cores within the same shard must have the same content.

This is where the OP ran into the problem. The OP had one shard, consisting of two cores on two nodes. Since there is no distributed indexing yet, all documents were indexed into a single core. However, there is distributed search, so queries were sent randomly to different cores of the same shard. Since one core in the shard had documents and the other didn't, the query result was random.
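
To make the failure mode concrete, here is a toy simulation in Python. It is only a sketch: the `Core` class, the core names, and `query_shard` are made up for illustration and are not Solr APIs. It assumes, as described above, that each query is routed to one randomly chosen core of the shard.

```python
import random


class Core:
    """Toy stand-in for a Solr core: just a list of documents."""

    def __init__(self, name):
        self.name = name
        self.docs = []

    def index(self, doc):
        self.docs.append(doc)

    def search(self, term):
        return [d for d in self.docs if term in d]


def query_shard(cores, term, rng):
    """Route the query to one randomly chosen core of the shard."""
    core = rng.choice(cores)
    return core.name, core.search(term)


# One shard made of two cores on two nodes (names are hypothetical).
core_a = Core("node1/core")
core_b = Core("node2/core")
shard = [core_a, core_b]

# No distributed indexing: every document goes into a single core.
for doc in ["solr cloud", "solr shard", "solr core"]:
    core_a.index(doc)

rng = random.Random(0)
hit_counts = [len(query_shard(shard, "solr", rng)[1]) for _ in range(20)]
# Each entry is 3 or 0, depending on which core answered the query.
print(hit_counts)
```

The same query returns either every document or none, purely depending on which core happened to serve it.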

To solve this problem, the OP must make sure all cores within the same shard (be they on the same node or not) have the same content. This can currently be achieved by:

a) setting up replication between cores: you index into one core and the other core replicates the content
b) indexing into both cores

Hope this clarifies.
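
Option (b) could look roughly like the Python sketch below. The host names, core names, and the `index_into_all_cores` helper are hypothetical. It assumes each core accepts Solr's XML update messages (`<add>`, `<commit/>`) at its `/update` path. The HTTP call itself is passed in as `post`, so the sketch runs without a server; a real `post` would send the body with a `text/xml` content type.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(doc):
    """Render one document (a dict of field -> value) as an XML <add> message."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"


def index_into_all_cores(core_urls, doc, post):
    """Send the same add, then a commit, to every core of the shard.

    `post(url, body)` is supplied by the caller and performs the HTTP POST.
    """
    body = build_add_xml(doc)
    for base in core_urls:
        post(f"{base}/update", body)
        post(f"{base}/update", "<commit/>")


# Hypothetical URLs of the two cores that make up the shard.
SHARD_CORES = [
    "http://node1:8983/solr/core0",
    "http://node2:8983/solr/core0",
]

# Record the requests instead of sending them, to show what would go out.
sent = []
index_into_all_cores(
    SHARD_CORES,
    {"id": "1", "title": "Hello & welcome"},
    post=lambda url, body: sent.append((url, body)),
)
for url, body in sent:
    print(url, body)
```

Note that this sketch has no error handling: if the write to one core fails, the cores diverge again, which is one reason option (a) may be the safer choice.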