Let's back up a bit here. Why are you copying your indexes around?
SolrCloud does all this for you. I suspect you've somehow made a misstep.

So here's what I'd do by preference: just set up a new collection and
re-index. Make sure all of the nodes are up, then index to any of them.
If you're using SolrJ, CloudSolrServer will be a bit more efficient than
sending the docs to random nodes, but that's not necessary.
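
For what it's worth, here's a minimal SolrJ sketch of that approach; the
ZooKeeper addresses, collection name, and field names below are
placeholders for whatever your setup actually uses:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class Reindex {
  public static void main(String[] args) throws Exception {
    // Point at the ZooKeeper ensemble, not an individual Solr node;
    // CloudSolrServer routes each document to the correct shard leader.
    CloudSolrServer server =
        new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("mycollection");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("title_s", "example");
    server.add(doc); // in practice, batch your adds

    server.commit();
    server.shutdown();
  }
}

The efficiency win is just that CloudSolrServer skips the extra hop of a
node forwarding docs to the right leader; plain HTTP posts to any node
work fine too.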

If that isn't feasible, set up a _one_ node "cloud" and get it running
and showing as up with your current index. Then use the Collections API
"ADDREPLICA" command to bring up the other three nodes. All the index
syncing should then "just happen".
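
ADDREPLICA is just an HTTP call against any live node, along these lines
(the collection, shard, and node values here are made up; substitute
your own):

http://10.0.0.1:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=10.0.0.2:8983_solr

Run it once per node you want to add; each new replica copies the index
from the leader as part of coming up.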

You're also confusing master/slave replication with SolrCloud. In normal
operation, SolrCloud doesn't use the older-style replication at all; it
only falls back to it when a downed or new node comes online and "peer
sync" isn't possible, and that should be the only time it's used.
Outside those recovery cases, an update sent to any node is forwarded to
all replicas, otherwise NRT wouldn't work.
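
A quick way to convince yourself it's working: index through one node
and immediately query a different one. Here's a SolrJ sketch of that
check; the node addresses and collection name are made up, so substitute
your own:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReplicationCheck {
  public static void main(String[] args) throws Exception {
    // Index a doc through one node (placeholder address/collection)...
    HttpSolrServer nodeA =
        new HttpSolrServer("http://10.0.0.1:8983/solr/mycollection");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "check1");
    nodeA.add(doc);
    nodeA.commit();

    // ...then query a *different* node. In a healthy SolrCloud the doc
    // is there because the update was forwarded to every replica.
    HttpSolrServer nodeB =
        new HttpSolrServer("http://10.0.0.2:8983/solr/mycollection");
    long found = nodeB.query(new SolrQuery("id:check1"))
        .getResults().getNumFound();
    System.out.println(found); // expect 1 if the cloud is healthy
  }
}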

HTH,
Erick


On Fri, May 30, 2014 at 7:00 AM, Marc Campeau <cam...@gmail.com> wrote:

> Hi, forgot to mention that I'm migrating the index from Solr 4.5.1 to
> 4.8.1.
>
> Thanks,
>
> Marc Campeau
>
>
> 2014-05-30 9:54 GMT-04:00 Marc Campeau <cam...@gmail.com>:
>
> > Hi,
> >
> > I currently have a standalone Solr 4.5.1 deployment on an EC2 instance
> > with a single collection and core containing an index that's roughly
> > 10G. I've used this as a proof of concept, prototype, and staging
> > environment during development, and I'm about to release to production.
> >
> > For this release, I've set up 4 EC2 instances, with 3 of the servers
> > forming a ZooKeeper ensemble and all 4 running SolrCloud. My intention
> > is to have my current collection on a single shard replicated 4 times,
> > based on the high-availability requirements, and I'm using an ELB as a
> > load balancer to spread the query load across all 4 instances. To get
> > there, I've rsync'ed my current 10G collection to all 4 Solr instances
> > in my SolrCloud and started them all up. They all come up, do the
> > elections and whatnot, and all are queryable, which is great. The idea
> > is to load the current index as-is and then start updating it, instead
> > of reindexing everything from scratch.
> >
> > BUT...
> >
> > 1) Using zkCLI, I can see that clusterstate shows all instances as
> > down, and the Solr Admin interface illustrates this by showing all 4
> > instances in the "down" color. Is that normal? How can I change it?
> > And how can that be, when all 4 instances answer queries just fine?
> >
> > 2) It doesn't seem like the instances are replicating; i.e., if I add
> > a document to the collection, it doesn't get replicated to the other
> > instances. Why is that? What should I look for in the Solr logs that
> > would tell me replication is happening? I clearly see the "/admin/ping"
> > requests made by the load balancer's health checks, and requests made
> > to the admin interface, but I can never find requests to "/replicate"
> > that would trigger the replication handler.
> >
> > There's obviously something I've done wrong, but I can't put my finger
> > on it. I would appreciate your insight on my situation.
> >
> > Thanks,
> >
> > Marc Campeau
> >
> >
>
