Thank you! I'm mainly concerned about facet performance. When we have indexing turned on, our facet performance suffers significantly.
I will add replicas and measure the performance change.

-Joe Obernberger

On 2/25/2015 4:31 PM, Erick Erickson wrote:
bq: Is adding replicas going to increase search performance?

Absolutely, assuming you've maxed out Solr. You can scale the Solr
queries-per-second rate nearly linearly by adding replicas, regardless of
whether it's over HDFS or not.

Having multiple replicas per shard _also_ increases fault tolerance,
so you get both. Even with HDFS, though, a single replica (just a
leader) per shard means that you don't have any redundancy if the
motherboard on that server dies even though HDFS has multiple copies
of the _data_.
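
For reference, a minimal sketch of how a replica could be added through the Collections API's ADDREPLICA action. The host, collection, and shard names here are placeholders rather than anything from this thread, and the snippet only builds the request URL without sending it:

```python
from urllib.parse import urlencode

def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA request URL (it is not sent)."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: ask for the replica to be placed on a specific node.
        params["node"] = node
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"

# Placeholder values; substitute your own Solr base URL, collection and shard.
url = add_replica_url("http://localhost:8983/solr", "mycollection", "shard1")
print(url)
```

Measuring queries per second before and after each added replica is the simplest way to check whether the scaling is close to linear for your facet queries.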


On Wed, Feb 25, 2015 at 12:01 PM, Joseph Obernberger wrote:
I am also confused on this.  Is adding replicas going to increase search
performance?  I'm not sure I see the point of any replicas when using HDFS.
Is there one?
Thank you!


On 2/25/2015 10:57 AM, Erick Erickson wrote:
bq: And the data sync between leader/replica is always a problem

Not quite sure what you mean by this. There shouldn't need to be
any syncing in the sense of the index getting replicated; the
incoming documents should be sent to each node (and indexed
to HDFS) as they come in.

bq: There is duplicate index computing on the replica side.

Yes, that's the design of SolrCloud, explicitly to provide data safety.
If you instead rely on the leader to index and somehow pull that
indexed form to the replica, then you will lose data if the leader
goes down before sending the indexed form.
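
A toy model of that trade-off, as a rough sketch only. The `Node` class and both helper functions are hypothetical names made up for illustration; they are not Solr classes or APIs, and real SolrCloud behavior is more involved:

```python
# Toy model only: Node, index_by_forwarding and pull_index are hypothetical
# names for illustration, not Solr classes or APIs.

class Node:
    def __init__(self, name):
        self.name = name
        self.docs = set()   # documents this node has indexed
        self.alive = True

def index_by_forwarding(nodes, doc):
    """Each live node receives the raw document and indexes it itself."""
    for node in nodes:
        if node.alive:
            node.docs.add(doc)

def pull_index(leader, replica):
    """Alternative design: the replica copies the leader's already-built index."""
    if leader.alive:
        replica.docs |= leader.docs

# Design 1: documents are forwarded to every node as they come in.
leader_a, replica_a = Node("leader"), Node("replica")
index_by_forwarding([leader_a, replica_a], "doc1")
index_by_forwarding([leader_a, replica_a], "doc2")
leader_a.alive = False              # leader dies
forwarded_survivors = replica_a.docs

# Design 2: only the leader indexes; the replica pulls periodically.
leader_b, replica_b = Node("leader"), Node("replica")
leader_b.docs.add("doc1")
pull_index(leader_b, replica_b)     # replica catches up with doc1
leader_b.docs.add("doc2")
leader_b.alive = False              # leader dies before the next pull
pull_index(leader_b, replica_b)     # nothing can be pulled any more
pulled_survivors = replica_b.docs

print(sorted(forwarded_survivors))  # ['doc1', 'doc2']
print(sorted(pulled_survivors))     # ['doc1']
```

In the second design, doc2 exists only on the dead leader, which is the data loss described above.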

bq: My thought is that the leader and the replica all bind to the same
index directory.

This is unsafe. They would both then try to _write_ to the same
index, which can easily corrupt it; alternatively, all but the first
one to access the index would be locked out.

All that said, the HDFS triple-redundancy compounded with the
Solr leaders/replicas redundancy means a bunch of extra
storage. You can turn the HDFS replication down to 1, but that has
other implications.
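
To put rough numbers on the extra storage, here is a back-of-the-envelope sketch. The function name and the 100 GB figure are made up for illustration, and 3 is assumed as the HDFS replication factor to match the "triple-redundancy" above:

```python
def raw_storage_gb(index_gb, solr_copies_per_shard, hdfs_replication=3):
    """Raw disk used: each Solr copy of the index is stored hdfs_replication times."""
    return index_gb * solr_copies_per_shard * hdfs_replication

# Hypothetical 100 GB index with a leader plus one replica per shard.
print(raw_storage_gb(100, 2))      # 600 with HDFS replication of 3
print(raw_storage_gb(100, 2, 1))   # 200 with HDFS replication turned down to 1
```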


On Tue, Feb 24, 2015 at 11:12 PM, longsan wrote:
We use HDFS as our Solr index storage and we have a really heavy update
load. We have run into many problems with the current leader/replica solution.
There is duplicate index computing on the replica side. And the data sync between
leader/replica is always a problem.

As HDFS already provides data replication at the data layer, could Solr
just provide service-layer replication?

My thought is that the leader and the replica all bind to the same data
index directory. The leader will build the index for new requests, and the
replica will just keep its index version up to date with the leader (such as
with a periodic soft commit?). If the leader is lost, then the replica will
take over the duty immediately.

Thanks for any suggestions on this idea.
