Erick, I have not changed any config. I have autoAddReplicas = true in the
individual collection config as well as in the overall cluster config. Still,
it does not add a replica when I decommission a node.
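
For concreteness, this is roughly how I have the flag set via the Collections
API (the host, port, and collection name below are placeholders, not my
actual values):

```shell
# Sketch of setting autoAddReplicas through the Collections API.
# Host/port and collection name are placeholders.
SOLR="http://localhost:8983/solr"

# At collection creation time:
CREATE_URL="$SOLR/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&autoAddReplicas=true"

# Or on an existing collection:
MODIFY_URL="$SOLR/admin/collections?action=MODIFYCOLLECTION&collection=mycollection&autoAddReplicas=true"

# Print the URLs; pass either one to curl to actually issue the call.
echo "$CREATE_URL"
echo "$MODIFY_URL"
```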

Adding a replica is the Overseer's job. I looked at the logs of the Overseer
of the SolrCloud cluster but could not find anything there either.
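
For anyone who wants to check the same thing: the Collections API's
OVERSEERSTATUS action reports which node is currently the Overseer and
statistics on its recent operations (the host below is a placeholder):

```shell
# Sketch: find the current Overseer and its recent activity.
# localhost:8983 is a placeholder for an actual cluster node.
SOLR="http://localhost:8983/solr"
STATUS_URL="$SOLR/admin/collections?action=OVERSEERSTATUS&wt=json"
echo "$STATUS_URL"   # pass to curl; the "leader" field names the Overseer node
```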

I am doing some testing using different configs. I would be happy to share
my findings.

One of the things I have observed is: if I use the Collections API to create
a replica for that shard, it does not complain about the config, which has
replicationFactor=1. If the replication factor were the issue, as Shawn
suggested, shouldn't it complain?
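
For reference, the manual ADDREPLICA call looks roughly like this (the
collection name, shard name, and host are placeholders):

```shell
# Sketch: manually adding a replica for one shard via the Collections API.
# Collection/shard names and host are placeholders.
SOLR="http://localhost:8983/solr"
ADD_URL="$SOLR/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1"
echo "$ADD_URL"   # pass to curl to issue the call
```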

I would also like to mention that I have seen some instance dirs getting
deleted, and I also found this open bug:
https://issues.apache.org/jira/browse/SOLR-8905

Thanks!

On Thu, Jan 12, 2017 at 9:50 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Hmmm, have you changed any of the settings for autoAddReplicas? There
> are several parameters that govern how long before a replica would be
> added.
>
> But I suggest you use the Cloudera resources for this question, not
> only did they write this functionality, but Cloudera support is deeply
> embedded in HDFS and I suspect has _by far_ the most experience with
> it.
>
> And that said, anything you find out that would suggest good ways to
> clarify the docs would be most welcome!
>
> Best,
> Erick
>
> On Thu, Jan 12, 2017 at 8:42 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> > On 1/11/2017 7:14 PM, Chetas Joshi wrote:
> >> This is what I understand about how Solr works on HDFS. Please correct
> >> me if I am wrong.
> >>
> >> Although the Solr shard replicationFactor = 1, the HDFS default
> >> replication = 3. When a node goes down, the Solr server running on that
> >> node goes down, and hence the instance (core) representing the replica
> >> goes down. The data is on HDFS (distributed across all the datanodes of
> >> the Hadoop cluster with 3x replication). This is the reason why I have
> >> kept replicationFactor=1.
> >>
> >> As per the link:
> >> https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
> >> One benefit to running Solr in HDFS is the ability to automatically add
> >> new replicas when the Overseer notices that a shard has gone down.
> >> Because the "gone" index shards are stored in HDFS, a new core will be
> >> created and the new core will point to the existing indexes in HDFS.
> >>
> >> This is the expected behavior of the Solr Overseer, which I am not able
> >> to see. After a couple of hours a node was assigned to host the shard,
> >> but the status of the shard is still "down" and the instance dir is
> >> missing on that node for that particular shard_replica.
> >
> > As I said before, I know very little about HDFS, so the following could
> > be wrong, but it makes sense so I'll say it:
> >
> > I would imagine that Solr doesn't know or care what your HDFS
> > replication is ... the only replicas it knows about are the ones that it
> > is managing itself.  The autoAddReplicas feature manages *SolrCloud*
> > replicas, not HDFS replicas.
> >
> > I have seen people say that multiple SolrCloud replicas will take up
> > additional space in HDFS -- they do not point at the same index files.
> > This is because proper Lucene operation requires that it lock an index
> > and prevent any other thread/process from writing to the index at the
> > same time.  When you index, SolrCloud updates all replicas independently
> > -- the only time indexes are replicated is when you add a new replica or
> > a serious problem has occurred and an index needs to be recovered.
> >
> > Thanks,
> > Shawn
> >
>