I think what is happening here is that the sub-shard replicas are taking
time to recover. We use a core admin command to wait for the replicas to
become active before the shard states are switched. The timeout value for
that command is just 120 seconds. We should wait longer than that. I'll
open an issue.
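
To illustrate the kind of wait involved, here is a minimal Python sketch of a
poll-until-active loop with a timeout. It is not Solr's actual code: the names
wait_for_state and get_state are hypothetical, and get_state stands in for
whatever call reports a replica's current state. Only the 120 second default
comes from the timeout mentioned above.

```python
import time


def wait_for_state(get_state, target="active", timeout_s=120.0, poll_s=1.0):
    """Poll get_state() until it returns `target` or `timeout_s` elapses.

    get_state is a hypothetical callable that returns the replica's current
    state as a string (for example "recovering" or "active").
    Returns True if the target state was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == target:
            return True
        if time.monotonic() >= deadline:
            # The replica may still be recovering; the caller gave up first.
            return False
        time.sleep(poll_s)


# Usage: a simulated replica that becomes active on the third poll.
states = iter(["recovering", "recovering", "active"])
recovered = wait_for_state(lambda: next(states), timeout_s=5, poll_s=0)
```

If a replica needs longer than timeout_s to recover, a loop like this returns
False even though nothing is actually broken, which would match the
"recovering live:true" failure in the log below.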


On Mon, Oct 7, 2013 at 2:47 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

> Seems the issue occurs when the shard has more than one replica.
>
> I unloaded all replicas of the shard except one (to do the split) and the
> SPLITSHARD finished as expected: the parent went to inactive and the
> children to active.
>
> If the parent has more than 1 replica, the process apparently finishes and
> the total number of documents in the children is the same as in the parent,
> but the parent never goes to the inactive state and the children
> are stuck in the construction state.
>
> --
> Yago Riveiro
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Sunday, October 6, 2013 at 12:23 AM, Yago Riveiro wrote:
>
> > I can attach the full log of the process if you want.
> >
> > --
> > Yago Riveiro
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
> >
> > On Sunday, October 6, 2013 at 12:12 AM, Yago Riveiro wrote:
> >
> > > > The errors in the log are:
> > >
> > > > ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: splitshard the collection time out:300s
> > > > ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: splitshard the collection time out:300s
> > >
> > >
> > > INFO  - 2013-10-05 22:48:54.083;
> org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection
> Processor: Message id:/overseer/collection-queue-work/qn-0000000138
> complete,
> response:{success={null={responseHeader={status=0,QTime=1901},core=statistics-13_shard17_0_replica1},null={responseHeader={status=0,QTime=1903},core=statistics-13_shard17_1_replica1},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=6324147}},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_1_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_0_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=1127},core=statistics-13_shard17_0_replica2},null={responseHeader={status=0,QTime=2109},core=statistics-13_shard17_1_replica2}},failure={null=org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I
> was asked to wait on state active for 192.168.20.105:8983_solr but I still
> do not see the requested state. I see state:
> recovering live:true},Operation splitshard caused
> exception:=org.apache.solr.common.SolrException: SPLTSHARD failed to create
> subshard replicas or timed out waiting for them to come
> up,exception={msg=SPLTSHARD failed to create subshard replicas or timed out
> waiting for them to come up,rspCode=500}}
> > >
> > >
> > > --
> > > Yago Riveiro
> > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > >
> > >
> > > On Saturday, October 5, 2013 at 5:03 PM, Yago Riveiro wrote:
> > >
> > > > > I don't have the log; log rotation is configured to keep only 5
> > > > > small files. I will reconfigure it to a higher value and retry the
> > > > > split.
> > > >
> > > >
> > > > --
> > > > Yago Riveiro
> > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > >
> > > >
> > > > On Saturday, October 5, 2013 at 4:54 PM, Shalin Shekhar Mangar wrote:
> > > >
> > > > > > On Sat, Oct 5, 2013 at 8:37 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> > > > >
> > > > > > > How can I see the logs of the parent?
> > > > > > >
> > > > > > > Are they stored in solr.log?
> > > > >
> > > > > Yes.
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Shalin Shekhar Mangar.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
>


-- 
Regards,
Shalin Shekhar Mangar.
