Re: indexing - offline

Rallavagu Thu, 20 Oct 2016 11:31:01 -0700

Thanks, Evan, for the quick response.

On 10/20/16 10:19 AM, Tom Evans wrote:

On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu <rallav...@gmail.com> wrote:

Solr 5.4.1 cloud with embedded jetty


Looking for some ideas around offline indexing where an independent node
will be indexed offline (not in the cloud) and added to the cloud to become
leader so other cloud nodes will get replicated. Wonder if this is possible
without interrupting the live service. Thanks.


How we do this, to reindex collection "foo":

1) First, collection "foo" should be an alias to the real collection,
eg "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It
doesn't hold any shards of any collections

So, a node is part of the cluster but holds no collections? How can we add a
node to the cloud without active participation?

3) Use collections API to create collection "foo_2", with however many
shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
5) Use collections API to expand "foo_2" to all the nodes/replicas
that it should be on

Could you please point me to documentation on how to do this? I am
referring to this doc:
https://cwiki.apache.org/confluence/display/solr/Collections+API
But it has many options, and honestly I am not sure which one would be
useful in this case.


Thanks

6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy

This avoids indexing overwhelming the performance of the cluster (or
any nodes in the cluster that receive queries), and can be performed
with zero downtime or config changes on the clients.
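
For reference, here is a rough sketch of how steps 3, 5, 8 and 9 could map
onto Collections API calls. It only builds the request URLs and sends
nothing. The action names (CREATE, ADDREPLICA, CREATEALIAS, DELETE) are the
ones listed in the Collections API documentation; everything else is an
assumption made for illustration: the base URL, the node names, the config
set name "foo_conf", the default "shard1..shardN" shard naming, and the
helper names collections_api() and reindex_plan(). Step 6 (DELETEREPLICA)
is left out on purpose, because it needs the replica's core node name,
which has to be looked up in the cluster state at run time.

```python
from urllib.parse import urlencode

# Assumed base URL of any live node in the cluster; adjust as needed.
SOLR = "http://localhost:8983/solr"


def collections_api(action, **params):
    """Build a Collections API URL for one action (hypothetical helper)."""
    query = urlencode({"action": action, "wt": "json", **params})
    return f"{SOLR}/admin/collections?{query}"


def reindex_plan(alias, old, new, index_node, shards, target_nodes, config):
    """Return, in order, the URLs for the alias-swap reindex described above."""
    plan = []

    # Step 3: create the new collection with all shards on the indexing node.
    # maxShardsPerNode is raised so that one node may hold every shard.
    plan.append(collections_api(
        "CREATE",
        name=new,
        numShards=shards,
        replicationFactor=1,
        maxShardsPerNode=shards,
        createNodeSet=index_node,
        **{"collection.configName": config},
    ))

    # Step 4 (indexing into the new collection) happens outside this API.

    # Step 5: add one replica of every shard on every serving node.
    # Assumes the default shard names shard1..shardN.
    for i in range(1, shards + 1):
        for node in target_nodes:
            plan.append(collections_api(
                "ADDREPLICA", collection=new, shard=f"shard{i}", node=node))

    # Step 6 (DELETEREPLICA on the indexing node) is not generated here: it
    # needs the replica name from the cluster state.

    # Step 8: repoint the alias at the new collection.
    plan.append(collections_api("CREATEALIAS", name=alias, collections=new))

    # Step 9: drop the old collection once the new one has been verified.
    plan.append(collections_api("DELETE", name=old))
    return plan


if __name__ == "__main__":
    # Node names below are placeholders, not real hosts.
    for url in reindex_plan("foo", "foo_1", "foo_2", "node_i:8983_solr", 2,
                            ["node_a:8983_solr", "node_b:8983_solr"],
                            "foo_conf"):
        print(url)
```

The printed URLs are meant to be reviewed and then issued one at a time
(with curl or a browser), waiting for replicas to become active after step 5
and checking the contents of "foo_2" (step 7) before the alias is changed.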

Cheers

Tom
