Yes, thanks Shawn.  I know I can use the Collections HTTP API to set the
number of shards, but the problem is that it is not easily scriptable when
the entire cluster has to be set up in an automated fashion: the script(s)
would need to wait until the Solr nodes are up and running before calling
the Collections API.  What I want to know is: is there some "configuration"
way to set numShards (such as in solr.xml, or by sending some data to the
ZooKeeper API)?  I am guessing the answer is still no.
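
For what it's worth, the "wait until the nodes are up" part could be
scripted roughly like this.  It is only a sketch: the function names, the
retry counts and the idea of probing each node's base URL with a plain GET
are my own assumptions, not anything Solr prescribes.

```python
import time
import urllib.error
import urllib.request


def solr_is_up(base_url, timeout=5.0):
    """Return True if a plain HTTP GET against base_url gets a response.

    base_url is whatever URL the node answers on (hypothetical here).
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status: treat as not ready.
        return False


def wait_until_up(base_urls, probe=solr_is_up, attempts=60, delay=5.0,
                  sleep=time.sleep):
    """Poll the nodes until all of them respond.

    Returns True once every node has answered, or False after `attempts`
    polling rounds.  `probe` and `sleep` are injectable for testing.
    """
    pending = list(base_urls)
    for _ in range(attempts):
        # Keep only the nodes that still do not answer.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            return True
        sleep(delay)
    return False
```

A provisioning script would call wait_until_up() with the list of node
URLs and only then issue the Collections API request.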

Thanks.

________________________________________
From: Shawn Heisey [s...@elyograg.org]
Sent: Tuesday, July 16, 2013 6:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Where to specify numShards when startup up a cloud setup

On 7/16/2013 3:36 PM, Robert Stewart wrote:
> I want to script the creation of N SolrCloud instances (on EC2).
>
> But it's not clear to me where I would specify the numShards setting.
> From the documentation, I see you can specify it on the "first node" you
> start up, OR alternatively, use the "collections" API to create a new
> collection - but in that case you first need at least one running Solr
> instance.  I want to push all Solr instances with similar configuration
> onto N instances and just run them with some number of shards pre-set
> somehow.  Where can I put the numShards configuration setting?
>
> What I want to do:
>
> 1) push the Solr configuration to the zookeeper ensemble using the zkCli
> command-line tool.
> 2) create N instances of Solr running on EC2, pointing to the same zookeeper
> 3) start all Solr instances, which will become a cloud setup with M shards
> (where M<N), and N-M replicas.

A minimal redundant SolrCloud cluster consists of two larger machines
that run Solr and zookeeper, plus a third smaller machine that runs just
zookeeper.  This is just the minimum requirement; you can use additional
and more powerful servers.

Here is the general way to set up a brand new SolrCloud cluster.  If
anyone spots a problem with this, please don't hesitate to mention it:

1) Set up three hosts running standalone zookeeper, configured as a
fully redundant ensemble.  This is outside the scope of the Solr
documentation; please consult the zookeeper site:

http://zookeeper.apache.org

2) Construct a zkHost parameter for your ZK ensemble.  An example is
below using the default zookeeper port of 2181.  You'd need to use the
proper port numbers, names, etc.  The /chroot part is optional, but
highly recommended.  Use a name that has meaning for your SolrCloud
cluster rather than chroot:

-DzkHost=server1:2181,server2:2181,server3:2181/chroot

By using the /chroot syntax, you can run more than one SolrCloud cluster
on your zookeeper ensemble.  Just use a different value for each cluster.

3) Start Solr with the same zkHost parameter on every Solr host,
referring to the three zookeeper hosts already set up.  You can use the
same hosts for Solr as you did for zookeeper.

4) Use the zkcli script in example/cloud-scripts to upload a
configuration set to zookeeper using the "upconfig" command.  If you
aren't using the Solr example or a custom install based on the example,
then you'll need to examine the script to figure out how to run the java
command manually and have it find the solr and zookeeper jars.

5) Use the Collections API to create a collection, referencing the
uploaded config set and including additional parameters like numShards.
If you have four Solr hosts, the following API call fits exactly, because
two shards with a replicationFactor of two yields four cores, one per host:

http://server:port/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=mycfg
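
For scripting purposes, steps 2, 4 and 5 can be sketched in Python roughly
as follows.  Treat this as an illustration under assumptions rather than a
finished tool: the function names are made up, port 8983 in the usage note
is just the port the Solr example uses, and the upconfig option names are
as I understand the zkcli script, so check them against the script itself.

```python
from urllib.parse import urlencode


def build_zk_host(servers, port=2181, chroot=None):
    """Step 2: build the zkHost value, with an optional /chroot suffix."""
    hosts = ",".join(f"{server}:{port}" for server in servers)
    if chroot:
        return f"{hosts}/{chroot.strip('/')}"
    return hosts


def upconfig_args(zk_host, conf_dir, conf_name):
    """Step 4: arguments to hand to the zkcli script for "upconfig".

    Option names are an assumption to verify against your Solr version.
    """
    return ["-cmd", "upconfig", "-zkhost", zk_host,
            "-confdir", conf_dir, "-confname", conf_name]


def cores_needed(num_shards, replication_factor):
    """Total cores the collection creates: one per replica of every shard."""
    return num_shards * replication_factor


def create_collection_url(solr_base, name, num_shards, replication_factor,
                          config_name):
    """Step 5: build the Collections API CREATE URL."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("replicationFactor", replication_factor),
        ("collection.configName", config_name),
    ]
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"
```

For example, build_zk_host(["server1", "server2", "server3"],
chroot="chroot") gives the zkHost string from step 2, and
create_collection_url("http://server1:8983/solr", "mycollection", 2, 2,
"mycfg") gives the same query string as the URL above.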

Thanks,
Shawn
