Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "SolrCloud" page has been changed by YonikSeeley.
The comment on this change is: start simple replicated example.
http://wiki.apache.org/solr/SolrCloud?action=diff&rev1=21&rev2=22

--------------------------------------------------

  Solr embeds and uses Zookeeper as a repository for cluster configuration and 
coordination - think of it as a distributed filesystem.
  
  === Simple two shard cluster ===
+ This example simply creates a cluster consisting of two solr servers 
representing two different shards of a collection.
+ 
+ Since we'll need two solr servers for this example, simply make a copy of the 
example directory for the second server.
+ 
+ {{{
+ cp -r example example2
+ }}}
  The following command starts up a Solr server and bootstraps a new Solr 
cluster:
+ 
  {{{
  cd example
  java -Dbootstrap_confname=myconf -Dbootstrap_confdir=./solr/conf -DzkRun -jar 
start.jar
  }}}
- 
   * {{{-DzkRun}}} tells solr to run a single standalone zookeeper server as 
part of this Solr server.
   * {{{-Dbootstrap_confname=myconf}}} tells this solr node to use the "myconf" 
configuration stored within zookeeper.
   * {{{-Dbootstrap_confdir=./solr/conf}}} since "myconf" does not actually 
exist yet, this parameter causes the local configuration directory 
{{{./solr/conf}}} to be uploaded to zookeeper as the "myconf" config.
@@ -37, +44 @@

  
  You can see from the zookeeper browser that the Solr configuration files were 
uploaded under "myconf", and that a new document collection called 
"collection1" was created.  Under collection1 is a list of shards, the pieces 
that make up the complete collection.
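  
  If you prefer a command line view of the same data, the standard Zookeeper 
client shell can be pointed at the embedded server.  This is only an optional 
sketch: it assumes you have a separate Apache ZooKeeper download available 
(the Solr example does not ship the shell), and the znode paths shown are 
illustrative.
  
  {{{
  # from an Apache ZooKeeper download, not part of the Solr example
  bin/zkCli.sh -server localhost:9983
  # then, inside the shell:
  ls /
  ls /configs/myconf
  }}}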
  
- Now we want to start up our second server, assigning it a different shard, or 
piece of the collection.
- Simply change the shardId parameter for the appropriate solr core in solr.xml:
+ Now we want to start up our second server, assigning it a different shard, or 
piece of the collection. Simply change the shardId parameter for the 
appropriate solr core in solr.xml:
+ 
  {{{
  cd example2
  perl -pi -e 's/shard1/shard2/g' solr/solr.xml
  #note: if you don't have perl installed, you can simply hand edit solr.xml, 
changing shard1 to shard2
  }}}
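  If you have sed rather than perl, an equivalent edit is shown below (GNU sed 
syntax; on BSD/OS X sed, -i needs an explicit suffix argument such as -i ''):
  
  {{{
  sed -i -e 's/shard1/shard2/g' solr/solr.xml
  }}}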
- 
  Then start the second server, pointing it at the cluster:
+ 
  {{{
  java -Djetty.port=7574 -DhostPort=7574 -DzkHost=localhost:9983 -jar start.jar
  }}}
- 
   * {{{-Djetty.port=7574}}}  is just one way to tell the Jetty servlet 
container to use a different port.
   * {{{-DhostPort=7574}}} tells Solr what port the servlet container is 
running on.
   * {{{-DzkHost=localhost:9983}}} points to the Zookeeper ensemble containing 
the cluster state.  In this example we're running a single Zookeeper server 
embedded in the first Solr server.  By default, an embedded Zookeeper server 
runs at the Solr port plus 1000, so 9983.
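  
  As a side note, {{{-DzkHost}}} accepts a comma-separated list of host:port 
pairs, so a node in a production setup would typically point at every member 
of an external ensemble.  The host names below are placeholders only:
  
  {{{
  java -Djetty.port=7574 -DhostPort=7574 -DzkHost=zk1:2181,zk2:2181,zk3:2181 -jar start.jar
  }}}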
@@ -57, +63 @@

  If you refresh the zookeeper browser, you should now see both shard1 and 
shard2 in collection1.
  
  Next, index some documents to each server:
+ 
  {{{
  cd exampledocs
  java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar 
ipod_video.xml
  java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar 
monitor.xml
  }}}
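  Since a request without distrib=true only searches the shard of the server 
that receives it, you can sanity check where each document landed.  Assuming 
the two documents were posted as above, the first URL should return only the 
ipod_video document and the second only the monitor document:
  
  http://localhost:8983/solr/collection1/select?q=*:*
  
  http://localhost:7574/solr/collection1/select?q=*:*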
- 
  And now, a request to either server with "distrib=true" results in a 
distributed search that covers the entire collection:
  
  http://localhost:8983/solr/collection1/select?distrib=true&q=*:*
  
  If at any point you wish to start over fresh or experiment with different 
configurations, you can delete all of the cloud state contained within 
zookeeper by simply deleting the solr/zoo_data directory after shutting down 
the servers.
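  
  For example, after shutting down the servers, something like the following 
removes the embedded zookeeper state from the example directories (the glob 
also covers any extra copies you may have made):
  
  {{{
  rm -rf example*/solr/zoo_data
  }}}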
+ 
+ === Simple two shard cluster with shard replicas ===
+ This example builds on the previous one by creating another copy of shard1 
and shard2.  Extra shard replicas can be used for high availability and fault 
tolerance, or simply to increase the query capacity of the cluster.
+ 
+ First, run through the previous example so we already have two shards and 
some documents indexed into each.  Then simply make a copy of those two servers:
+ 
+ {{{
+ cp -r example exampleB
+ cp -r example2 example2B
+ }}}
+ Then start the two new servers on different ports, each in its own window:
+ 
+ {{{
+ cd exampleB
+ java -Djetty.port=8900 -DhostPort=8900 -DzkHost=localhost:9983 -jar start.jar
+ }}}
+ {{{
+ cd example2B
+ java -Djetty.port=7500 -DhostPort=7500 -DzkHost=localhost:9983 -jar start.jar
+ }}}
+ Refresh the zookeeper browser page 
http://localhost:8983/solr/admin/zookeeper.jsp and verify that all four Solr 
nodes are up, and that each shard is present on two nodes.
+ 
+ Now send a query to any of the servers to query the cluster:
+ 
+ http://localhost:7500/solr/collection1/select?distrib=true&q=*:*
+ 
+ Send this query multiple times and observe the logs from the Solr servers.  
From your web browser, you may need to hold down CTRL while clicking the 
refresh button to bypass your browser's HTTP caching.  You should be able to 
observe Solr load balancing the requests across shard replicas, using 
different servers to satisfy each request.  There will be a log statement for 
the top-level request in the server the browser sent the request to, and a 
log statement for each of the sub-requests that are merged to produce the 
complete response.
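+ 
+ If you'd rather avoid the browser cache entirely, a small shell loop (using 
curl, assumed to be installed) works just as well for generating repeated 
requests:
+ 
+ {{{
+ for i in 1 2 3 4 5; do
+   curl -s "http://localhost:7500/solr/collection1/select?distrib=true&q=*:*" > /dev/null
+ done
+ }}}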
+ 
  
  == ZooKeeper ==
  A group of Zookeeper servers running together for fault tolerance and high 
availability is called an ensemble.  For production, it's recommended that you 
run an external Zookeeper ensemble rather than having Solr run embedded 
servers.
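  
  To make that concrete, a minimal zoo.cfg for a three node external ensemble 
might look like the sketch below (the host names and dataDir are placeholders, 
not part of the Solr example).  Each server in the ensemble also needs a myid 
file in its dataDir identifying which server.N entry it is.
  
  {{{
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/zookeeper/data
  clientPort=2181
  server.1=zk1:2888:3888
  server.2=zk2:2888:3888
  server.3=zk3:2888:3888
  }}}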
