Ashwin:

First, if at all possible I would simply set up my new SolrCloud structure (2 shards, a leader and follower each) and re-index the entire corpus. 24M docs isn't really very many, and you'll need this capability sometime anyway, since someone, somewhere will want to change the schema in ways that require it.
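As a rough sketch of what "set up the new structure" could look like, the snippet below builds a Collections API CREATE request for 2 shards with 2 copies each (replicationFactor counts the leader plus its follower). The collection name `collection2` and the config set name `myconf` are hypothetical, and the sketch assumes the config set has already been uploaded to ZooKeeper and that all four nodes are up. It only builds the URL; it does not send anything.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, config_name,
                          num_shards=2, replication_factor=2):
    """Build a Collections API CREATE URL (names passed in are placeholders)."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        # replicationFactor = total copies per shard (leader + followers)
        ("replicationFactor", replication_factor),
        # Assumed: a config set with this name already exists in ZooKeeper
        ("collection.configName", config_name),
    ]
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Hypothetical names; the host is taken from the original question.
url = create_collection_url("http://Unix_box_1:8983/solr", "collection2", "myconf")
print(url)
```

Once the empty collection exists, the re-index is just your normal indexing job pointed at the new collection.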
But to answer your questions:

1: Certainly. There's the SPLITSHARD command, see: https://cwiki.apache.org/confluence/display/solr/Collections+API. That said, Solr 4.4 used a relatively early version of SPLITSHARD and there have been many improvements since, so be sure to back up first.

2: Not quite sure how long it takes, but I wouldn't expect it to take hours. A lot depends on what the docs are like.

3: Yes, sending a query (or update for that matter) to any node in the cluster will "do the right thing". In a production environment, and assuming you're not using SolrJ, I'd put a load balancer in front of the cluster for queries. If you _are_ querying through SolrJ from the application, you only need to use the CloudSolrServer class as it includes a software load balancer by default. Otherwise, if you hard-code a single machine that machine becomes a single point of failure.

Best,
Erick

On Wed, Apr 1, 2015 at 4:55 AM, Ashwin Kumar <ashwins...@outlook.de> wrote:
> Hello Solr Community,
>
> Greetings ! This is my first post to this group.
>
> I am very new to solr, so please do not mind if some of my questions below
> sound dumb :)
>
> Let me explain my present setup:
>
> Solr version : Solr_4.4.0
> Zookeeper version: zookeeper-3.4.5
> -----------------------------
>
> Present Setup
> Unix_box_1
> One Solr instance (Collection 1 : contains around 24 million indexed
> documents) running on port 8983
>
> --------------------------------------------
>
> Target setup
>
> Now as the number of users are going to increase and also we are looking for
> high availability, I am thinking of setting up solr cloud with the following
> setup:
>
> Unix box 1
> zookeeper 1 (master)
> Solr instance 1 (Shard 1 - leader node)
> --------
>
> Unix_box_2
> zookeeper 2
> Solr instance 2 (Shard 2)
> --------
>
> Unix_box_3
> zookeeper 3
> Solr instance 3 (Replica for Shard 1)
> --------
>
> Unix_box_4
> Solr instance 4 (Replica for Shard 2)
> --------
>
> ========================================================================================
>
> Now following are my queries:
>
> 1) Is it possible for me to split the present solr running on one node with
> 24 million docs under Collection1 into 2 shards as shown above ?
> 2) If yes how can I achieve this, and approximately how long does it take ?
> 3) For my application to fetch the result from solr, I need to give one solr
> url meaning http://Unix_box_1:8983/solr . In this case if I have some docs
> on shard2 (which is on Unix_box_2) and some on shard1 (Unix_box_1), will my
> search result in the application fetch docs from both the shards and combine
> the result ?
>
> =========================================================================================
>
>
> Thank you for your patience and time.
>
> Regards,
> Ashwin
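
For reference, the SPLITSHARD call from answer 1 might be put together along these lines. This is only a sketch under some assumptions: SPLITSHARD is a Collections API action, so the index has to be running in SolrCloud mode already, and the names `collection1` and `shard1` are guesses at what your collection and its single shard are called (check the admin UI for the real ones). The snippet only builds the URL so you can inspect it before sending anything, and as said above, back up first.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD URL for one shard of a collection."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,  # assumed collection name
        "shard": shard,            # assumed name of the shard to split
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


# Host taken from the original question; collection and shard names are assumptions.
url = split_shard_url("http://Unix_box_1:8983/solr", "collection1", "shard1")
print(url)
```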