On 8/1/2014 4:19 AM, anand.mahajan wrote:
> My current deployment : 
>  i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
> - 24 Core + 96 GB RAM each.
>  ii) There are over 190M docs in the SolrCloud at the moment (across all
> replicas it's consuming about 2340GB of disk overall, which implies each
> doc is about 5-8KB in size.)
>  iii) The docs are split into 36 shards, with 3 replicas per shard (in all,
> 108 Solr Jetty processes split over 6 servers, i.e. 18 Jetty JVMs running
> on each host)

<snip>

> 2. Would I have been better served had I deployed a single Jetty Solr
> instance per server with multiple cores running inside? The servers do start
> to swap out after a couple of days of Solr uptime - right now we reboot the
> entire cluster every 4 days.

Others have already mentioned the problems with autoCommit being far too
frequent, so I'll just echo their advice to increase the intervals.
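
For reference, something along these lines in solrconfig.xml is a
common starting point.  The intervals here are only placeholders; you
would tune them to how quickly new documents need to become visible:

  <autoCommit>
    <!-- hard commit: flushes to disk and rolls the transaction log,
         but does not open a new searcher -->
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <!-- soft commit: controls document visibility for queries -->
    <maxTime>120000</maxTime>
  </autoSoftCommit>

Hard commits with openSearcher=false keep the transaction logs from
growing without bound while avoiding the cost of opening a new searcher
on every commit.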

You should DEFINITELY have exactly one Jetty process per server.  One
Solr process can handle *many* shard replicas (cores).  With 18 per
server, that's a LOT of overhead (especially memory) that is not required.
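
If you ever get a chance to rebuild the collection, the Collections API
will happily put all of those cores into one Solr instance per machine.
Something like the following (collection and config names are made up,
and the URL is wrapped here for readability) lets 36 shards x 3 replicas
land on 6 nodes, 18 cores per node:

  http://server1:8983/solr/admin/collections?action=CREATE
    &name=mycollection&numShards=36&replicationFactor=3
    &maxShardsPerNode=18&collection.configName=myconfig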

> 3. The routing key is not able to effectively balance the docs on available
> shards - There are a few shards with just about 2M docs - and others over
> 11M docs. Shall I split the larger shards? But I do not have more nodes /
> hardware to allocate to this deployment. In that case, would splitting up the
> large shards give better read-write throughput? 
>
> 4. To remain with the current hardware - would it help if I remove 1 replica
> each from a shard? But that would mean even when just 1 node goes down for a
> shard there would be only 1 live node left that would not serve the write
> requests.

Why not just let Solr automatically handle routing with the compositeId
router?  Chances are excellent that this will result in very even shard
balancing.  Unless you want completely manual sharding (not controlled
at all by SolrCloud), don't complicate it by trying to influence the
routing.
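
To illustrate what the compositeId router does (the IDs below are made
up): a plain unique key is hashed over its whole value, which spreads
documents essentially evenly, while a prefixed key deliberately groups
documents onto one shard:

  id = 371250              hashed on the whole id, docs spread evenly
  id = customerA!371250    hashed on the "customerA" prefix, so all of
                           that customer's docs land on the same shard

If you don't need that kind of co-location for your queries, just use
the plain id and let the hash do the balancing.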

I think that when you say "1 live node left that would not serve the
write requests" above, you may have a misconception about SolrCloud. 
*ALL* replicas have the same indexing load.  Although the replication
handler is required when you use SolrCloud, replication is *NOT* how the
data gets on all replicas.  Each update request makes its way to the
shard leader, then the shard leader sends that update request to all
replicas, and each one independently indexes the content.  Replication
only gets used when something goes wrong and a replica needs to recover.

> 5. Also, is there a way to control where the Split Shard replicas would go?
> Is there a pattern / rule that Solr follows when it creates replicas for
> split shards?
>
> 6. I read somewhere that creating a Core would cost the OS one thread and a
> file handle. Since a core represents an index in its entirety, would it not
> be allocated the configured number of write threads? (The default is 8.)
>
> 7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
> - Would separating the ZK cluster out help?

I don't know for sure what rules are followed when creating the replicas
(cores).  SolrCloud will make sure that replicas end up on different
nodes, with a node being a Solr JVM, *NOT* a machine.  If one node has
fewer replicas already onboard than another, it will likely be preferred
...but I actually don't know if that logic is incorporated.
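
If the automatic placement after a SPLITSHARD isn't what you want, one
workaround is to add a replica explicitly on the node you prefer and
then delete the one you don't want.  The collection, sub-shard, node,
and core names below are only examples, and ADDREPLICA is quite new, so
double-check that your 4.8 build supports it:

  http://server1:8983/solr/admin/collections?action=ADDREPLICA
    &collection=mycollection&shard=shard1_0&node=server3:8983_solr

  http://server1:8983/solr/admin/collections?action=DELETEREPLICA
    &collection=mycollection&shard=shard1_0&replica=core_node12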

Each core probably does create a thread, but there will be far more than
one file handle.  A Lucene index (there is one in every core) is normally
composed of dozens or hundreds of files, each of which will require a
file handle.  Each network connection also uses file handles.
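
It's worth making sure the OS open-file limit for the user running Solr
is raised well above the usual default of 1024.  A quick way to check
(the pid is a placeholder):

  # soft limit for the current shell/user
  ulimit -n

  # file descriptors actually open in a running Solr process
  ls /proc/<pid>/fd | wc -l

With that many cores on a node, the stock limit can be exhausted
surprisingly quickly.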

If load is light, putting ZK on the same nodes/disks as Solr is not a
big deal.  If load is heavy, you will want the ZK database to have its
own dedicated disk spindle(s) ... but ZK's CPU requirements are usually
very small.  Completely separate servers are not usually required, but
if you can do that, you would be much better protected against
performance problems.
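
If ZK does stay on the Solr boxes, the main thing is getting its
transaction log onto storage that Solr is not hammering.  In zoo.cfg
that is roughly the following (the paths are just examples):

  # snapshots can live on ordinary storage
  dataDir=/var/lib/zookeeper/data
  # the transaction log is the latency-sensitive part; give it a
  # dedicated spindle if possible
  dataLogDir=/mnt/zk-log/zookeeper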

Thanks,
Shawn
