Re: Distributing lucene segments across multiple disks.

Deepak Konidena Wed, 11 Sep 2013 14:25:01 -0700

At this point in the discussion, I should probably give some more
background on why I am doing this. Having a single Solr shard (with
multiple segments) on a single disk is causing severe performance problems
under load: calls to Solr produce a lot of connection timeouts. When we
looked at the Ganglia stats for the Solr box, we saw that while memory,
CPU and network usage were quite normal, I/O wait spiked. We are unsure
what caused the I/O wait and why there were no corresponding spikes in CPU
or memory usage. Since the Solr box is a beefy machine (multi-core, lots of
RAM, SSD), we'd like to distribute the segments across multiple locations
(disks) and see whether this improves performance under load.


@Greg - Thanks for clarifying that. I just learnt that I can't put the
disks in a RAID array, as some of them are SSDs and others are SATA
(spinning disks).

@Shawn Heisey - Could you elaborate on the "broker" core and how it
delegates requests to the other cores?


-Deepak



On Wed, Sep 11, 2013 at 1:10 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 9/11/2013 1:07 PM, Deepak Konidena wrote:
>
>> Are you suggesting a multi-core setup, where all the cores share the same
>> schema, and the cores lie on different disks?
>>
>> Basically, I'd like to know if I can distribute shards/segments on a
>> single machine (with multiple disks) without the use of zookeeper.
>>
>
> Sure, you can do it all manually.  At that point you would not be using
> SolrCloud at all, because the way to enable SolrCloud is to tell Solr where
> zookeeper lives.
>
> Without SolrCloud, there is no cluster automation at all.  There is no
> "collection" paradigm, you just have cores.  You have to send updates to
> the correct core; they will not be redirected for you.  Similarly, queries will
> not be load balanced automatically.  For Java clients, the CloudSolrServer
> object can work seamlessly when servers go down.  If you're not using
> SolrCloud, you can't use CloudSolrServer.
>
> You would be in charge of creating the shards parameter yourself.  The way
> that I do this on my index is that I have a "broker" core that has no index
> of its own, but its solrconfig.xml has the shards and shards.qt parameters
> in all the request handler definitions.  You can also include the parameter
> with the query.
>
> You would also have to handle redundancy yourself, either with replication
> or with independently updated indexes.  I use the latter method, because it
> offers a lot more flexibility than replication.
>
> As mentioned in another reply, setting up RAID with a lot of disks may be
> better than trying to split your index up on different filesystems that
> each reside on different disks.  I would recommend RAID10 for Solr, and it
> works best if it's hardware RAID and the controller has battery-backed (or
> NVRAM) cache.
>
> Thanks,
> Shawn
>
>
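
For anyone following along, here is a rough sketch of what the manual
(non-SolrCloud) setup quoted above implies on the client side: you pick the
core for each update yourself, and you build the shards parameter yourself
when querying. The host, port, core names and the "broker" URL are all
placeholders I made up, and the CRC32-based routing is just one possible
choice, not something Shawn described. Treat it as an illustration only.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical core locations, e.g. one core per disk. These names are
# placeholders and do not come from the thread.
SHARD_CORES = [
    "localhost:8983/solr/core0",
    "localhost:8983/solr/core1",
    "localhost:8983/solr/core2",
]


def core_for_document(doc_id, shard_cores):
    """Pick the core that should receive updates for doc_id.

    Assumption: a stable CRC32 hash of the id, modulo the number of cores.
    Without SolrCloud nothing routes updates for you, so the client decides.
    """
    index = zlib.crc32(doc_id.encode("utf-8")) % len(shard_cores)
    return shard_cores[index]


def build_distributed_query_url(broker_url, query, shard_cores, rows=10):
    """Return a select URL on the broker core that fans out to shard_cores."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # Comma-separated host:port/path entries, without the http:// scheme.
        "shards": ",".join(shard_cores),
    }
    return broker_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(core_for_document("doc-42", SHARD_CORES))
    print(build_distributed_query_url(
        "http://localhost:8983/solr/broker", "title:lucene", SHARD_CORES))
```

If the shards parameter is already set in the broker core's solrconfig.xml
request handlers, as Shawn describes, the client would not need to send it
with each query; passing it explicitly is the alternative he mentions.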
