I guess at this point in the discussion, I should give some more background on why I am doing this. Having a single Solr shard (multiple segments) on a single disk is causing severe performance problems under load: calls to Solr result in a lot of connection timeouts. When we looked at the Ganglia stats for the Solr box, we saw that while memory, CPU and network usage were quite normal, I/O wait spiked. We are unsure what caused the I/O wait and why there were no corresponding spikes in CPU or memory usage. Since the Solr box is a beefy box (multi-core setup, huge RAM, SSD), we'd like to distribute the segments across multiple locations (disks) and see whether this improves performance under load.

@Greg - Thanks for clarifying that. I just learnt that I can't set them up using RAID, as some of them are SSDs and others are SATA (spinning disks).

@Shawn Heisey - Could you elaborate on the "broker" core and on delegating the requests to other cores?

-Deepak

On Wed, Sep 11, 2013 at 1:10 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 9/11/2013 1:07 PM, Deepak Konidena wrote:
>
>> Are you suggesting a multi-core setup, where all the cores share the same
>> schema, and the cores lie on different disks?
>>
>> Basically, I'd like to know if I can distribute shards/segments on a
>> single machine (with multiple disks) without the use of zookeeper.
>
> Sure, you can do it all manually. At that point you would not be using
> SolrCloud at all, because the way to enable SolrCloud is to tell Solr where
> zookeeper lives.
>
> Without SolrCloud, there is no cluster automation at all. There is no
> "collection" paradigm, you just have cores. You have to send updates to
> the correct core; they will not be redirected for you. Similarly, queries
> will not be load balanced automatically. For Java clients, the
> CloudSolrServer object can work seamlessly when servers go down. If you're
> not using SolrCloud, you can't use CloudSolrServer.
>
> You would be in charge of creating the shards parameter yourself. The way
> that I do this on my index is that I have a "broker" core that has no index
> of its own, but its solrconfig.xml has the shards and shards.qt parameters
> in all the request handler definitions. You can also include the parameter
> with the query.
>
> You would also have to handle redundancy yourself, either with replication
> or with independently updated indexes. I use the latter method, because it
> offers a lot more flexibility than replication.
>
> As mentioned in another reply, setting up RAID with a lot of disks may be
> better than trying to split your index up on different filesystems that
> each reside on different disks. I would recommend RAID10 for Solr, and it
> works best if it's hardware RAID and the controller has battery-backed (or
> NVRAM) cache.
>
> Thanks,
> Shawn
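
To make the "include the parameter with the query" option concrete, here is a minimal sketch of building such a request by hand. The host, port, and the core names `broker`, `shard1` and `shard2` are assumptions for illustration only; they are not taken from Shawn's setup. The only Solr-specific piece is the `shards` request parameter, passed as a comma-separated list of shard addresses without the `http://` scheme.

```python
from urllib.parse import urlencode

# Hypothetical layout: one index-less broker core plus two shard cores on the
# same host, with each shard core's data directory on a different disk.
BROKER_URL = "http://localhost:8983/solr/broker/select"
SHARD_CORES = (
    "localhost:8983/solr/shard1",
    "localhost:8983/solr/shard2",
)


def build_distributed_query(query, shards=SHARD_CORES, rows=10):
    """Return a URL asking the broker core to fan the query out to the shards."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # Comma-separated host:port/path entries, no scheme.
        "shards": ",".join(shards),
    }
    return BROKER_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_distributed_query("title:solr"))
```

If the `shards` value is instead set as a default in the broker core's request handlers, as described above, the client would only send `q` and the other ordinary parameters.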