I was just reading about configuring mass computation grids: hardware writes on 2 striped disks take about 10% longer than writes on a single disk, because you have to wait for the slower disk to finish. So single disks without RAID are faster.
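As a rough illustration of that ~10% figure: when a write must wait for the slower of two disks, the expected latency is the expected *maximum* of two draws rather than one. A minimal Python sketch, using an assumed Gaussian latency model (the 8 ms mean and 1.5 ms jitter are hypothetical, not measured numbers):

```python
import random

random.seed(0)

def write_latency_ms():
    # Hypothetical model: ~8 ms mean write latency with some jitter.
    return random.gauss(8.0, 1.5)

N = 100_000
single = [write_latency_ms() for _ in range(N)]
# A paired write completes only when the slower of the two disks finishes.
paired = [max(write_latency_ms(), write_latency_ms()) for _ in range(N)]

avg_single = sum(single) / N
avg_paired = sum(paired) / N
penalty = (avg_paired / avg_single - 1) * 100

print(f"single disk: {avg_single:.2f} ms")
print(f"two disks:   {avg_paired:.2f} ms")
print(f"penalty:     {penalty:.1f}%")
```

Under this particular model the expected maximum of two draws comes out roughly 10% above the single-disk mean, which matches the figure quoted above; with a different latency distribution the penalty would differ.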
I don't know how much SSD disks cost, but they would certainly cure the disk I/O problem.

On Tue, Aug 31, 2010 at 1:35 AM, scott chu (朱炎詹) <scott....@udngroup.com> wrote:
> In our current lab project, we have already built a Chinese newspaper index
> with 18 million documents. The index size is around 51 GB, so I am very
> concerned about the memory issues you mentioned.
>
> I also looked up the Hathitrust report on the SolrPerformanceData page:
> http://wiki.apache.org/solr/SolrPerformanceData. They said their main
> bottleneck is disk I/O, even though they have 10 shards spread over 4 servers.
>
> Can you give me some helpful suggestions about hardware specs & memory
> configuration for our project?
>
> Thanks in advance.
>
> Scott
>
> ----- Original Message ----- From: "Lance Norskog" <goks...@gmail.com>
> To: <solr-user@lucene.apache.org>
> Sent: Tuesday, August 31, 2010 1:01 PM
> Subject: Re: Hardware Specs Question
>
> There are synchronization points, which become chokepoints at some
> number of cores. I don't know where they cause Lucene to top out.
> Lucene apps are generally disk-bound, not CPU-bound, but yours will
> be. There are so many variables that it's really not possible to give
> any numbers.
>
> Lance
>
> On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian <anith...@gmail.com> wrote:
>> Lance,
>>
>> That makes sense. I have heard about the long GC times on large heaps, but
>> I personally haven't experienced a slowdown (not that that means anything
>> either :-)). Agreed that tuning the Solr caching is the way to go.
>>
>> I haven't followed all the Solr/Lucene changes, but from what I remember
>> there are synchronization points that could be a bottleneck, so adding
>> more cores won't help? Or am I completely missing something?
>>
>> Thanks again,
>> Amit
>>
>> On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹) <scott....@udngroup.com> wrote:
>>
>>> I am also curious, as Amit is.
>>> Can you give an example of the garbage
>>> collection problem you mentioned?
>>>
>>> ----- Original Message ----- From: "Lance Norskog" <goks...@gmail.com>
>>> To: <solr-user@lucene.apache.org>
>>> Sent: Tuesday, August 31, 2010 9:14 AM
>>> Subject: Re: Hardware Specs Question
>>>
>>>> It generally works best to tune the Solr caches and allocate enough
>>>> RAM to run comfortably. Linux & Windows et al. have their own cache
>>>> of disk blocks. They use very good algorithms for managing this cache.
>>>> Also, they do not make long garbage collection passes.
>>>>
>>>> On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian <anith...@gmail.com> wrote:
>>>>
>>>>> Lance,
>>>>>
>>>>> Thanks for your help. What do you mean by saying that the OS can keep
>>>>> the index in memory better than Solr can? Do you mean that you should
>>>>> use another means to keep the index in memory (i.e. a ramdisk)? Is
>>>>> there a generally accepted heap size/index size ratio that you follow?
>>>>>
>>>>> Thanks,
>>>>> Amit
>>>>>
>>>>> On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog <goks...@gmail.com> wrote:
>>>>>
>>>>>> The price-performance knee for small servers is 32 GB of RAM, 2-6 SATA
>>>>>> disks on a RAID, and 8/16 cores. You can buy these servers and
>>>>>> half-fill them, leaving room for expansion.
>>>>>>
>>>>>> I have not done benchmarks on the max # of processors that can be
>>>>>> kept busy during indexing or querying, or on the totals: QPS,
>>>>>> response time averages & variability, etc.
>>>>>>
>>>>>> If your index file size is 8 GB and your Java heap is 8 GB, you will
>>>>>> do long garbage collection cycles. The operating system is very good
>>>>>> at keeping your index in memory - better than Solr can.
>>>>>>
>>>>>> Lance
>>>>>>
>>>>>> On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian <anith...@gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am curious to get some opinions on at what point having more CPU
>>>>>>> cores shows diminishing returns in terms of QPS. Our index size is
>>>>>>> about 8 GB and we have 16 GB of RAM on a quad-core 4 x 2.4 GHz AMD
>>>>>>> Opteron 2216. Currently I have the heap set to 8 GB.
>>>>>>>
>>>>>>> We are looking to get more servers to increase capacity, and because
>>>>>>> the warranty is set to expire on our old servers, I was curious,
>>>>>>> before asking for a certain spec, what others run and at what point
>>>>>>> having more cores ceases to matter. Mainly looking at somewhere
>>>>>>> between 4-12 cores per server.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Amit
>>>>>>
>>>>>> --
>>>>>> Lance Norskog
>>>>>> goks...@gmail.com
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>
> --
> Lance Norskog
> goks...@gmail.com

--
Lance Norskog
goks...@gmail.com
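Pulling together the heap-sizing advice in the thread (leave enough free RAM for the OS page cache to hold the index, and don't give the whole machine to the Java heap): here is a toy Python calculator for that rule of thumb. This is only a sketch of the reasoning above, not an official Solr or JVM guideline; the 2 GB OS headroom and the 8 GB heap cap are assumed numbers.

```python
def suggest_heap_gb(ram_gb, index_gb):
    """Toy rule of thumb: reserve RAM for the OS page cache to hold the
    whole index plus some OS headroom, then give the Java heap what's
    left, capped so GC pauses stay manageable. Assumed constants below."""
    os_overhead_gb = 2  # assumed headroom for the OS itself
    heap_cap_gb = 8     # assumed cap to keep GC cycles short
    free_for_heap = ram_gb - index_gb - os_overhead_gb
    return max(1, min(free_for_heap, heap_cap_gb))

# Amit's box: 16 GB RAM, 8 GB index. Under this rule the heap should be
# well under the 8 GB he currently uses, leaving the index to the page cache.
print(suggest_heap_gb(16, 8))  # 6
```

Under these assumptions, Amit's 8 GB heap on a 16 GB box leaves no room for the OS to cache the 8 GB index, which is exactly the long-GC-plus-disk-bound situation Lance warns about.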