I would guess so too, up to a point. Once you run out of RAM, indexing also takes a hit. On a 2 GB machine, I noticed that when the index grew past 2 GB, my indexing rate went down from 100/s to 40/s. After it reached 4 GB, it was down to 10/s. I am now trying an 8 GB machine to see how far I get through my data before it slows down.

On Wed, Jul 9, 2008 at 7:56 PM, Jacob Singh <[EMAIL PROTECTED]> wrote:
> My total guess is that indexing is CPU bound, and searching is RAM bound.
>
> Best,
> Jacob
>
> Ian Connor wrote:
>> There was a thread a while ago that suggested you just need to factor in
>> the index's total size (Mike Klaas, I think, was the author). It was
>> suggested that having the RAM is enough and the OS will cache the files as
>> needed to give you the performance boost.
>>
>> If I misread the thread, please chime in, but it seems having enough
>> RAM is the key to performance.
>>
>> On Wed, Jul 9, 2008 at 3:00 AM, Preetam Rao <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> Since we plan to share the same 16 GB RAM multi-core box among multiple
>>> Solr instances, we need to estimate how much memory our application
>>> needs.
>>>
>>> The index size on disk is 2.4G, with close to 3 million documents. The plan
>>> is to use dismax queries with some fqs.
>>> Since we do not sort the results, the sort will be by score, which eliminates
>>> the option "useFilterForSortedQuery".
>>> Thus, assuming all q's will use the query result cache and all fqs will use
>>> the filter cache, the below is what I am thinking.
>>>
>>> I would like to know how to relate the index size on disk to its size in
>>> memory.
>>> Given the disk size of 2.4G, would it be safe to assume we need RAM
>>> for the whole index, plus 1G for any other overhead, plus the cache size,
>>> which comes to 150MB (calculation below)? That makes it around 4G.
>>>
>>> cache size calculation -
>>> --------------------------------
>>> query result cache - size = 50K;
>>> since we paginate the results, each page has 10 items, and each
>>> user will see at most 3 pages per query,
>>> we will set queryResultWindowSize to 30. Assuming this, for 50k queries we
>>> will use up 50000 * 30 bits = 187K, assuming results are stored in a bitset.
>>>
>>> We use a few common fqs, let's say 200. Assuming each returns around 30k
>>> documents, it adds up to 200 * 30000 bits = 750K.
>>>
>>> If we use a document cache of size 20K, assuming each document is around
>>> 5K at the max, it will take up 20000 * 5K = 100MB.
>>>
>>> Thus we can increase the caches more drastically and they will still use up
>>> only 150MB or less.
>>>
>>> Is this reasoning on the caches correct?
>>>
>>> Thanks
>>> Preetam
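
The cache arithmetic in the quoted message can be checked with a short script. This is only a sketch: it takes the message's own assumptions at face value (one bit per cached document for the query result and filter caches, and 5,000 bytes per cached document), which may not match how Solr actually stores these entries. All variable names are illustrative and do not come from Solr.

```python
# Back-of-the-envelope check of the cache estimate in the quoted message.
# Assumption taken from the message (not verified against Solr internals):
# query result and filter cache entries cost one bit per cached document.

BITS_PER_BYTE = 8

query_result_entries = 50_000   # query result cache size
window_size = 30                # queryResultWindowSize (3 pages of 10)
filter_entries = 200            # number of common fqs
docs_per_filter = 30_000        # documents matched by each fq
document_entries = 20_000       # document cache size
bytes_per_document = 5_000      # "5k at the max", read here as 5,000 bytes

query_result_bytes = query_result_entries * window_size // BITS_PER_BYTE
filter_bytes = filter_entries * docs_per_filter // BITS_PER_BYTE
document_bytes = document_entries * bytes_per_document
total_bytes = query_result_bytes + filter_bytes + document_bytes

print(f"query result cache: {query_result_bytes / 1e3:.1f} KB")
print(f"filter cache:       {filter_bytes / 1e3:.1f} KB")
print(f"document cache:     {document_bytes / 1e6:.1f} MB")
print(f"total:              {total_bytes / 1e6:.1f} MB")
```

Under these assumptions the document cache dominates the total, so the estimate is far more sensitive to the per-document size than to the number of cached queries or filters.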