And just to make it worse, I've seen lots of cases where the correct answer is "neither, performance is constrained by memory" <G>...
Erick

On Sun, Mar 17, 2013 at 10:44 PM, David Parks <davidpark...@yahoo.com> wrote:

> Thank you, Manu, for that excellent discussion on the topic. I could have
> been more detailed about my use case.
>
> We'll be indexing off of the main production servers (either on a master or
> in Hadoop; we've yet to build out that piece of the puzzle). We don't store
> documents at all; we only store the index data and return a document ID.
> Each document is maybe 1k of text, small. We do have a few "interesting"
> queries in which we do some grouping.
>
> We currently index 100GB of input data; that'll grow 2x or 3x in the near
> future.
>
> So based on your experience, it seems likely that we'll be CPU bound (heavy
> queries against a static index updated nightly from the master), thus
> nullifying the advantage of dual-purposing a box with another CPU bound
> app.
>
> Very useful discussion. I'll get proper load tests done in time, but this
> helps direct my thinking now.
>
> David
>
> -----Original Message-----
> From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of Manuel
> Le Normand
> Sent: Monday, March 18, 2013 9:57 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is Solr more CPU bound or IO bound?
>
> Your question is typically use-case dependent; the bottleneck will change
> from user to user.
>
> These are the two main issues that will affect the answer:
> 1. How do you index: what is your indexing rate (how many docs a day)? How
> big is a typical document? How many documents do you plan on indexing in
> total? Do you store fields? Calculate their term vectors?
> 2. What does your retrieval process look like: what query rate is expected?
> Are there common queries (taking advantage of the cache)? How complex are
> the queries (faceted / highlighted / filtered / how many conditions, NRT)?
> Do you plan to retrieve stored fields or only IDs?
>
> After answering all that, there's an iterative game between hardware
> configuration and software configuration (how you split your shards, use
> your cache, tune your merges and flushes, etc.) that would also affect the
> IO / CPU bound answer.
>
> In my use case, for example, the indexing part is IO bound, but as my
> indexing rate is much below the rate my machine could initially provide, it
> didn't affect my hardware spec.
> After fine-tuning my configuration I discovered my retrieval process was
> CPU bound and was directly affecting my avg response time, while the IO rate
> in cache usage was quite low.
>
> Try describing your use case in more detail with the above questions so
> we'd be able to give you guidelines.
>
> Best,
> Manu
>
> On Mon, Mar 18, 2013 at 3:55 AM, David Parks <davidpark...@yahoo.com>
> wrote:
>
> > I'm spec'ing out some hardware for a first go at our production Solr
> > instance, but I haven't spent enough time load testing it yet.
> >
> > What I want to ask is how IO intensive Solr is vs. CPU intensive,
> > typically.
> >
> > Specifically, I'm considering whether to dual-purpose the Solr servers
> > to run Solr and another CPU-only application we have. I know Solr uses
> > a fair amount of CPU, but if it is also very disk intensive it might
> > be a net benefit to have more instances running Solr and share the CPU
> > resources with the other app than to run Solr separate from the other
> > CPU app that wouldn't otherwise use the disk.
> >
> > Thoughts on this?
> >
> > Thanks,
> >
> > David
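
A rough way to put numbers on the CPU-versus-IO question during a load test is to sample the kernel's CPU counters before and after the run and see where the time went. The sketch below is a minimal, Linux-only illustration and is not something from the thread: the function names, the 0.5 threshold, and the three labels are all assumptions chosen for the example, and iowait is only a coarse proxy for disk pressure.

```python
from typing import Dict

# Counter order on the aggregate "cpu" line of /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_times(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the aggregate 'cpu' line; values are cumulative clock ticks."""
    with open(path) as f:
        parts = f.readline().split()
    # parts[0] is the literal "cpu"; the counters follow in FIELDS order.
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def utilisation(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Fraction of elapsed CPU time spent busy, waiting on IO, and idle."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    busy = total - delta["idle"] - delta["iowait"]
    return {
        "busy": busy / total,
        "iowait": delta["iowait"] / total,
        "idle": delta["idle"] / total,
    }


def classify(before: Dict[str, int], after: Dict[str, int],
             threshold: float = 0.5) -> str:
    """Label the interval; the threshold is an arbitrary illustrative cut-off."""
    u = utilisation(before, after)
    if u["busy"] >= threshold and u["busy"] >= u["iowait"]:
        return "cpu-bound"
    if u["iowait"] >= threshold:
        return "io-bound"
    # Neither dominates: look elsewhere (memory, GC, network, too little load).
    return "neither"
```

To use it, call `read_cpu_times()` just before and just after the load test and pass both samples to `classify()`. A "neither" result is the cue to check memory, which matches Erick's remark at the top of the thread.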