And just to make it worse, I've seen lots of cases where the correct answer
is "neither, performance is constrained by memory" <G>...
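
A rough sketch of how samples from a load test might be labelled CPU, IO,
or memory bound. The field names, thresholds, and check order are
assumptions made up for illustration; none of them come from Solr or from
measurements in this thread.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """One hypothetical reading taken during a load test (all in percent)."""
    cpu_busy_pct: float
    io_wait_pct: float
    mem_used_pct: float


def classify_bottleneck(samples, cpu_limit=85.0, io_limit=20.0, mem_limit=90.0):
    """Label the dominant constraint from averaged samples.

    The limits are illustrative guesses, not tuned values. Memory is
    checked first because memory pressure can show up as either CPU
    load (e.g. garbage collection) or IO load (e.g. cache misses).
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    avg_cpu = sum(s.cpu_busy_pct for s in samples) / n
    avg_io = sum(s.io_wait_pct for s in samples) / n
    avg_mem = sum(s.mem_used_pct for s in samples) / n

    if avg_mem >= mem_limit:
        return "memory"
    if avg_io >= io_limit:
        return "io"
    if avg_cpu >= cpu_limit:
        return "cpu"
    return "none"
```

Real readings would have to come from whatever monitoring is in place during
the load test; the function only shows the decision, not the measurement.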

Erick


On Sun, Mar 17, 2013 at 10:44 PM, David Parks <davidpark...@yahoo.com> wrote:

> Thank you, Manu, for that excellent discussion on the topic. I could have
> been more detailed about my use case.
>
> We'll be indexing off of the main production servers (either on a master
> or in Hadoop; we've yet to build out that piece of the puzzle). We don't
> store documents at all; we only store the index data and return a document
> ID. Each document is small, maybe 1k of text. We do have a few
> "interesting" queries in which we do some grouping.
>
> We currently index 100GB of input data; that'll grow 2x or 3x in the near
> future.
>
> So based on your experience, it seems likely that we'll be CPU bound (heavy
> queries against a static index updated nightly from the master), thus
> nullifying the advantage of dual-purposing a box with another CPU bound
> app.
>
> Very useful discussion. I'll get proper load tests done in time, but this
> helps direct my thinking now.
>
> David
>
>
>
> -----Original Message-----
> From: idokis...@gmail.com [mailto:idokis...@gmail.com] On Behalf Of Manuel
> Le Normand
> Sent: Monday, March 18, 2013 9:57 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is Solr more CPU bound or IO bound?
>
> The answer to your question is typically use-case dependent; the
> bottleneck will change from user to user.
>
> These are the two main issues that will affect the answer:
> 1. How do you index: what is your indexing rate (how many docs a day)? How
> big is a typical document? How many documents do you plan on indexing in
> total? Do you store fields? Calculate their term vectors?
> 2. What does your retrieval process look like: what query rate is
> expected? Are there common queries (taking advantage of the cache)? How
> complex are the queries (faceted / highlighted / filtered / how many
> conditions, NRT)? Do you plan to retrieve stored fields or only IDs?
>
> After answering all that, there's an iterative game between hardware
> configuration and software configuration (how you split your shards, use
> your cache, tune your merges and flushes, etc.) that would also affect the
> IO / CPU bound answer.
>
> In my use case, for example, the indexing part is IO bound, but as my
> indexing rate is much below the rate my machine could initially provide,
> it didn't affect my hardware spec.
> After fine-tuning my configuration I discovered my retrieval process was
> CPU bound and was directly affecting my avg response time, while the IO
> rate in cache usage was quite low.
>
> Try describing your use case in more detail with the above questions so
> we'd be able to give you guidelines.
>
> Best,
> Manu
>
>
> On Mon, Mar 18, 2013 at 3:55 AM, David Parks <davidpark...@yahoo.com>
> wrote:
>
> > I'm spec'ing out some hardware for a first go at our production Solr
> > instance, but I haven't spent enough time loadtesting it yet.
> >
> >
> >
> > What I want to ask is how IO intensive Solr is vs. CPU intensive,
> > typically.
> >
> >
> >
> > Specifically I'm considering whether to dual-purpose the Solr servers
> > to run Solr and another CPU-only application we have. I know Solr uses
> > a fair amount of CPU, but if it also is very disk intensive it might
> > be a net benefit to have more instances running Solr and share the CPU
> > resources with the other app than to run Solr separate from the other
> > CPU app that wouldn't otherwise use the disk.
> >
> >
> >
> > Thoughts on this?
> >
> >
> >
> > Thanks,
> >
> > David
> >
> >
> >
> >
>
>
