The indexing box can be much smaller, especially in terms of CPU. It just needs one fast thread and enough disk.
wunder

On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:

> I was afraid of that.  Was hoping not to need another big fat box like
> this one...
>
> ---
> Alok K. Dhir
> Symplicity Corporation
> www.symplicity.com
> (703) 351-0200 x 8080
> [EMAIL PROTECTED]
>
> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>
>> I believe this is one of the reasons that a master/slave configuration
>> comes in handy. Commits to the Master don't slow down queries on the
>> Slave.
>>
>> -Todd
>>
>> -----Original Message-----
>> From: Alok Dhir [mailto:[EMAIL PROTECTED]
>> Sent: Monday, November 03, 2008 1:47 PM
>> To: solr-user@lucene.apache.org
>> Subject: SOLR Performance
>>
>> We've moved past this issue by reducing date precision -- thanks to
>> all for the help.  Now we're at another problem.
>>
>> There is relatively constant updating of the index -- new log entries
>> are pumped in from several applications continuously.  Obviously, new
>> entries do not appear in searches until after a commit occurs.
>>
>> The problem is, issuing a commit causes searches to come to a
>> screeching halt for up to 2 minutes.  We're up to around 80M docs.
>> Index size is 27G.  The number of docs will soon be 800M, which
>> doesn't bode well for these "pauses" in search performance.
>>
>> I'd appreciate any suggestions.
>>
>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>
>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core machine.
>>>
>>> Fairly simple schema -- no large text fields, standard request
>>> handler.  4 small facet fields.
>>>
>>> The index is an event log -- a primary search/retrieval requirement
>>> is date range queries.
>>>
>>> A simple query without a date range subquery is ridiculously fast -
>>> 2ms.  The same query with a date range takes up to 30s (30,000ms).
>>>
>>> Concrete example, this query just took 18s:
>>>
>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>
>>> The exact same query without the date range took 2ms.
>>>
>>> I saw a thread from Apr 2008 which explains the problem being due to
>>> too much precision on the DateField type, and the range expansion
>>> leading to far too many elements being checked.  Proposed solution
>>> appears to be a hack where you index date fields as strings and
>>> hack together date functions to generate proper queries/format
>>> results.
>>>
>>> Does this remain the recommended solution to this issue?
>>>
>>> Thanks
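
For readers following the date-precision part of this thread: the fix mentioned above ("reducing date precision") can be sketched roughly as below. This is only an illustration, not what Alok actually deployed. The thread does not say which precision was chosen, so the granularities here and the helper names solr_date and date_range_clause are assumptions. The idea is that truncating timestamps before indexing leaves far fewer distinct values for a range query to expand over.

```python
from datetime import datetime, timezone

# Hypothetical granularities; the thread does not say which one was used.
_TRUNCATE = {
    "minute": dict(second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
}


def solr_date(ts: datetime, precision: str = "minute") -> str:
    """Truncate a timezone-aware datetime and format it as a Solr UTC date."""
    utc = ts.astimezone(timezone.utc)
    truncated = utc.replace(**_TRUNCATE[precision])
    return truncated.strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_clause(field: str, start: datetime, end: datetime,
                      precision: str = "minute") -> str:
    """Build an inclusive range clause using the same truncation as indexing."""
    return f"{field}:[{solr_date(start, precision)} TO {solr_date(end, precision)}]"


if __name__ == "__main__":
    start = datetime(2008, 10, 1, 4, 0, 0, tzinfo=timezone.utc)
    end = datetime(2008, 10, 30, 3, 59, 59, tzinfo=timezone.utc)
    # Value to index for an event that happened at `end`, at hour precision.
    print(solr_date(end, "hour"))
    # Range clause for the query side, at minute precision.
    print(date_range_clause("dt", start, end, "minute"))
```

The same truncation has to be applied when indexing and when building queries, otherwise the range bounds will not line up with the stored values. The cost is that events can no longer be told apart below the chosen granularity on that field.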