Funny, that is exactly what Infoseek did back in 1996. A big index that changed rarely and a small index with real-time changes. Once each week, merge to make a new big index and start over with the small one.
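A rough sketch of that pattern in plain Java, not Solr or Lucene API -- the TwoIndexSearcher class and the id-keyed maps are made up here purely to show the merge-and-favor logic:

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy stand-in for "big frozen index + small live index": updates only
    // touch the small index, lookups prefer it, and a periodic merge folds
    // it into the big index and empties it.
    public class TwoIndexSearcher {
        private final Map<String, String> bigFrozenIndex = new HashMap<>();
        private final Map<String, String> smallLiveIndex = new HashMap<>();

        // All adds/updates go to the small live index only.
        public void update(String id, String doc) {
            smallLiveIndex.put(id, doc);
        }

        // Look in the "new" index first so its version of a document
        // wins over the copy in the "old" index.
        public Map<String, String> fetch(Collection<String> ids) {
            Map<String, String> results = new LinkedHashMap<>();
            for (String id : ids) {
                String doc = smallLiveIndex.get(id);
                if (doc == null) {
                    doc = bigFrozenIndex.get(id);
                }
                if (doc != null) {
                    results.put(id, doc);
                }
            }
            return results;
        }

        // The periodic (e.g. weekly) merge: newer copies overwrite older
        // ones, then the live index starts over empty. Deletes are not
        // handled here.
        public void merge() {
            bigFrozenIndex.putAll(smallLiveIndex);
            smallLiveIndex.clear();
        }
    }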
You also need to handle deletes specially.

wunder

On 11/3/08 6:44 PM, "Lance Norskog" <[EMAIL PROTECTED]> wrote:

> The logistics of handling giant index files hit us before search
> performance. We switched to a set of indexes running inside one server
> (tomcat) instance with the Multicore+Distributed Search tools, with a frozen
> old index and a new index actively taking updates. The smaller new index
> takes much less time to recover after a commit.
>
> The DS code does not handle cases where the new and old index have different
> versions of the same document. We wrote a custom distributed search that
> favored the "new" index over the "old".
>
> Lance
>
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 03, 2008 4:25 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance
>
> If you never execute any queries, a gig should be more than enough.
>
> Of course, I've never played around with a .8 billion doc corpus on one
> machine.
>
> -Mike
>
> On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:
>
>> in terms of RAM -- how to size that on the indexer?
>>
>> ---
>> Alok K. Dhir
>> Symplicity Corporation
>> www.symplicity.com
>> (703) 351-0200 x 8080
>> [EMAIL PROTECTED]
>>
>> On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:
>>
>>> The indexing box can be much smaller, especially in terms of CPU.
>>> It just needs one fast thread and enough disk.
>>>
>>> wunder
>>>
>>> On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:
>>>
>>>> I was afraid of that. Was hoping not to need another big fat box
>>>> like this one...
>>>>
>>>> ---
>>>> Alok K. Dhir
>>>> Symplicity Corporation
>>>> www.symplicity.com
>>>> (703) 351-0200 x 8080
>>>> [EMAIL PROTECTED]
>>>>
>>>> On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:
>>>>
>>>>> I believe this is one of the reasons that a master/slave
>>>>> configuration comes in handy. Commits to the Master don't slow down
>>>>> queries on the Slave.
>>>>>
>>>>> -Todd
>>>>>
>>>>> -----Original Message-----
>>>>> From: Alok Dhir [mailto:[EMAIL PROTECTED]
>>>>> Sent: Monday, November 03, 2008 1:47 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: SOLR Performance
>>>>>
>>>>> We've moved past this issue by reducing date precision -- thanks to
>>>>> all for the help. Now we're at another problem.
>>>>>
>>>>> There is relatively constant updating of the index -- new log
>>>>> entries are pumped in from several applications continuously.
>>>>> Obviously, new entries do not appear in searches until after a
>>>>> commit occurs.
>>>>>
>>>>> The problem is, issuing a commit causes searches to come to a
>>>>> screeching halt for up to 2 minutes. We're up to around 80M docs.
>>>>> Index size is 27G. The number of docs will soon be 800M, which
>>>>> doesn't bode well for these "pauses" in search performance.
>>>>>
>>>>> I'd appreciate any suggestions.
>>>>>
>>>>> ---
>>>>> Alok K. Dhir
>>>>> Symplicity Corporation
>>>>> www.symplicity.com
>>>>> (703) 351-0200 x 8080
>>>>> [EMAIL PROTECTED]
>>>>>
>>>>> On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:
>>>>>
>>>>>> Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 core
>>>>>> machine.
>>>>>>
>>>>>> Fairly simple schema -- no large text fields, standard request
>>>>>> handler. 4 small facet fields.
>>>>>>
>>>>>> The index is an event log -- a primary search/retrieval
>>>>>> requirement is date range queries.
>>>>>>
>>>>>> A simple query without a date range subquery is ridiculously fast
>>>>>> - 2ms. The same query with a date range takes up to 30s
>>>>>> (30,000ms).
>>>>>>
>>>>>> Concrete example, this query just took 18s:
>>>>>>
>>>>>> instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO
>>>>>> 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"
>>>>>>
>>>>>> The exact same query without the date range took 2ms.
>>>>>>
>>>>>> I saw a thread from Apr 2008 which explains the problem as being due
>>>>>> to too much precision on the DateField type, and the range
>>>>>> expansion leading to far too many elements being checked. The
>>>>>> proposed solution appears to be a hack where you index date fields
>>>>>> as strings and hack together date functions to generate proper
>>>>>> queries and format results.
>>>>>>
>>>>>> Does this remain the recommended solution to this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> ---
>>>>>> Alok K. Dhir
>>>>>> Symplicity Corporation
>>>>>> www.symplicity.com
>>>>>> (703) 351-0200 x 8080
>>>>>> [EMAIL PROTECTED]
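On the date-range question at the bottom of the thread: the usual workaround with Solr 1.3's DateField was to index less precise timestamps, so that a range expands to far fewer unique terms. A minimal sketch, assuming the client can round event times down to the hour before sending them to the dt field (field name taken from the query above; the DateRounding class is made up for illustration):

    import java.time.Instant;
    import java.time.format.DateTimeFormatter;
    import java.time.temporal.ChronoUnit;

    public class DateRounding {
        // Truncate an event timestamp to the hour so the dt field holds
        // far fewer distinct values and range queries stay cheap.
        static String roundToHour(Instant eventTime) {
            return DateTimeFormatter.ISO_INSTANT
                    .format(eventTime.truncatedTo(ChronoUnit.HOURS));
        }

        public static void main(String[] args) {
            // Prints 2008-10-01T04:00:00Z
            System.out.println(roundToHour(Instant.parse("2008-10-01T04:17:23Z")));
        }
    }

The trade-off is that queries only resolve to the hour; the string-typed field the April 2008 thread proposes is the other route people took at the time.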
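For the Multicore + Distributed Search setup Lance describes, a stock Solr 1.3 distributed query across a live core and a frozen core would look roughly like the request below (host, port, and the core names "new" and "old" are placeholders). As Lance notes, stock distributed search does not decide which copy should win when both cores hold different versions of the same document, which is why his team wrote a custom version that favors the "new" core:

    http://localhost:8983/solr/new/select?q=*:*&shards=localhost:8983/solr/new,localhost:8983/solr/old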