If you never execute any queries, a gig should be more than enough.
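
E.g. if the indexer is running the stock Jetty example that ships with the Solr distribution, that's just something along the lines of (numbers illustrative):

    java -Xms512m -Xmx1024m -jar start.jar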

Of course, I've never played around with a 0.8 billion doc corpus on one machine.

-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

In terms of RAM -- how should that be sized on the indexer?

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:

The indexing box can be much smaller, especially in terms of CPU.
It just needs one fast thread and enough disk.

wunder

On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:

I was afraid of that. Was hoping not to need another big fat box like
this one...

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:

I believe this is one of the reasons that a master/slave configuration comes in handy. Commits to the Master don't slow down queries on the
Slave.
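
If I remember right, the Java-based ReplicationHandler being worked on for the next release turns that into a couple of stanzas in solrconfig.xml (1.3 itself ships the rsync-based snapshooter/snappuller scripts instead). A rough sketch -- hostname, port and poll interval are made up:

    <!-- master solrconfig.xml: publish a new index version after every commit -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- slave solrconfig.xml: poll the master and serve all of the queries -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://indexer-host:8983/solr/replication</str>
        <str name="pollInterval">00:05:00</str>
      </lst>
    </requestHandler>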

-Todd

-----Original Message-----
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008 1:47 PM
To: solr-user@lucene.apache.org
Subject: SOLR Performance

We've moved past this issue by reducing date precision -- thanks to
all for the help.  Now we're at another problem.
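
(For anyone who finds this thread later: "reducing date precision" here boils down to indexing the timestamps at a coarser granularity so a range query has far fewer unique terms to enumerate. The field name and granularity below are just illustrative, not necessarily exactly what we settled on:

    <field name="dt_day" type="date" indexed="true" stored="false"/>

    at index time:  2008-10-01T04:13:27Z  ->  2008-10-01T00:00:00Z

    in queries:     dt_day:[2008-10-01T00:00:00Z TO 2008-10-30T00:00:00Z]

Fewer distinct values in the field means a dramatically smaller term expansion for the range.)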

There is relatively constant updating of the index -- new log entries are pumped in from several applications continuously. Obviously, new
entries do not appear in searches until after a commit occurs.

The problem is, issuing a commit causes searches to come to a
screeching halt for up to 2 minutes.  We're up to around 80M docs.
Index size is 27G.  The number of docs will soon be 800M, which
doesn't bode well for these "pauses" in search performance.
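
From what I can tell, the stall is the new searcher re-warming its caches after each commit before it starts serving queries, so the relevant knobs would seem to be the autowarm counts and warming queries in solrconfig.xml. The values below are illustrative, not what we're actually running:

    <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="256"/>

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">instance:client\-csm.symplicity.com</str><str name="rows">10</str></lst>
      </arr>
    </listener>

    <useColdSearcher>false</useColdSearcher>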

I'd appreciate any suggestions.

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:

Hi -- using Solr 1.3 -- roughly 11M docs on a 64 GB, 8-core machine.

Fairly simple schema -- no large text fields, standard request
handler.  4 small facet fields.

The index is an event log -- a primary search/retrieval requirement
is date range queries.

A simple query without a date range subquery is ridiculously fast --
2ms.  The same query with a date range takes up to 30s (30,000ms).

Concrete example -- this query just took 18s:

instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z TO 2008-10-30T03:59:59Z] AND label_facet:"Added to Position"

The exact same query without the date range took 2ms.

I saw a thread from Apr 2008 which attributes the problem to too much
precision on the DateField type, with the range expansion leading to
far too many terms being checked.  The proposed solution appears to be
a hack where you index the date fields as strings and hack together
date functions to generate the proper queries and format the results.
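
(As I understand that proposal, it comes down to indexing the timestamp into a plain string field in a lexically sortable format and running the range against that -- something like this, with the field name made up:

    <field name="dt_s" type="string" indexed="true" stored="false"/>

    indexed value:  2008-10-01T04:13:27Z  ->  20081001041327

    dt_s:[20081001000000 TO 20081030035959]

String ranges work because the comparison is purely lexicographic, but all of the date parsing and formatting then has to happen on the client side.)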

Does this remain the recommended solution to this issue?

Thanks

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]
