Re: SOLR Performance

Mike Klaas Mon, 03 Nov 2008 16:25:27 -0800

If you never execute any queries, a gig should be more than enough.

Of course, I've never played around with a .8 billion doc corpus onone machine.


-Mike

On 3-Nov-08, at 2:16 PM, Alok Dhir wrote:

in terms of RAM -- how to size that on the indexer?

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Nov 3, 2008, at 4:07 PM, Walter Underwood wrote:

The indexing box can be much smaller, especially in terms of CPU.
It just needs one fast thread and enough disk.

wunder

On 11/3/08 2:58 PM, "Alok Dhir" <[EMAIL PROTECTED]> wrote:

I was afraid of that. Was hoping not to need another big fat boxlike

this one...

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Nov 3, 2008, at 4:53 PM, Feak, Todd wrote:

I believe this is one of the reasons that a master/slaveconfigurationcomes in handy. Commits to the Master don't slow down queries onthe

Slave.

-Todd

-----Original Message-----
From: Alok Dhir [mailto:[EMAIL PROTECTED]
Sent: Monday, November 03, 2008 1:47 PM
To: solr-user@lucene.apache.org
Subject: SOLR Performance

We've moved past this issue by reducing date precision -- thanks to
all for the help.  Now we're at another problem.

There is relatively constant updating of the index -- new logentriesare pumped in from several applications continuously. Obviously,new

entries do not appear in searches until after a commit occurs.

The problem is, issuing a commit causes searches to come to a
screeching halt for up to 2 minutes.  We're up to around 80M docs.
Index size is 27G.  The number of docs will soon be 800M, which
doesn't bode well for these "pauses" in search performance.

I'd appreciate any suggestions.

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

On Oct 29, 2008, at 4:30 PM, Alok Dhir wrote:

Hi -- using solr 1.3 -- roughly 11M docs on a 64 gig 8 coremachine.
Fairly simple schema -- no large text fields, standard request
handler.  4 small facet fields.
The index is an event log -- a primary search/retrievalrequirement
is date range queries.
A simple query without a date range subquery is ridiculouslyfast -
2ms.  The same query with a date range takes up to 30s (30,000ms).

Concrete example, this query just look 18s:

instance:client\-csm.symplicity.com AND dt:[2008-10-01T04:00:00Z

TO

2008-10-30T03:59:59Z] AND label_facet:"Added to Position"

The exact same query without the date range took 2ms.

I saw a thread from Apr 2008 which explains the problem beingdue to

too much precision on the DateField type, and the range expansion
leading to far too many elements being checked.  Proposed solution
appears to be a hack where you index date fields as strings and
hacking together date functions to generate proper queries/format
results.

Does this remain the recommended solution to this issue?

Thanks

---
Alok K. Dhir
Symplicity Corporation
www.symplicity.com
(703) 351-0200 x 8080
[EMAIL PROTECTED]

Re: SOLR Performance

Reply via email to