Re: Index updates blocking readers: To Multicore or not?

John Martyniak Wed, 22 Oct 2008 08:14:19 -0700

Jim,

This is an off-topic question.

But for your 30M documents, did you fetch those from external websites (Whole Web Search)? Or are they internal documents? If they are external, what method did you use to fetch them, and which spider?

I am in the process of deciding between using Nutch for whole web indexing, Solr + Spider?, or Nutch + Solr, etc.


Thank you in advance for your insight into this issue.

-John

On Oct 22, 2008, at 10:55 AM, Jim Murphy wrote:

Thanks Yonik,

I have more information...
1. We do indeed have large indexes: 40GB on disk, 30M documents, and this is just a test server. We have 8 of these in parallel.
2. The performance problem I was seeing followed replication and the first query on a new searcher. It turns out we hadn't configured index warming queries very well, so we replaced the various "solr rocks" type queries with one that was better for our data, and saw no improvement. The problem was that replication completed and a new searcher was created and registered, but the first query would take 10-20 seconds to complete. Thereafter it took <200 milliseconds for similar non-cached queries.
The profiler showed that building the FieldSortedHitQueue was taking all the time. Our warming query did not include a sort, but our queries commonly do. Once we added the sort parameter, our warming query started taking the 10-20 seconds prior to registering the searcher. After that, the first query on the new searcher took the expected 200ms.
LESSON LEARNED: warm your caches! And, if a sort is involved in your queries, incorporate that sort in your warming query! Add a warming query for each kind of sort that you expect to do.
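
As a rough illustration of the lesson above, here is a minimal sketch that generates a newSearcher warming listener with one query per sort. It assumes the listener lives in solrconfig.xml and uses solr.QuerySenderListener. The sort fields (published_date, title), the match-all query, and the helper name build_warming_listener are hypothetical placeholders, not taken from Jim's setup.

```python
import xml.etree.ElementTree as ET


def build_warming_listener(query, sorts, event="newSearcher"):
    """Build a <listener> element holding one warming query per sort clause."""
    listener = ET.Element(
        "listener", {"event": event, "class": "solr.QuerySenderListener"}
    )
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    for sort in sorts:
        lst = ET.SubElement(queries, "lst")
        # Each warming query carries the same sort the live queries will use,
        # so the sort structures are built before the searcher is registered.
        for name, value in (("q", query), ("sort", sort), ("rows", "10")):
            param = ET.SubElement(lst, "str", {"name": name})
            param.text = value
    return listener


# Hypothetical sort clauses; substitute the ones your application really sends.
EXPECTED_SORTS = ["published_date desc", "title asc"]

listener = build_warming_listener("*:*", EXPECTED_SORTS)
print(ET.tostring(listener, encoding="unicode"))
```

The printed element could be pasted into the query section of solrconfig.xml. A similar listener for the firstSearcher event would cover the very first searcher after startup.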









Yonik Seeley wrote:
On Mon, Oct 6, 2008 at 2:10 PM, Jim Murphy <[EMAIL PROTECTED]> wrote:
We have a farm of several Master-Slave pairs all managing a single very large "logical" index sharded across the master-slaves. We notice on the slaves, after an rsync update, as the index is being committed, that all queries are blocked, sometimes resulting in unacceptable service times. I'm looking at ways we can manage these "update burps".
Updates should never block queries.
What version of Solr are you using?
Is it possible that your indexes are so big, opening a new index in
the background causes enough of the old index to be flushed from OS
cache, causing big slowdowns?

-Yonik
Question #1: Anything obvious I can tweak in the configuration to mitigate these multi-second blocking updates? Our indexes are 40GB, 20M documents each. Rsync updates are every 5 minutes, several hundred KB per update.
Question #2: I'm considering setting up each slave with multiple Solr cores. The 2 indexes per instance would be nearly identical copies, but "A" would be read from while "B" is being updated, then they would swap. I'll have to figure out how to rsync these 2 indexes properly, but if I can get the commits to happen to the offline index then I suspect my queries could proceed unblocked.
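
For reference, the swap step in Question #2 could be driven through Solr's CoreAdmin SWAP action. The sketch below only builds the request URL. The host, port, and the /admin/cores path are assumptions (the admin path is configurable in a multicore setup), the core names "A" and "B" are taken from the question, and the helper name swap_cores_url is hypothetical.

```python
from urllib.parse import urlencode


def swap_cores_url(base_url, live_core, staging_core):
    """Return the CoreAdmin URL that swaps the live core with the staging core."""
    params = urlencode({"action": "SWAP", "core": live_core, "other": staging_core})
    # Assumes the conventional admin path; adjust if yours is configured differently.
    return f"{base_url.rstrip('/')}/admin/cores?{params}"


# Assumed local Solr instance; "A" serves reads while "B" receives the update.
url = swap_cores_url("http://localhost:8983/solr", "A", "B")
print(url)
```

Issuing an HTTP GET against that URL after the offline core has committed would exchange the two cores' names, so readers move to the freshly updated index.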

Is this the wrong tree to be barking up?  Any other thoughts?

Thanks in advance,

Jim



--
View this message in context:
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p19843098.html
Sent from the Solr - User mailing list archive at Nabble.com.
--
View this message in context: 
http://www.nabble.com/Index-updates-blocking-readers%3A-To-Multicore-or-not--tp19843098p20112546.html
Sent from the Solr - User mailing list archive at Nabble.com.
