Hi Walter, you are right, it was mostly robots (Googlebot, Yahoo/Slurp, etc.).
I have friendly URLs like
http://www.tokenizer.org/USA/?page=7 (30 million docs, 3 million pages)
http://www.tokenizer.org/www.newegg.com/
http://www.tokenizer.org/www.newegg.com/?sort=link&dir=asc&q=Opteron

And even this: http://www.tokenizer.org/AMD/Opteron/8350/

I disabled processing for URLs with no query parameter (empty results), but I should really limit pagination programmatically. Fortunately, http://www.tokenizer.org/?q=USA returns 50k documents (search doesn't use the "Country" field). But some queries may return a huge number of documents (a better fix is to tune the "stop-word" list).

-Fuad

> -----Original Message-----
> From: Walter Underwood [mailto:wun...@wunderwood.org]
> Sent: December-24-09 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance Tuning: Pagination
>
> Some bots will do that, too. Maybe badly written ones, but we saw that at
> Netflix. It was causing search timeouts just before a peak traffic period,
> so we set a page limit in the front end, something like 200 pages.
>
> It makes sense for that to be very slow, because a request for hit
> 28838540 means that Solr has to calculate the relevance for 28838540 + 10
> documents.
>
> Fuad: Why are you benchmarking this? What user is looking at 20M
> documents?
>
> wunder
>
> On Dec 24, 2009, at 10:44 AM, Erik Hatcher wrote:
>
> >
> > On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote:
> >> When do users do a query like that? --wunder
> >
> > Well, SolrEntityProcessor "users" do :)
> >
> > http://issues.apache.org/jira/browse/SOLR-1499
> > (which by the way I plan on polishing and committing over the holidays)
> >
> > Erik
> >
> >>
> >> On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote:
> >>
> >>> I used pagination for a while till found this...
> >>>
> >>> I have filtered query ID:[* TO *] returning 20 millions results (no
> >>> faceting), and pagination always seemed to be fast. However, fast only with
> >>> low values for start=12345.
> >>> Queries like start=28838540 take 40-60 seconds,
> >>> and even cause OutOfMemoryException.
> >>>
> >>> I use highlight, faceting on nontokenized "Country" field, standard handler.
> >>>
> >>> It even seems to be a bug...
> >>>
> >>> Fuad Efendi
> >>> +1 416-993-2060
> >>> http://www.linkedin.com/in/liferay
> >>>
> >>> Tokenizer Inc.
> >>> http://www.tokenizer.ca/
> >>> Data Mining, Vertical Search
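
For illustration only, the front-end page limit discussed in this thread could look roughly like the sketch below. It is not code from tokenizer.org or Netflix. The names clamp_page and solr_paging_params are hypothetical, the cap of 200 pages is taken from Walter's "something like 200 pages", and 10 rows per page is an assumption based on the "28838540 + 10" remark. The idea is simply to clamp the requested page before computing Solr's start offset, so a bot can never ask for a start value in the millions.

```python
from urllib.parse import urlencode

MAX_PAGES = 200      # assumption: the "something like 200 pages" cap from the thread
ROWS_PER_PAGE = 10   # assumption: 10 hits per page


def clamp_page(page, max_pages=MAX_PAGES):
    """Return a 1-based page number forced into the range 1..max_pages."""
    try:
        page = int(page)
    except (TypeError, ValueError):
        # Missing or malformed page parameter: fall back to the first page.
        return 1
    return max(1, min(page, max_pages))


def solr_paging_params(query, page, rows=ROWS_PER_PAGE, max_pages=MAX_PAGES):
    """Build q/start/rows request parameters with the page number capped."""
    page = clamp_page(page, max_pages)
    return {"q": query, "start": (page - 1) * rows, "rows": rows}


if __name__ == "__main__":
    # A request for a very deep page is served the last allowed page instead.
    print(urlencode(solr_paging_params("USA", 2883855)))
```

With these assumed values the largest start the front end will ever send is 1990, so Solr never has to rank more than 2000 documents for one request. Whether to serve the last allowed page or return an error for deeper pages is a design choice; the sketch serves the last allowed page.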