OK.

The intent is to collapse results on the domain field.

Here's a query that works fine, and the way I want, with the collapsing query
parser:

/select?defType=dismax&fl=score,content,description,keywords,title&fq={!collapse%20field=domain%20nullPolicy=expand}&pf=content^0.05%20description^0.03%20keywords^0.03%20title^0.05%20url^0.06&q=bernie+sanders&qf=title%20description%20keywords%20content%20url
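For anyone reproducing this from code, here is a minimal Python sketch (standard library only) that rebuilds the working query above with every parameter value properly URL-encoded. The helper name `build_select_url` is hypothetical. The host and core are left out, as in the original relative `/select` URL.

```python
from urllib.parse import urlencode

# Same parameters as the working query; host and core name are omitted,
# as in the original relative /select URL.
PARAMS = {
    "defType": "dismax",
    "fl": "score,content,description,keywords,title",
    # Collapse on the domain field; nullPolicy=expand keeps every doc with no domain.
    "fq": "{!collapse field=domain nullPolicy=expand}",
    "pf": "content^0.05 description^0.03 keywords^0.03 title^0.05 url^0.06",
    "q": "bernie sanders",
    "qf": "title description keywords content url",
}


def build_select_url(params: dict) -> str:
    """Return a relative /select URL with every parameter value percent-encoded."""
    # urlencode uses quote_plus: spaces become '+', and '{', '!', '=', '^', ','
    # are %-escaped, so the local-params syntax survives the trip intact.
    return "/select?" + urlencode(params)


url = build_select_url(PARAMS)
print(url)
```

Encoding the whole `fq` value this way avoids mixing hand-written `%20` escapes with raw `^` and `{` characters, as the original URL does.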

This is a complex query of about 20 terms, each a single letter or number:

/select?defType=dismax&fl=score,content,description,keywords,title&fq={!collapse%20field=domain%20nullPolicy=expand}&pf=content^0.05%20description^0.03%20keywords^0.03%20title^0.05%20url^0.06&q=1+2+e+3+s+a+d+f+r+4+5+t+g+6+7+8+7+1+2+3+6&qf=title%20description%20keywords%20content%20url

This query crashes Solr: the process is killed by the OOM process killer.

Removing the collapsing query parser ({!collapse field=domain
nullPolicy=expand}) eliminates the problem. In my testing, Solr never crashes
on any query without it. A search of 20 letters and numbers separated by
spaces is still very slow, though.

With the collapsing query parser, single numeric terms cause Solr to crash.
Using whole words works, but it is slow if there are too many terms.

The debug output for all successful queries shows no errors. Rows is left at
the default of 10. A cold (uncached) search for a two-word phrase takes 2-4
seconds, and adding more than 3-4 space-separated numbers to the search kills
it.

There is no debug output for the failed queries, because Solr is killed by
the process killer.

Extreme queries are long multi-term queries, or long queries made of single
numbers and letters separated by spaces. Something like '1 3 s 2 c 4 5 t s 5
6 3 a s 4 e 6 1 4 3 2 4 5 6' makes Solr search for each of those individual
terms, which are likely to be very frequent. This type of query seems to make
Solr work really hard.
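A rough illustration of why these queries get heavy: with `defType=dismax`, each term is searched across every `qf` field, and the `pf` fields add phrase queries on top. The sketch below only counts those clauses under simplifying assumptions (whitespace tokenization, no stopword removal or de-duplication by field analysis). The function name is hypothetical, and it does not reproduce Solr's actual parsed query.

```python
def dismax_clause_counts(q: str, qf: list, pf: list) -> dict:
    """Rough count of the per-field clauses a dismax query expands into.

    Assumes whitespace tokenization with no stopword removal or
    de-duplication; real field analysis may change these numbers.
    """
    terms = q.split()
    return {
        "terms": len(terms),
        "unique_terms": len(set(terms)),
        # one term clause per (term, qf field) pair
        "term_clauses": len(terms) * len(qf),
        # one phrase query per pf field when there is more than one term
        "phrase_queries": len(pf) if len(terms) > 1 else 0,
    }


QF = ["title", "description", "keywords", "content", "url"]
PF = ["content", "description", "keywords", "title", "url"]

counts = dismax_clause_counts("1 3 s 2 c 4 5 t s 5 6 3 a s 4 e 6 1 4 3 2 4 5 6", QF, PF)
print(counts)
```

For the 24-term example this gives 120 term clauses across the five `qf` fields. Each clause matches a single character that probably occurs in a large share of a 150M-document index, so every clause walks a very long postings list.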

While users are unlikely to make such searches, I need to prevent Solr from
crashing with the collapsing query parser. This type of query can put a heavy
load on many kinds of search systems and can be used in DoS attacks targeting
them. If you have a 100M-document index handy, try a 20-term query made of
numbers and letters separated by spaces to see what I mean.

I can try to prevent these types of queries in the search API by rewriting
the user input. However, if there is a way to make Solr time out instead of
being killed, that would be preferable. Otherwise I'll have to find a
different way to limit the number of results per domain.
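A minimal sketch of the input-rewriting approach, assuming the rewrite happens in the search API before the request reaches Solr. `sanitize_query`, `build_params`, `MAX_TERMS` and `MIN_TERM_LENGTH` are hypothetical names with illustrative limits. `timeAllowed` is a standard Solr request parameter (in milliseconds) that returns partial results when the limit is exceeded. It is included here as the "time out" half of the idea, but it may not stop the memory growth behind the crash.

```python
import re

MAX_TERMS = 8         # hypothetical cap on query terms; tune for the index
MIN_TERM_LENGTH = 2   # hypothetical: drop single-character terms


def sanitize_query(user_input: str, max_terms: int = MAX_TERMS,
                   min_len: int = MIN_TERM_LENGTH) -> str:
    """Rewrite raw user input into a bounded, de-duplicated term list."""
    tokens = re.findall(r"\w+", user_input.lower())  # split on non-word chars
    kept, seen = [], set()
    for tok in tokens:
        if len(tok) < min_len or tok in seen:
            continue  # skip very short (very frequent) and repeated terms
        seen.add(tok)
        kept.append(tok)
        if len(kept) == max_terms:
            break
    return " ".join(kept)


def build_params(user_input: str, time_allowed_ms: int = 5000) -> dict:
    """Build dismax + collapse parameters with a time limit (hypothetical helper)."""
    q = sanitize_query(user_input)
    if not q:
        raise ValueError("query has no usable terms after sanitizing")
    return {
        "defType": "dismax",
        "q": q,
        "qf": "title description keywords content url",
        "fq": "{!collapse field=domain nullPolicy=expand}",
        # timeAllowed (milliseconds) is a standard Solr parameter; it can return
        # partial results when exceeded but may not bound heap use by the collapse.
        "timeAllowed": str(time_allowed_ms),
    }
```

Dropping single-character terms also rejects legitimate searches such as a lone "7". A gentler variant keeps them but lowers `max_terms` when many short tokens appear.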

I have some more RAM to put in the server tomorrow; that might help. I don't
mind if complex searches are slow, but crashing is not good, especially with
the process killer killing Solr completely.

Currently this is a master/slave setup: 150M docs, 800GB index, 24GB RAM,
16GB heap.
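A back-of-the-envelope memory budget using only the figures above. It ignores the OS itself and JVM off-heap memory (metaspace, thread stacks, direct buffers), so the real headroom is lower. The function name is hypothetical.

```python
def page_cache_budget(ram_gb: float, heap_gb: float, index_gb: float):
    """Return (GB of RAM outside the JVM heap, fraction of the index it could cache).

    Simplification: ignores the OS and JVM off-heap memory, so the real
    figure is lower than this estimate.
    """
    outside_heap = ram_gb - heap_gb
    return outside_heap, outside_heap / index_gb


free_gb, fraction = page_cache_budget(ram_gb=24, heap_gb=16, index_gb=800)
print(f"{free_gb:.0f} GB outside the heap, enough to cache about {fraction:.1%} of the index")
```

At most about 8GB, or roughly 1% of an 800GB index, can sit in the OS page cache. That may help explain why cold searches take seconds, and it shows how little slack is left once the heap and anything off-heap compete for the same 24GB.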



-----
Bee Keeper at IZaBEE.com
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
