customize solr search/scoring for performance

jchen2000 Fri, 09 Nov 2012 17:01:17 -0800

Hi,

We have 20 million short docs (about 60 terms, less than 1 KB each) on each
box, and we want to rank results based only on how many terms matched. In
particular, we are only interested in the top N results with the best scores
(say a small number like 5).


With some help from the forum users (thanks to Otis), we chose to use
edismax with mm set appropriately (something like 85% or 80%, as we wanted
reasonable recall). Recall seems good, but performance is way off: response
times vary from 30ms to 2s, but we need 200 to 300ms for 99% of searches.
Since our search requirement is really straightforward, we don't need tf,
idf, positions etc., nor do we need fancy tokenizers, since our terms are
all pre-processed. In addition, we don't need to score or sort a large doc
set, as long as we know the top N docs that have the most terms matched.
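
To make the requirement concrete, here is a minimal sketch of the ranking we
are after, written in plain Python rather than against Solr or Lucene. The
in-memory `postings` dict, the function name, and the `mm_percent` parameter
are all hypothetical and only illustrate the intended semantics (count matched
terms, apply a minimum-match threshold, keep the top N); this is not how Solr
implements it.

```python
import heapq
from collections import Counter


def top_n_by_matched_terms(postings, query_terms, n=5, mm_percent=80):
    """Return up to n (doc_id, matched_count) pairs, best first.

    postings: hypothetical in-memory inverted index, term -> iterable of
              doc ids (each doc id listed at most once per term).
    mm_percent: minimum share of distinct query terms a doc must match,
                rounded down (assumption: at least one term must match).
    """
    terms = set(query_terms)
    min_match = max(1, len(terms) * mm_percent // 100)

    # Count, per document, how many distinct query terms it contains.
    counts = Counter()
    for term in terms:
        for doc_id in postings.get(term, ()):
            counts[doc_id] += 1

    # Drop docs below the threshold, then keep only the n best without
    # sorting the full candidate set. Ties are broken by smaller doc id.
    candidates = ((doc_id, c) for doc_id, c in counts.items() if c >= min_match)
    return heapq.nsmallest(n, candidates, key=lambda pair: (-pair[1], pair[0]))
```

The point of the sketch is that no tf, idf, or position data is involved, and
the top N selection never needs a full sort of the matching docs.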

Any advice on how to customize the process to make it faster? What could be
the potential perf bottlenecks (searching the index, scoring, or sorting)?
Could this be done with a plugin, or do we need deeper hacking?

Some facts:
1) The machines we use are good, so hardware is not a solution.
2) dismax doesn't seem to work but edismax does (I thought dismax could have
an edge in perf, but I couldn't get it to run).
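
For reference, the edismax setup mentioned earlier corresponds roughly to
request parameters like the ones built below. This is only a sketch: the field
name `terms` and the helper name are placeholders, not our actual schema or
code, while `defType`, `q`, `qf`, `mm`, `rows`, and `fl` are the standard Solr
request parameters.

```python
from urllib.parse import urlencode


def build_edismax_params(terms, field="terms", mm="80%", rows=5):
    """Build Solr request parameters for an edismax query.

    field is a hypothetical field name holding the pre-processed terms.
    """
    return {
        "defType": "edismax",   # use the extended dismax query parser
        "q": " ".join(terms),   # pre-processed terms, space separated
        "qf": field,            # field(s) to search
        "mm": mm,               # minimum-should-match threshold
        "rows": rows,           # only the top N results are needed
        "fl": "id,score",       # return only the id and score
    }


def build_query_string(terms, **kwargs):
    """URL-encode the parameters for use in a /select request."""
    return urlencode(build_edismax_params(terms, **kwargs))
```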



--
View this message in context: 
http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444.html
Sent from the Solr - User mailing list archive at Nabble.com.
