Hi, we have 20 million short docs (about 60 terms and under 1 KB each) on each box, and we want to rank results solely by how many terms matched. In particular, we are only interested in the top N results with the best scores (N is small, say 5).
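To make the requirement concrete, here is a minimal sketch in Python of the ranking described above: count how many distinct query terms each doc contains and keep only the top N. This is not Solr code. The toy in-memory index and the names `build_index` and `top_n_by_matches` are made up for illustration, and ties are broken by doc id only to keep the output deterministic.

```python
import heapq
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to the set of doc ids containing it (no tf, idf, or positions)."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in set(terms):
            index[term].add(doc_id)
    return index


def top_n_by_matches(index, query_terms, n=5):
    """Return up to n (doc_id, matched_term_count) pairs, best first."""
    counts = Counter()
    for term in set(query_terms):
        # .get avoids creating empty postings for unknown terms
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    # nlargest selects the n best candidates without sorting every matching doc
    return heapq.nlargest(n, counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical toy data, only to show the shape of the result
docs = {"d1": ["a", "b", "c"], "d2": ["a", "b"], "d3": ["x"]}
index = build_index(docs)
print(top_n_by_matches(index, ["a", "b", "c", "z"], n=5))
```

The point of the sketch is that the score is just an integer match count, and selecting the top N does not require a full sort of all matching docs.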
With some help from the forum users (thanks to Otis), we chose to use edismax with mm set appropriately (something like 85% or 80%, as we wanted reasonable recall). Recall seems good, but performance is way off: query times vary from 30 ms to 2 s, while we need 200-300 ms for 99% of searches.

Since our search requirement is really straightforward, we don't need tf, idf, positions, etc., nor do we need fancy tokenizers, since our terms are all pre-processed. In addition, we don't need to evaluate scores or sort over a large doc set, as long as we know the top N docs that have the most terms matched.

Any advice on how to customize the process to make it faster? What could the potential perf bottlenecks be (searching the index, scoring, or sorting)? Could this be done with a plugin, or do we need deeper hacking?

Some facts:
1) The machines we use are already good, so better hardware is not a solution.
2) dismax does not seem to work for us, but edismax does (I thought dismax could have an edge in perf, but I couldn't get it to run).

--
View this message in context: http://lucene.472066.n3.nabble.com/customize-solr-search-scoring-for-performance-tp4019444.html
Sent from the Solr - User mailing list archive at Nabble.com.