Hi,

Solr suggester is wonderful. We have been testing the built-in dictionary implementations on some large-ish datasets (36m, 132m) and getting response times from single digits to the teens of milliseconds, with 9 dictionaries per request. Most of the resulting dictionaries have millions of entries too. I'm also intrigued by the "finite-state machines" in the prefix/fuzzy suggesters. Can’t wait to load test this properly.
Now I have some questions:

1. Term frequency, weight/count

The suggesters derive suggestions from a field in the index. How feasible would it be to create a custom dictionary that automatically populates the weight/count field from term frequency (tf) at build time? Autosuggest is, in most cases, ranked by popularity. For example, "Apache Solr" would get a weight of 3 if the term occurred 3 times in a field. Why must we specify a weight field explicitly for popularity ranking when tf data is readily available from the index?

In our tests, we had to index the datasets twice. In the second pass, tf is looked up per doc via Solr’s term component and stored in a weight field (for the suggester). The lookup is also needed for each dictionary field. We used 4 fields for 9 dictionaries. This really provides extra “incentive” to do things in parallel!

2. "mm" lookup

The lookup logic for multiple terms is currently a boolean AND, i.e. a suggestion must match all terms in suggest.q. However, in our use case we need a boolean OR for one dictionary. This is a bit like a "suggest.q infix", e.g.:

suggest.q: Apache Sol

suggestions:
Apache Solr
Solr
SolrCloud
We love Apache

I couldn’t get any of the existing lookup implementations to find the last three suggestions. Perhaps it’s time to dig into the code to see if a “minimum should match” (mm) feature (50% in the above case) is a possibility?

Thanks,

Boon

-----
Boon Low
Search Engineer / Lead Big Data
DCT Family History
http://uk.linkedin.com/in/boonlow/
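P.S. To make the two questions concrete, here is a rough sketch in plain Python of the behaviour we are after. It is not Solr code and uses no Solr API: the names build_weights and mm_lookup are hypothetical, the weights are simple counts of whole field values standing in for tf, and treating the last query token as a prefix is my assumption about how a partially typed term should match.

```python
import math
from collections import Counter


def build_weights(field_values):
    """Count how often each field value occurs (a stand-in for tf in the index)."""
    return Counter(field_values)


def mm_lookup(query, weights, mm=0.5):
    """Return entries where at least a fraction `mm` of the query tokens match.

    Every query token except the last must equal a whole word of the entry.
    The last token is treated as a prefix, since it may still be being typed.
    """
    tokens = query.lower().split()
    if not tokens:
        return []
    required = max(1, math.ceil(mm * len(tokens)))
    results = []
    for entry, weight in weights.items():
        words = entry.lower().split()
        matched = sum(1 for token in tokens[:-1] if token in words)
        if any(word.startswith(tokens[-1]) for word in words):
            matched += 1
        if matched >= required:
            results.append((entry, matched, weight))
    # Rank by number of matched tokens, then by weight, then alphabetically.
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return [entry for entry, _, _ in results]


# Hypothetical field values; "Apache Solr" occurs 3 times, so its weight is 3.
values = ["Apache Solr"] * 3 + ["Solr"] * 2 + ["SolrCloud", "We love Apache", "Lucene"]
weights = build_weights(values)
print(mm_lookup("Apache Sol", weights, mm=0.5))
print(mm_lookup("Apache Sol", weights, mm=1.0))
```

With mm=0.5 this returns all four suggestions from the example above, while mm=1.0 behaves like the current AND logic and returns only "Apache Solr".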