Hello, I've been exploring usage of the spellcheck feature via solr 1.3. I have it working, but there are some issues I'm seeing that make it less useful than it could be. Response on the solr-user mailing list has been limited. I'm guessing the reason may be that I'm asking about issues which are most relevant to the lucene codebase. So, I hope you don't mind this cross-posting.
I've noticed a few issues with spellcheck as I've been testing it out for use on our site... 1. Rebuild breaks requests - I'm using rebuildOnCommit ATM. If a commit is going on and files are being rebuilt in the spellcheck data dir, spellcheck requests yield bogus answers. I.e. I can issue identical requests and get drastically different answers. The first time, I get suggestions and "correctlySpelled" is false. The second time (during the commit), I get no suggestions and "correctlySpelled" is true. Shouldn't spellcheck use the old index until the new one is ready for use, like solr does with optimizes? 2. Inconsistent ordering - The first suggestion changes depending on the spellcheck.count that I specify. If my query is "chanl" and I ask for one result, the suggestion is "chant" (freq. 16). If I ask for 5 results, the first suggestion is also "chant"; the other 4 suggestions are less frequent (#2 is "chang", freq. 11). However, if I ask for 10 results, the first suggestion is "chanel" (freq. 1296); #2 and #3 are "chant" and "chang"; #9 is "chan" (freq. 174). Shouldn't spellcheck always return the best suggestion first? In my case, shouldn't "chanel" always top "chant" and "chang" since they all have the same edit distance yet "chanel" is two orders of mangnitude more popular? Is there anything I could be doing wrong to create these problems? If not, are these known issues? If not, should I create jira's for them? Thanks, Jason