Got it. Thanks again for all the info! Will open a JIRA and follow up about this sometime soon.
Thanks,
Nalini

On Fri, Dec 14, 2012 at 1:32 PM, Dyer, James <james.d...@ingramcontent.com> wrote:

> Nalini,
>
> I don't think you can change the *default* response format until a new
> major release (so it's OK for Trunk/5.0 but not for the 4.x branch). What
> you can do, however, is create a new "spellcheck.xxx" parameter to let
> users opt in to the new functionality in 4.x as desired. We'd also want to
> update solrj so Java clients could easily use the new feature (see
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/response/SpellCheckResponse.java
> ).
>
> I'm not sure I've ever heard of someone wanting to combine suggestions from
> multiple cores before. I'd be interested in hearing more about what you're
> trying to do. But this does seem similar to the problem of combining
> suggestions between multiple SpellCheckers. See
> https://issues.apache.org/jira/browse/SOLR-2993 , which adds a new
> spellchecker that corrects word break problems. This added a new class,
> ConjunctionSolrSpellChecker, which interleaves the results from the main
> String-Distance-based checker with results from the word break checker.
> You might be able to generalize this class to also be able to combine
> results from multiple DirectSolrSpellCheckers together. While you want to
> get suggestions from multiple cores, others might want this feature to be
> able to have separate dictionaries per field from the same core.
>
> I think it's OK to rank combined results by String Distance so long as you
> know the same metric is applied to all. This is in contrast to how it is
> with the Word Break spellchecker, which uses an incompatible distance
> metric. So for this case, ConjunctionSolrSpellChecker just interleaves the
> results round-robin.
>
> So expanding on ConjunctionSolrSpellChecker might be one possible way to
> accomplish what you want to do. You might find something else that works
> better.
> For whatever you come up with, by all means open a JIRA issue and
> attach your work as a patch and see where it goes from there. (Subscribe
> to the dev list if you haven't already, as that's where these types of
> discussions usually happen.)
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Friday, December 14, 2012 11:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Differentiate between correctly spelled term and mis-spelled
> term with no corrections
>
> Hi James,
>
> A couple more follow-up questions:
>
> 1. Do changes to the response format have to be backwards compatible at
> this point? Seems like if we changed it to always return the origFreq even
> if there are no suggestions, then that could break things, right?
> 2. For our purposes, we need to be able to order suggestions from multiple
> Solr cores, so we were thinking of changing the format to also include the
> score that is calculated for each suggestion (which isn't exposed right
> now). Are these scores from different dictionary fields comparable
> (assuming we use the default INTERNAL_LEVENSHTEIN_DISTANCE metric)? And do
> you think this would be of general use, i.e. could it be contributed back to
> Solr?
>
> Thanks,
> Nalini
>
>
> On Fri, Dec 7, 2012 at 2:20 PM, Nalini Kartha <nalinikar...@gmail.com> wrote:
>
> > Ah, I see what you mean. Will probably try to change the response to look
> > like the internal shard one then.
> >
> > Thanks for the detailed explanation!
> >
> > - Nalini
> >
> >
> > On Fri, Dec 7, 2012 at 1:38 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> >
> >> The response from the shards is different from the final spellcheck
> >> response in that it does include the term even if there are no suggestions
> >> for it. So to get the behavior you want, we'd probably just have to make
> >> it so you could get the "shard-to-shard-internal" version.
> >>
> >> See
> >> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
> >>
> >> ...and method "toNamedList(...)"
> >>
> >> ...and this line:
> >>
> >> if (theSuggestions != null && (theSuggestions.size() > 0 || shardRequest)) {
> >>   ...
> >> }
> >>
> >> ...the "shardRequest" boolean is passed as "true" here if it's the 1st
> >> stage of a distributed request (from #process). The various shards send
> >> their responses to the main shard, which then integrates them together (in
> >> #finishStage). Note that #finishStage always passes "shardRequest=false" to
> >> #toNamedList so that the end user gets a "normal" response back, omitting
> >> terms for which there are no suggestions.
> >>
> >> James Dyer
> >> E-Commerce Systems
> >> Ingram Content Group
> >> (615) 213-4311
> >>
> >>
> >> -----Original Message-----
> >> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> >> Sent: Friday, December 07, 2012 9:54 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Differentiate between correctly spelled term and mis-spelled
> >> term with no corrections
> >>
> >> Hi James,
> >>
> >> Thanks for the response, will open a JIRA for this.
> >>
> >> Had one follow-up question: how does the Distributed SpellCheckComponent
> >> handle this? I tried looking at the code but it's not obvious to me how it
> >> is able to differentiate between these 2 cases. I see that it only
> >> considers a term to be wrongly spelt if all shards return a suggestion for
> >> it, but isn't it possible that a suggestion is not returned because nothing
> >> close enough could be found in some shard? Or is the response from shards
> >> different from the final spellcheck response we get from Solr in some way?
> >>
> >> Thanks,
> >> Nalini
> >>
> >>
> >> On Fri, Dec 7, 2012 at 10:26 AM, Dyer, James
> >> <james.d...@ingramcontent.com> wrote:
> >>
> >> > You might want to open a JIRA issue for this to request that the feature
> >> > be added. If you haven't used it before, you need to create an account.
> >> >
> >> > https://issues.apache.org/jira/browse/SOLR
> >> >
> >> > In the meantime, if you need to get the document frequency of the query
> >> > terms, see http://wiki.apache.org/solr/TermsComponent , which might
> >> > provide a viable workaround.
> >> >
> >> > James Dyer
> >> > E-Commerce Systems
> >> > Ingram Content Group
> >> > (615) 213-4311
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> >> > Sent: Thursday, December 06, 2012 2:44 PM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Differentiate between correctly spelled term and mis-spelled term
> >> > with no corrections
> >> >
> >> > Hi,
> >> >
> >> > When using the SolrSpellChecker, is there currently any way to
> >> > differentiate between a term that exists in the dictionary and a
> >> > mis-spelled term for which no corrections were found when looking at the
> >> > spellcheck response?
> >> >
> >> > From reading the doc and trying out some simple test cases, it seems like
> >> > there isn't: in both cases it looks like the response doesn't include the
> >> > term.
> >> >
> >> > Could the extended results format be changed to include the original term
> >> > frequency even if there are no suggestions? This would allow us to make
> >> > this differentiation.
> >> >
> >> > Thanks,
> >> > Nalini
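
For readers following the toNamedList(...) condition quoted in the thread, here is a minimal, self-contained sketch of how that kind of filter behaves. It is not Solr's code: the class SpellcheckFilterSketch, the method toResponse, the includeEmpty flag (standing in for "shardRequest") and the sample terms are all made up for illustration, and plain Java maps stand in for Solr's own response types. It also makes no claim about how Solr populates the suggestion lists in the first place.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT the SpellCheckComponent source.
final class SpellcheckFilterSketch {

    /**
     * Copies a term into the response when it has at least one suggestion,
     * or when includeEmpty is true. includeEmpty plays the role that
     * "shardRequest" plays in the condition quoted in the thread.
     */
    static Map<String, List<String>> toResponse(
            Map<String, List<String>> suggestionsByTerm, boolean includeEmpty) {
        Map<String, List<String>> response = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : suggestionsByTerm.entrySet()) {
            List<String> suggestions = entry.getValue();
            // Same shape as the quoted condition: a null list is always dropped,
            // an empty list is kept only in the "internal" (includeEmpty) form.
            if (suggestions != null && (suggestions.size() > 0 || includeEmpty)) {
                response.put(entry.getKey(), suggestions);
            }
        }
        return response;
    }
}
```

Calling toResponse(input, false) on an input that maps one term to an empty list drops that term, which is the "normal" end-user behavior described above; calling it with true keeps the term with its empty list, which is the shard-internal form Nalini proposes to expose.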