Re: Differentiate between correctly spelled term and mis-spelled term with no corrections

Nalini Kartha Tue, 18 Dec 2012 09:03:18 -0800

Got it. Thanks again for all the info! Will open a JIRA and follow up about
this sometime soon.


Thanks,
Nalini


On Fri, Dec 14, 2012 at 1:32 PM, Dyer, James
<james.d...@ingramcontent.com> wrote:

> Nalini,
>
> I don't think you can change the *default* response format until a new
> major release (so it's OK for trunk/5.0 but not for the 4.x branch).  What
> you can do, however, is create a new "spellcheck.xxx" parameter to let
> users opt in to the new functionality in 4.x as desired.  We'd also want to
> update SolrJ so Java clients could easily use the new feature (see
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/response/SpellCheckResponse.java
> ).
>
> I'm not sure I've ever heard of anyone wanting to combine suggestions from
> multiple cores before.  I'd be interested in hearing more about what you're
> trying to do.  But this does seem similar to the problem of combining
> suggestions from multiple SpellCheckers.  See
> https://issues.apache.org/jira/browse/SOLR-2993 , which adds a new
> spellchecker that corrects word break problems.  This added a new class,
> ConjunctionSolrSpellChecker, that interleaves the results from the main
> String-Distance-based checker with results from the word break checker.
> You might be able to generalize this class to also combine results from
> multiple DirectSolrSpellCheckers.  While you want to get suggestions from
> multiple cores, others might want this feature so they can have separate
> per-field dictionaries from the same core.
>
> I think it's OK to rank combined results by String Distance so long as you
> know the same metric was applied to all.  This is in contrast to the
> Word Break spellchecker, which uses an incompatible distance metric.  So
> for that case, ConjunctionSolrSpellChecker just interleaves the results
> round-robin.
>
> So expanding on ConjunctionSolrSpellChecker might be one possible way to
> accomplish what you want to do.  You might find something else that works
> better.  For whatever you come up with, by all means open a JIRA issue and
> attach your work as a patch and see where it goes from there.  (Subscribe
> to the dev list if you haven't already, as that's where these types of
> discussions usually happen.)
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -----Original Message-----
> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> Sent: Friday, December 14, 2012 11:06 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Differentiate between correctly spelled term and mis-spelled
> term with no corrections
>
> Hi James,
>
> A couple more follow-up questions:
>
> 1. Do changes to the response format have to be backwards compatible at
> this point? It seems that if we changed it to always return the origFreq
> even if there are no suggestions, that could break things, right?
> 2. For our purposes, we need to be able to order suggestions from multiple
> Solr cores so we were thinking of changing the format to also include the
> score that is calculated for each suggestion (which isn't exposed right
> now). Are these scores from different dictionary fields comparable
> (assuming we use the default INTERNAL_LEVENSHTEIN_DISTANCE metric)? And do
> you think this would be of general use i.e. could it be contributed back to
> Solr?
>
> Thanks,
> Nalini
>
>
> On Fri, Dec 7, 2012 at 2:20 PM, Nalini Kartha <nalinikar...@gmail.com> wrote:
>
> > Ah I see what you mean. Will probably try to change the response to look
> > like the internal shard one then.
> >
> > Thanks for the detailed explanation!
> >
> > - Nalini
> >
> >
> > On Fri, Dec 7, 2012 at 1:38 PM, Dyer, James <james.d...@ingramcontent.com> wrote:
> >
> >> The response from the shards is different from the final spellcheck
> >> response in that it does include the term even if there are no
> >> suggestions for it.  So to get the behavior you want, we'd probably just
> >> have to make it so you could get the "shard-to-shard-internal" version.
> >>
> >> See
> >> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
> >>
> >> ...and method "toNamedList(...)"
> >>
> >> ...and this line:
> >>
> >> if (theSuggestions != null && (theSuggestions.size() > 0 ||
> >> shardRequest)) {
> >> ...
> >> }
> >>
> >> ...the "shardRequest" boolean is passed as "true" here if it's the 1st
> >> stage of a distributed request (from #process).  The various shards send
> >> their responses to the main shard, which then integrates them together
> >> (in #finishStage).  Note that #finishStage always passes
> >> "shardRequest=false" to #toNamedList so that the end user gets a "normal"
> >> response back, omitting terms for which there are no suggestions.
> >>
> >> James Dyer
> >> E-Commerce Systems
> >> Ingram Content Group
> >> (615) 213-4311
> >>
> >>
> >> -----Original Message-----
> >> From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> >> Sent: Friday, December 07, 2012 9:54 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Differentiate between correctly spelled term and mis-spelled
> >> term with no corrections
> >>
> >> Hi James,
> >>
> >> Thanks for the response, will open a JIRA for this.
> >>
> >> Had one follow-up question - how does the Distributed SpellCheckComponent
> >> handle this? I tried looking at the code but it's not obvious to me how
> >> it is able to differentiate between these 2 cases. I see that it only
> >> considers a term to be wrongly spelt if all shards return a suggestion
> >> for it but isn't it possible that a suggestion is not returned because
> >> nothing close enough could be found in some shard? Or is the response
> >> from shards different than the final spellcheck response we get from Solr
> >> in some way?
> >>
> >> Thanks,
> >> Nalini
> >>
> >>
> >> On Fri, Dec 7, 2012 at 10:26 AM, Dyer, James
> >> <james.d...@ingramcontent.com> wrote:
> >>
> >> > You might want to open a JIRA issue for this to request that the
> >> > feature be added.  If you haven't used it before, you need to create an
> >> > account.
> >> >
> >> > https://issues.apache.org/jira/browse/SOLR
> >> >
> >> > In the meantime, if you need to get the document frequency of the query
> >> > terms, see http://wiki.apache.org/solr/TermsComponent , which may
> >> > provide a viable workaround.
> >> >
> >> > James Dyer
> >> > E-Commerce Systems
> >> > Ingram Content Group
> >> > (615) 213-4311
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: Nalini Kartha [mailto:nalinikar...@gmail.com]
> >> > Sent: Thursday, December 06, 2012 2:44 PM
> >> > To: solr-user@lucene.apache.org
> >> > Subject: Differentiate between correctly spelled term and mis-spelled
> >> > term with no corrections
> >> >
> >> > Hi,
> >> >
> >> > When using the SolrSpellChecker, is there currently any way to
> >> > differentiate between a term that exists in the dictionary and a
> >> > mis-spelled term for which no corrections were found when looking at
> >> > the spellcheck response?
> >> >
> >> > From reading the doc and trying out some simple test cases it seems
> >> > like there isn't - in both cases it looks like the response doesn't
> >> > include the term.
> >> >
> >> > Could the extended results format be changed to include the original
> >> > term frequency even if there are no suggestions? This would allow us to
> >> > make this differentiation.
> >> >
> >> > Thanks,
> >> > Nalini
> >> >
> >> >
> >>
> >>
> >
>
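
The two ways of combining suggestions discussed in the quoted thread
(round-robin interleaving when the distance metrics are incompatible, and
ranking by score when every source used the same metric) can be sketched
roughly as below. This is only an illustration under stated assumptions, not
the code of ConjunctionSolrSpellChecker: the class SuggestionMergeSketch, the
Scored holder and both method names are hypothetical, and a higher score is
assumed to mean a closer match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from Solr.
final class SuggestionMergeSketch {

    /** Hypothetical holder: a candidate word plus its similarity score (assumed: higher = closer). */
    static final class Scored {
        final String word;
        final float score;
        Scored(String word, float score) { this.word = word; this.score = score; }
    }

    /** Round-robin merge: the 1st item of each source, then the 2nd of each, and so on. */
    static List<String> interleave(List<List<String>> sources) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        int longest = 0;
        for (List<String> s : sources) longest = Math.max(longest, s.size());
        for (int i = 0; i < longest; i++) {
            for (List<String> s : sources) {
                if (i < s.size()) merged.add(s.get(i));
            }
        }
        return new ArrayList<>(merged);
    }

    /** Score-based merge: only meaningful if every source used the same distance metric. */
    static List<String> rankByScore(List<List<Scored>> sources) {
        List<Scored> all = new ArrayList<>();
        for (List<Scored> s : sources) all.addAll(s);
        // Best score first; ties broken alphabetically so the order is deterministic.
        all.sort(Comparator.comparingDouble((Scored x) -> -x.score).thenComparing(x -> x.word));
        Set<String> merged = new LinkedHashSet<>(); // a word seen twice keeps its best position
        for (Scored x : all) merged.add(x.word);
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        System.out.println(interleave(Arrays.asList(
                Arrays.asList("a1", "a2", "a3"), Arrays.asList("b1"))));
        System.out.println(rankByScore(Arrays.asList(
                Arrays.asList(new Scored("solar", 0.8f), new Scored("sole", 0.5f)),
                Arrays.asList(new Scored("solr", 0.9f), new Scored("solar", 0.7f)))));
    }
}
```

Ranking by score needs the per-suggestion score to be available to the
caller, which is the piece the thread notes is not exposed in the response
today.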
>
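
The effect of the "shardRequest" flag described in the quoted thread can be
illustrated with a small standalone sketch. It is not Solr's toNamedList: the
class SpellcheckResponseSketch, the method buildResponse and the map-based
types are hypothetical, and only the condition on theSuggestions and
shardRequest is taken from the quoted line. The sketch assumes a term with no
corrections arrives with an empty suggestion list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; only the if-condition mirrors the line quoted in the thread.
final class SpellcheckResponseSketch {

    /** Copies a term into the response only when the quoted condition holds. */
    static Map<String, List<String>> buildResponse(
            Map<String, List<String>> suggestionsByTerm, boolean shardRequest) {
        Map<String, List<String>> response = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : suggestionsByTerm.entrySet()) {
            List<String> theSuggestions = e.getValue();
            // Terms with no suggestions survive only in the shard-internal form.
            if (theSuggestions != null && (theSuggestions.size() > 0 || shardRequest)) {
                response.put(e.getKey(), theSuggestions);
            }
        }
        return response;
    }

    public static void main(String[] args) {
        Map<String, List<String>> checked = new LinkedHashMap<>();
        checked.put("slor", Arrays.asList("solr"));   // misspelled, one correction
        checked.put("zzzq", new ArrayList<String>()); // assumed: misspelled, no corrections
        System.out.println(buildResponse(checked, true));  // shard-internal form
        System.out.println(buildResponse(checked, false)); // form the end user sees
    }
}
```

With shardRequest=false the term that has no corrections disappears from the
response, which is why the final response cannot distinguish it from a
correctly spelled term.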
