You're welcome. I should have pointed out that I was responding mostly to the "false hits are not acceptable" portion, which I don't think is achievable....
Best
Erick

2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>

> Message written on 2008-10-16, at 15:54, by Erick Erickson:
>
>> Well, let me see. Your customers are telling you, in essence,
>> "for any random input, you cannot return false positives". Which
>> is nonsense, so I'd say you need to negotiate with your
>> customers. I flat guarantee that, for any algorithm you try,
>> you can write a counter-example in, oh, 15 seconds or so <G>.
>
> They came to such expectations seeing Solr's own Spellcheck at work - if it
> can suggest correct versions, it should be able to sanitize broken words in
> documents and search them using sanitized input. To me, this seemed a
> reasonable request (of course, only if it can be achieved by reasonably
> abusing Solr's spellcheck component).
>
>> FuzzySearch tries to do some of this work for you, and that may be
>> acceptable, as this is a common issue. But it'll never be
>> perfect.
>>
>> You might get some joy from ngrams, but I haven't
>> worked with it myself, just seen it recommended by people
>> whose opinions I respect...
>
> Thank you for these suggestions.
>
>> Best
>> Erick
>>
>> 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>
>>
>>> Hello, group.
>>>
>>> I'm trying to create a search facility for documents in "broken" Polish
>>> (by broken I mean "not language rules compliant"), searchable by terms in
>>> "broken" Polish, but broken in many other ways than the documents. See
>>> this example:
>>>
>>> document text: "włatcy móch" (in proper Polish this would be "władcy
>>> much")
>>> example terms that should match: "włatcy much", "wlatcy moch", "wladcy
>>> much"
>>>
>>> This double brokenness ruled out any Polish stemmers currently available
>>> for Lucene and now I am at point 0. The search results do not have to be
>>> 100% accurate - some missing results are acceptable, but "false
>>> positives" are not.
>>>
>>> Is it at all possible using machinery provided by Solr (I do not own a
>>> PhD in linguistics), or should I ask the business for lowering their
>>> expectations?
>>>
>>> --
>>> We read Knuth so you don't have to. - Tim Peters
>>>
>>> Jarek Zgoda, R&D, Redefine
>>> [EMAIL PROTECTED]
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> [EMAIL PROTECTED]
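
To make the ngram suggestion and the "no false positives" objection concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene code and does not reproduce how either of them scores matches. The function names (`fold`, `ngrams`, `similarity`), the choice of character bigrams with Jaccard overlap, and the diacritic folding are all assumptions made for illustration. The query "władcy mocy" is also invented here and does not come from the thread.

```python
import unicodedata

# "ł" has no Unicode decomposition, so it is folded by hand.
# Assumption: folding diacritics is acceptable for this matching task.
_EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L"})


def fold(text: str) -> str:
    """Lowercase and strip diacritics, e.g. 'móch' -> 'moch'."""
    decomposed = unicodedata.normalize("NFD", text.translate(_EXTRA_FOLDS))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams of the folded text, padded with one space."""
    padded = f" {fold(text)} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets, a value in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


if __name__ == "__main__":
    document = "włatcy móch"
    wanted = ["włatcy much", "wlatcy moch", "wladcy much"]  # from the thread
    unwanted = ["władcy mocy"]  # hypothetical different phrase, not from the thread
    for query in wanted + unwanted:
        print(f"{query!r}: {similarity(document, query):.3f}")
```

With these particular choices, the invented phrase "władcy mocy" scores higher against the document than the wanted variant "wladcy much", so any threshold loose enough to accept the latter also accepts the former. That is one small instance of the counter-example argument above; a different n, padding, or scoring formula would shift the numbers but not remove the trade-off.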