You're welcome. I should have pointed out that I was responding
mostly to the "false hits are not acceptable" portion, which I don't
think is achievable....
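
To make the false-positive point concrete, here is a minimal sketch, assuming fuzzy matching is modelled as plain Levenshtein edit distance on single terms. The helper names `levenshtein` and `fuzzy_hits` are hypothetical, this is not Lucene's or Solr's actual implementation, and the query "mech" (Polish for "moss") is my own illustrative counter-example, not something from the thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def fuzzy_hits(indexed_term: str, queries: list[str], max_edits: int) -> list[str]:
    """Return the queries that lie within max_edits of the indexed term."""
    return [q for q in queries if levenshtein(indexed_term, q) <= max_edits]


# "móch" is the misspelled term from the example document.
# "much" and "moch" are wanted matches; "mech" is an unrelated word
# (assumption: chosen here only to illustrate a false positive).
hits = fuzzy_hits("móch", ["much", "moch", "mech"], max_edits=1)
print(hits)
```

All three queries are one edit away from "móch", so any threshold loose enough to accept the wanted variants also accepts the unrelated word. Tightening the threshold trades those false positives for missed matches.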

Best
Erick

2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>

> On 2008-10-16, at 15:54, Erick Erickson wrote:
>
>> Well, let me see. Your customers are telling you, in essence,
>> "for any random input, you cannot return false positives". Which
>> is nonsense, so I'd say you need to negotiate with your
>> customers. I flat guarantee that, for any algorithm you try,
>> you can write a counter-example in, oh, 15 seconds or so <G>.
>>
>
> They arrived at these expectations after seeing Solr's own Spellcheck at
> work: if it can suggest correct versions, it should be able to sanitize broken
> words in documents and search them using sanitized input. To me, this seemed
> a reasonable request (provided, of course, that it can be achieved by
> reasonably abusing Solr's spellcheck component).
>
>> FuzzySearch tries to do some of this work for you, and that may be
>> acceptable, as this is a common issue. But it'll never be
>> perfect.
>>
>> You might get some joy from n-grams, but I haven't
>> worked with them myself, just seen them recommended by people
>> whose opinions I respect...
>>
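
As a rough illustration of the n-gram idea, here is a minimal sketch, assuming character bigrams compared with Jaccard similarity. The function names `char_ngrams` and `jaccard` are hypothetical, and Solr's n-gram analysis works differently in detail (it indexes the grams as tokens). The phrase "zielony mech" is an unrelated query invented for the comparison.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of overlapping character n-grams of the text (no padding)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Shared n-grams divided by all distinct n-grams of both strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have n-grams count as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


document = "włatcy móch"
queries = [
    "włatcy much",   # wanted match from the original post
    "wlatcy moch",   # wanted match from the original post
    "wladcy much",   # wanted match from the original post
    "zielony mech",  # unrelated phrase (assumption, for illustration only)
]
for query in queries:
    print(f"{query!r}: {jaccard(document, query):.3f}")
```

The wanted variants score higher than the unrelated phrase, but the unrelated phrase still scores above zero because it shares a few bigrams with the document. Where to put the cut-off is a judgment call, and any cut-off will either admit some false hits or drop some wanted ones.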
>
> Thank you for these suggestions.
>
>
>
>>
>> Best
>> Erick
>>
>>
>> 2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>
>>
>>> Hello, group.
>>>
>>> I'm trying to create a search facility for documents in "broken" Polish
>>> (by broken I mean "not compliant with the rules of the language"),
>>> searchable by terms that are also in "broken" Polish, but broken in many
>>> other ways than the documents. See this example:
>>>
>>> document text: "włatcy móch" (in proper Polish this would be "władcy much")
>>> example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"
>>>
>>> This double brokenness rules out all Polish stemmers currently available
>>> for Lucene, and now I am back at square one. The search results do not have
>>> to be 100% accurate: some missing results are acceptable, but "false
>>> positives" are not. Is this at all possible using the machinery provided by
>>> Solr (I do not hold a PhD in linguistics), or should I ask the business to
>>> lower their expectations?
>>>
>>> --
>>> We read Knuth so you don't have to. - Tim Peters
>>>
>>> Jarek Zgoda, R&D, Redefine
>>> [EMAIL PROTECTED]
>>>
>>>
>>>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> [EMAIL PROTECTED]
>
>
