[jira] Updated: (SOLR-1710) convert worddelimiterfilter to new tokenstream API

Robert Muir (JIRA) Fri, 08 Jan 2010 14:26:17 -0800

     [ https://issues.apache.org/jira/browse/SOLR-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Robert Muir updated SOLR-1710:
------------------------------

    Attachment: SOLR-1710.patch

For the 'wdf is only modifying a single word with punctuation' case, don't 
call clearAttributes() if it's the first token, even though it's modified... 
unless preserveOriginal is on (in that case the preserved original already 
carried the attributes, so we must clear).

This is a little confusing, since the behavior for custom attributes depends on 
the preserveOriginal value, but I think it makes sense.
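
The rule described above can be read as a two-input decision. The sketch below 
is only one reading of this comment, not code from the patch: the class name 
ClearAttributesRule and the method shouldClear are hypothetical, and the 
assumption that every non-first subword clears is inferred rather than stated 
in the comment.

```java
public class ClearAttributesRule {

    /**
     * Hypothetical helper (not from SOLR-1710.patch): decides whether
     * clearAttributes() should be called before emitting a subword.
     *
     * @param firstToken       true if this is the first token emitted for the
     *                         current input token
     * @param preserveOriginal true if the filter also emits the unmodified
     *                         original token
     */
    public static boolean shouldClear(boolean firstToken, boolean preserveOriginal) {
        // First token, original not preserved: keep the incoming (custom)
        // attributes, even though the term text was modified.
        // First token, original preserved: the preserved original already
        // carried those attributes, so this token must start clean.
        // Assumption: any later subword always clears.
        return !firstToken || preserveOriginal;
    }

    public static void main(String[] args) {
        boolean[] values = {true, false};
        for (boolean first : values) {
            for (boolean preserve : values) {
                System.out.println("firstToken=" + first
                        + " preserveOriginal=" + preserve
                        + " -> clear=" + shouldClear(first, preserve));
            }
        }
    }
}
```

Running main prints the four cases, which makes the dependency on 
preserveOriginal easy to see: only the (firstToken, no preserveOriginal) case 
keeps the attributes.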

> convert worddelimiterfilter to new tokenstream API
> --------------------------------------------------
>
>                 Key: SOLR-1710
>                 URL: https://issues.apache.org/jira/browse/SOLR-1710
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Robert Muir
>         Attachments: SOLR-1710.patch, SOLR-1710.patch
>
>
> This one was a doozy. Attached is a patch to convert it to the new 
> tokenstream API.
> Some of the logic was split into WordDelimiterIterator (which exposes a 
> BreakIterator-like API for iterating over subwords).
> The filter is much more efficient now: no cloning.
> Before applying the patch, rename the existing WordDelimiterFilter to 
> OriginalWordDelimiterFilter.
> The patch includes a test case (TestWordDelimiterBWComp) which generates 
> random strings from various subword combinations.
> For each random string, it compares the output against the existing 
> WordDelimiterFilter for all 512 combinations of boolean parameters.
> NOTE: due to bugs found (SOLR-1706), this currently only tests 256 of these 
> combinations. The bugs discovered in SOLR-1706 are fixed here.
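
The backwards-compatibility check described in the issue can be sketched in 
plain Java as follows. This is not the code of TestWordDelimiterBWComp: the 
class FlagCombinationCheck and all of its methods are hypothetical, the count 
of nine flags is inferred from 512 = 2^9, and the two filter implementations 
are stand-in functions rather than real token streams. Which flag is held 
fixed to get 256 combinations is not stated in the issue, so bit 0 below is an 
arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.BiFunction;

public class FlagCombinationCheck {

    // Assumption: 512 combinations implies nine boolean parameters.
    static final int FLAG_COUNT = 9;

    /** Decodes a bitmask into one boolean per flag (bit i -> flags[i]). */
    public static boolean[] decode(int mask) {
        boolean[] flags = new boolean[FLAG_COUNT];
        for (int bit = 0; bit < FLAG_COUNT; bit++) {
            flags[bit] = (mask & (1 << bit)) != 0;
        }
        return flags;
    }

    /**
     * Enumerates flag combinations. If fixedOffBit is >= 0, that flag is
     * held at false, which halves the number of combinations.
     */
    public static List<boolean[]> combinations(int fixedOffBit) {
        List<boolean[]> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << FLAG_COUNT); mask++) {
            if (fixedOffBit >= 0 && (mask & (1 << fixedOffBit)) != 0) {
                continue; // skip combinations where the fixed flag is on
            }
            result.add(decode(mask));
        }
        return result;
    }

    /** Builds a random input by concatenating randomly chosen subword pieces. */
    public static String randomInput(Random random, String[] pieces, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(pieces[random.nextInt(pieces.length)]);
        }
        return sb.toString();
    }

    /** Counts the flag combinations for which the two implementations differ. */
    public static int countMismatches(String input, List<boolean[]> combos,
            BiFunction<String, boolean[], List<String>> oldImpl,
            BiFunction<String, boolean[], List<String>> newImpl) {
        int mismatches = 0;
        for (boolean[] flags : combos) {
            if (!oldImpl.apply(input, flags).equals(newImpl.apply(input, flags))) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Stand-ins for the old and new filters: both just split on '-'.
        BiFunction<String, boolean[], List<String>> oldImpl =
                (s, flags) -> Arrays.asList(s.split("-"));
        BiFunction<String, boolean[], List<String>> newImpl =
                (s, flags) -> Arrays.asList(s.split("-"));

        String[] pieces = {"foo", "Bar", "123", "-", "'s", "_"};
        String input = randomInput(new Random(42L), pieces, 5);

        System.out.println("all combinations: " + combinations(-1).size());
        System.out.println("one flag fixed:   " + combinations(0).size());
        System.out.println("mismatches: "
                + countMismatches(input, combinations(0), oldImpl, newImpl));
    }
}
```

With nine flags the full enumeration has 512 entries, and holding any single 
flag fixed leaves 256, matching the counts in the note above. In a real test 
the two stand-in functions would be replaced by code that runs the old and new 
filters and collects their tokens.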

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
