[ https://issues.apache.org/jira/browse/SOLR-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-813:
-------------------------------

    Attachment: SOLR-813.patch

Here is an update that addresses three concerns:
1. position increments -- this keeps the tokens in sync with the input
2. the previous version would stop processing after a number. That is, with "aaa 1234 bbb", "bbb" would never be processed
3. Token types... this changes the type to "DoubleMetaphone" rather than "ALPHANUM"

here is the key part:
{code:java}
      boolean isPhonetic = false;
      String v = new String(t.termBuffer(), 0, t.termLength());

      // Primary encoding: stacked on the original token (increment 0) when
      // injecting, otherwise it keeps the original token's increment.
      String primaryPhoneticValue = encoder.doubleMetaphone(v);
      if (primaryPhoneticValue.length() > 0) {
        Token token = (Token) t.clone();
        if( inject ) {
          token.setPositionIncrement( 0 );
        }
        token.setType( TOKEN_TYPE );
        token.setTermBuffer(primaryPhoneticValue);
        remainingTokens.addLast(token);
        isPhonetic = true;
      }

      // Alternate encoding: only emitted when it differs from the primary,
      // always at the same position (increment 0).
      String alternatePhoneticValue = encoder.doubleMetaphone(v, true);
      if (alternatePhoneticValue.length() > 0
          && !primaryPhoneticValue.equals(alternatePhoneticValue)) {
        Token token = (Token) t.clone();
        token.setPositionIncrement( 0 );
        token.setType( TOKEN_TYPE );
        token.setTermBuffer(alternatePhoneticValue);
        remainingTokens.addLast(token);
        isPhonetic = true;
      }

      // If we did not add something (e.g. the term is a number), skip this
      // token and go on to the next one instead of ending the stream.
      if( !isPhonetic ) {
        t = next(in);
        // next() returns null at the end of the stream
        if( t != null ) {
          // account for the skipped token so positions stay in sync
          t.setPositionIncrement( t.getPositionIncrement()+1 );
        }
        return t;
      }
{code}
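To make the three points above concrete, here is a stdlib-only sketch of the same expansion rules over a plain list of tokens. `PhoneticExpansionSketch`, `Tok` and `toyEncode` are hypothetical stand-ins, not Lucene's `Token` or the commons-codec encoder: the toy encoder just keeps letters (so "1234" has no code) and fakes an alternate for words starting with W. Two deliberate differences from the patch: a dropped token carries its own increment forward rather than a fixed +1, and when not injecting, the first code emitted for a token always takes the position.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: Tok and toyEncode are not Lucene or commons-codec APIs.
final class PhoneticExpansionSketch {

  /** Minimal token: term text, position increment, and token type. */
  record Tok(String term, int posInc, String type) {}

  static final String TOKEN_TYPE = "DoubleMetaphone";

  /** Toy encoder returning {primary, alternate}; "" means "no code". */
  static String[] toyEncode(String s) {
    String primary = s.replaceAll("[^A-Za-z]", "").toUpperCase();
    String alternate = primary.startsWith("W") ? "F" + primary.substring(1) : primary;
    return new String[] {primary, alternate};
  }

  static List<Tok> expand(List<Tok> input, boolean inject) {
    List<Tok> out = new ArrayList<>();
    int carried = 0; // increments of dropped tokens, owed to the next emitted token
    for (Tok t : input) {
      String[] codes = toyEncode(t.term());
      List<String> phonetic = new ArrayList<>();
      if (!codes[0].isEmpty()) phonetic.add(codes[0]);
      // as in the patch: emit the alternate only if it differs from the primary
      if (!codes[1].isEmpty() && !codes[1].equals(codes[0])) phonetic.add(codes[1]);

      if (inject) {
        out.add(t); // the original token keeps its position
        for (String code : phonetic) {
          out.add(new Tok(code, 0, TOKEN_TYPE)); // stacked on the original
        }
      } else if (phonetic.isEmpty()) {
        carried += t.posInc(); // drop the token but keep the gap it leaves
      } else {
        // the first code takes the token's position plus any carried gap
        out.add(new Tok(phonetic.get(0), t.posInc() + carried, TOKEN_TYPE));
        carried = 0;
        for (int i = 1; i < phonetic.size(); i++) {
          out.add(new Tok(phonetic.get(i), 0, TOKEN_TYPE));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Tok> in = List.of(
        new Tok("aaa", 1, "word"), new Tok("1234", 1, "word"), new Tok("bbb", 1, "word"));
    System.out.println(expand(in, false));
  }
}
{code}

With inject off, "aaa 1234 bbb" yields AAA at increment 1 and BBB at increment 2: "bbb" is still processed, the gap left by the number survives, and both tokens carry the DoubleMetaphone type.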

> Add new DoubleMetaphone Filter and Factory
> ------------------------------------------
>
>                 Key: SOLR-813
>                 URL: https://issues.apache.org/jira/browse/SOLR-813
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: SOLR-813.patch, SOLR-813.patch
>
>
> The existing PhoneticFilter allows for use of the DoubleMetaphone encoder. 
> However, it doesn't expose the maxCodeLength() setting, and it ignores the 
> alternate encodings that the encoder provides for some words. This new filter 
> is not as generic as the PhoneticFilter, but allows more detailed control 
> over the DoubleMetaphone encoder.
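The description asks for two things PhoneticFilter does not offer: a configurable maxCodeLength and the alternate encodings. The alternate encodings are handled in the filter code; for the configuration side, a factory might read its settings roughly as below. This is a stdlib-only sketch: the class names, the "inject" and "maxCodeLength" argument names, and the defaults (inject on, maxCodeLength 4, which we believe matches commons-codec's default) are assumptions, and truncate only imitates the cap the encoder applies to each code.

{code:java}
import java.util.Map;

// Hypothetical sketch: argument names, defaults and class names are assumptions.
final class DoubleMetaphoneConfigSketch {

  /** Assumed default, believed to match commons-codec's DoubleMetaphone. */
  static final int DEFAULT_MAX_CODE_LENGTH = 4;

  /** Parsed filter settings. */
  record Settings(boolean inject, int maxCodeLength) {}

  /** Reads "inject" (default true) and "maxCodeLength" from string args. */
  static Settings parse(Map<String, String> args) {
    boolean inject = Boolean.parseBoolean(args.getOrDefault("inject", "true"));
    int maxCodeLength = DEFAULT_MAX_CODE_LENGTH;
    String raw = args.get("maxCodeLength");
    if (raw != null) {
      maxCodeLength = Integer.parseInt(raw.trim());
      if (maxCodeLength < 1) {
        throw new IllegalArgumentException("maxCodeLength must be positive: " + raw);
      }
    }
    return new Settings(inject, maxCodeLength);
  }

  /** Toy illustration of capping a code at maxCodeLength characters. */
  static String truncate(String code, int maxCodeLength) {
    return code.length() <= maxCodeLength ? code : code.substring(0, maxCodeLength);
  }

  public static void main(String[] args) {
    Settings s = parse(Map.of("inject", "false", "maxCodeLength", "6"));
    System.out.println(s + " " + truncate("ABCDEFGH", s.maxCodeLength()));
  }
}
{code}

A larger cap keeps more of each code, so fewer distinct words collapse onto the same phonetic token; a smaller one makes matching looser.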

-- 
This message is automatically generated by JIRA.