[
https://issues.apache.org/jira/browse/SOLR-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan McKinley updated SOLR-813:
-------------------------------
Attachment: SOLR-813.patch
Here is an update that addresses three concerns:
1. Position increments: this keeps the tokens in sync with the input.
2. The previous version would stop processing after a number. That is, "aaa 1234 bbb" would not process "bbb".
3. Token types: this changes the type to "DoubleMetaphone" rather than "ALPHANUM".
Here is the key part:
{code:java}
boolean isPhonetic = false;
String v = new String(t.termBuffer(), 0, t.termLength());

// Primary encoding: when injecting, stack it on the original token's position.
String primaryPhoneticValue = encoder.doubleMetaphone(v);
if (primaryPhoneticValue.length() > 0) {
  Token token = (Token) t.clone();
  if (inject) {
    token.setPositionIncrement(0);
  }
  token.setType(TOKEN_TYPE);
  token.setTermBuffer(primaryPhoneticValue);
  remainingTokens.addLast(token);
  isPhonetic = true;
}

// Alternate encoding: added only when it differs from the primary one.
String alternatePhoneticValue = encoder.doubleMetaphone(v, true);
if (alternatePhoneticValue.length() > 0
    && !primaryPhoneticValue.equals(alternatePhoneticValue)) {
  Token token = (Token) t.clone();
  token.setPositionIncrement(0);
  token.setType(TOKEN_TYPE);
  token.setTermBuffer(alternatePhoneticValue);
  remainingTokens.addLast(token);
  isPhonetic = true;
}

// If we did not add something, then go to the next one...
if (!isPhonetic) {
  t = next(in);
  t.setPositionIncrement(t.getPositionIncrement() + 1);
  return t;
}
{code}
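To make the position-increment behaviour concrete, here is a rough standalone sketch that uses no Lucene classes. It is not the code from the patch: Tok, PRIMARY, ALTERNATE and filter are hypothetical names, the lookup tables stand in for the real encoder with made-up codes, and the handling of tokens without a phonetic value is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class PhoneticInjectionSketch {

    /** Simplified stand-in for a Lucene Token: text, type, and position increment. */
    static final class Tok {
        final String text;
        final String type;
        final int posInc;

        Tok(String text, String type, int posInc) {
            this.text = text;
            this.type = type;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "/" + type + "/+" + posInc;
        }
    }

    // Hypothetical lookup tables standing in for encoder.doubleMetaphone(v) and
    // encoder.doubleMetaphone(v, true). The codes are made up for illustration.
    static final Map<String, String> PRIMARY = Map.of("aaa", "A", "bbb", "PP");
    static final Map<String, String> ALTERNATE = Map.of("aaa", "A", "bbb", "BB");

    static List<Tok> filter(List<String> words, boolean inject) {
        List<Tok> out = new ArrayList<>();
        int gap = 0; // input positions skipped since the last emitted token
        for (String w : words) {
            String primary = PRIMARY.getOrDefault(w, "");
            String alternate = ALTERNATE.getOrDefault(w, "");
            int before = out.size();
            if (inject) {
                // Assumption: in inject mode the original token is always kept.
                out.add(new Tok(w, "word", 1));
            }
            if (!primary.isEmpty()) {
                // The first token at a position carries the increment; later ones stack at 0.
                out.add(new Tok(primary, "DoubleMetaphone", out.size() == before ? 1 + gap : 0));
            }
            if (!alternate.isEmpty() && !alternate.equals(primary)) {
                out.add(new Tok(alternate, "DoubleMetaphone", out.size() == before ? 1 + gap : 0));
            }
            // Nothing emitted for this word: remember the hole so positions stay in sync.
            gap = (out.size() == before) ? gap + 1 : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("aaa", "1234", "bbb");
        System.out.println("inject=true:  " + filter(input, true));
        System.out.println("inject=false: " + filter(input, false));
    }
}
```

For the input "aaa 1234 bbb" with inject=false, the sketch emits A (+1), PP (+2) and BB (+0): the number produces no token, but "bbb" is still processed and the skipped position is carried forward in its increment.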
> Add new DoubleMetaphone Filter and Factory
> ------------------------------------------
>
> Key: SOLR-813
> URL: https://issues.apache.org/jira/browse/SOLR-813
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Reporter: Todd Feak
> Priority: Minor
> Attachments: SOLR-813.patch, SOLR-813.patch
>
>
> The existing PhoneticFilter allows for use of the DoubleMetaphone encoder.
> However, it doesn't expose the maxCodeLength() setting, and it ignores the
> alternate encodings that the encoder provides for some words. This new filter
> is not as generic as the PhoneticFilter, but allows more detailed control
> over the DoubleMetaphone encoder.