: A naming convention question: should the class names end in
: Filter or TokenFilter (and FilterFactory or TokenFilterFactory)?
: I see both in org.apache.solr.analysis.

Ummm.... "yes" :) I don't think it makes a big difference ... I'd never noticed the inconsistency until now.

: I'm a bit disappointed in the performance, though. It is half the
: speed when adding two phonetic fields to search. Dropped from 300
: Could that be from searching extra fields? Indexing is the same
: speed, so it shouldn't be the DoubleMetaphone class. I'm still
: trying to get a feel for Lucene performance after years with the
: Ultraseek engine.

Your indexing speed may already be limited by something else, so you might not notice lags in the DoubleMetaphone class at index time ... have you tried some micro benchmarks on the dm.encode method to see how long it takes per token?

: Also, the phonetic matches are ranked a bit high, so I'm trying a
: sub-1.0 boost. I was expecting the lower idf to fix that automatically.
: The metaphone will almost always have a lower idf because multiple
: words are mapped to one metaphone, so the encoded term occurs in more
: documents than the surface terms.

That all makes sense, and yet it's not what you are observing ... which leads me to believe you (and I, since I want to agree with you) are missing something subtle .... what does the Explanation look like for two documents where you feel like one should score higher than the other but it doesn't?

-Hoss
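
A rough sketch of the kind of micro benchmark on dm.encode suggested above, in case it helps. Everything here is an assumption for illustration: the class name EncodeBench, the helper nanosPerToken, the sample tokens, and the stand-in encoder (plain upper-casing) are all made up so the snippet runs with nothing but the JDK. To measure the real thing you would pass something like `new DoubleMetaphone()::encode` from Commons Codec in place of the stand-in.

```java
import java.util.function.Function;

public class EncodeBench {
    // Written to after each run so the JIT cannot discard the encode calls.
    static volatile long sink;

    // Average nanoseconds per token for the given encoder (hypothetical helper).
    static double nanosPerToken(Function<String, String> encoder,
                                String[] tokens, int rounds) {
        long acc = 0;
        // Warm-up pass so the timed loop runs compiled code.
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                acc += encoder.apply(t).length();
            }
        }
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                acc += encoder.apply(t).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / ((long) rounds * tokens.length);
    }

    public static void main(String[] args) {
        // Sample tokens are placeholders; real tokens from your index are better.
        String[] tokens = {"smith", "schmidt", "catherine", "katherine", "lucene"};
        // Stand-in encoder (assumption); swap in the phonetic encoder to test it.
        Function<String, String> encoder = s -> s.toUpperCase();
        double ns = nanosPerToken(encoder, tokens, 200_000);
        System.out.printf("%.1f ns per token%n", ns);
    }
}
```

Numbers from a loop like this are only a ballpark, but they should be enough to tell whether the encoder costs microseconds or milliseconds per token.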
