Re: Adding Phonetic Search to Solr

Chris Hostetter Wed, 08 Nov 2006 10:30:36 -0800

: A naming convention question: should the class names end in
: Filter or TokenFilter (and FilterFactory or TokenFilterFactory)?
: I see both in org.apache.solr.analysis.


Ummm....  "yes"  :)

I don't think it makes a big difference ... I'd never noticed the
inconsistency until now.

: I'm a bit disappointed in the performance, though. It is half the
: speed when adding two phonetic fields to search. Dropped from 300

: Could that be from searching extra fields? Indexing is the same
: speed, so it shouldn't be the DoubleMetaphone class. I'm still
: trying to get a feel for Lucene performance after years with the
: Ultraseek engine.

Your indexing speed may already be limited by something else, so you might
not notice lags in the DoubleMetaphone class at index time ... have you
tried some micro benchmarks on the dm.encode method to see how long it
takes per token?
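
For what it's worth, a rough sketch of the kind of micro benchmark I mean.
Everything here is illustrative: the class name, the token list, and the
stand-in encoder are made up, and you would swap in your real
DoubleMetaphone instance's encode method where the comment says so.

```java
import java.util.List;
import java.util.function.Function;

// Rough micro-benchmark sketch: average wall-clock time per encoded token.
class EncodeBenchmark {

    // Accumulates result lengths so the JIT cannot discard the encode calls.
    static long sink;

    static double nanosPerToken(Function<String, String> encoder,
                                List<String> tokens, int rounds) {
        // Warm-up pass so the hot path is likely compiled before timing.
        run(encoder, tokens, rounds);
        long start = System.nanoTime();
        run(encoder, tokens, rounds);
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) rounds * tokens.size());
    }

    private static void run(Function<String, String> encoder,
                            List<String> tokens, int rounds) {
        for (int r = 0; r < rounds; r++) {
            for (String t : tokens) {
                String code = encoder.apply(t);
                if (code != null) {
                    sink += code.length();
                }
            }
        }
    }

    public static void main(String[] args) {
        // Assumed sample tokens; use terms from your own documents instead.
        List<String> tokens = List.of("smith", "schmidt", "katherine", "catherine");
        // Stand-in encoder (hypothetical). Replace with the real one, e.g.
        // new org.apache.commons.codec.language.DoubleMetaphone()::encode
        Function<String, String> encoder = s -> s.toUpperCase();
        System.out.printf("%.1f ns/token%n", nanosPerToken(encoder, tokens, 100_000));
    }
}
```

Multiply the per-token figure by the number of tokens in a typical document
to judge whether encoding could matter next to whatever else is limiting
your indexing rate.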

: Also, the phonetic matches are ranked a bit high, so I'm trying a
: sub-1.0 boost. I was expecting the lower idf to fix that automatically.
: The metaphone will almost always have a lower idf because multiple
: words are mapped to one metaphone, so the encoded term occurs in more
: documents than the surface terms.

That all makes sense, and yet it's not what you are observing ... which
leads me to believe you (and I, since I want to agree with you) are missing
something subtle .... what does the Explanation look like for two
documents where you feel like one should score higher than the other but
it doesn't?
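
As a sanity check on the idf reasoning, here is a small sketch of the idf
term as I understand Lucene's default Similarity to compute it, namely
1 + ln(numDocs / (docFreq + 1)). The document counts are invented purely
for illustration. Keep in mind idf is only one factor: tf, the field length
norm, and any boosts also feed into the score, and the Explanation shows
how each of them contributes.

```java
// Sketch of the idf factor, assuming the formula used by Lucene's
// default Similarity: idf = 1 + ln(numDocs / (docFreq + 1)).
class IdfSketch {

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 100_000;      // assumed index size
        int surfaceDocFreq = 50;    // assumed: docs containing the exact word
        int phoneticDocFreq = 400;  // assumed: docs containing any word with the same code
        System.out.printf("surface idf  = %.3f%n", idf(surfaceDocFreq, numDocs));
        System.out.printf("phonetic idf = %.3f%n", idf(phoneticDocFreq, numDocs));
    }
}
```

Because of the logarithm, even a several-fold jump in document frequency
only shaves a modest amount off the idf, so it may not be enough on its
own to push the phonetic matches below the exact ones.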


-Hoss
