SynonymTokenFilter, if I understand correctly, maps a given token to a
set of tokens representing its synonyms. If used in the filter chain
of a query analyzer, it causes "query expansion" (correct
terminology?). If used in the filter chain of an indexing analyzer, it
causes "index expansion".

I was wondering whether anyone has implemented a synonym filter that,
instead of mapping tokens to their synonyms, maps tokens to their
"synonym-groups". Again, I'm not sure this is correct IR terminology,
but borrowing from the SynonymMap implementation, what I mean by a
"synonym-group" is a set of words that are considered synonyms. If a
word can have different [contextual] meanings, then it is a member of
multiple synonym-groups.
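To make the definition concrete, here is a minimal plain-Java sketch (not Lucene code) of a synonym-group store: each group is a set of words, and a reverse index records which groups a word belongs to. The class and method names are hypothetical, and the "bank" groups are made-up sample data. A word with two meanings sits in two groups, yet has many more synonyms than groups.

```java
import java.util.*;

/**
 * Hypothetical helper (not part of Lucene): stores synonym groups and a
 * reverse index from each word to the IDs of the groups containing it.
 */
public class SynonymGroups {
    private final List<Set<String>> groups = new ArrayList<>();
    private final Map<String, Set<Integer>> groupsByWord = new HashMap<>();

    /** Adds a group of mutually synonymous words and returns its ID. */
    public int addGroup(String... words) {
        int id = groups.size();
        Set<String> group = new LinkedHashSet<>(Arrays.asList(words));
        groups.add(group);
        for (String w : group) {
            groupsByWord.computeIfAbsent(w, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    /** IDs of every group the word belongs to (empty if none). */
    public Set<Integer> groupsOf(String word) {
        return Collections.unmodifiableSet(
                groupsByWord.getOrDefault(word, Collections.emptySet()));
    }

    /** All distinct synonyms of the word across its groups, excluding the word itself. */
    public Set<String> synonymsOf(String word) {
        Set<String> result = new TreeSet<>();
        for (int id : groupsOf(word)) {
            result.addAll(groups.get(id));
        }
        result.remove(word);
        return result;
    }

    public static void main(String[] args) {
        SynonymGroups sg = new SynonymGroups();
        sg.addGroup("bank", "shore", "riverbank", "embankment"); // group 0: river sense
        sg.addGroup("bank", "lender", "depository");             // group 1: finance sense
        System.out.println(sg.groupsOf("bank"));   // prints [0, 1]
        System.out.println(sg.synonymsOf("bank")); // prints [depository, embankment, lender, riverbank, shore]
    }
}
```

Here "bank" belongs to 2 groups but has 5 synonyms, which is the asymmetry the group-based approach would exploit.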

The idea here is to minimize index/query "expansion" by observing that
a word typically belongs to far fewer synonym-groups than it has
synonyms. Each synonym-group would be represented by a special, unique
term in the index. Unlike SynonymTokenFilter, the filter would have to
be used in both the indexing and the query analyzer, so that a query
word and an indexed word from the same group map to the same group term.
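A rough plain-Java sketch of the difference, operating on a token list rather than the Lucene TokenStream API: `expand` emits each token plus all its synonyms, while `toGroupTerms` replaces each grouped token with one term per group and passes ungrouped tokens through. The `_syngrp_` prefix for group terms is a hypothetical convention chosen to avoid colliding with real words, and the groups are made-up sample data.

```java
import java.util.*;

/**
 * Sketch comparing synonym expansion with synonym-group mapping.
 * Plain Java on token lists; not the Lucene TokenStream API.
 */
public class SynonymGroupMapping {
    // Hypothetical prefix marking a term as a synonym-group ID.
    static final String GROUP_PREFIX = "_syngrp_";

    // word -> IDs of the synonym groups containing it
    static Map<String, List<Integer>> groupsByWord(List<List<String>> groups) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int id = 0; id < groups.size(); id++) {
            for (String w : groups.get(id)) {
                index.computeIfAbsent(w, k -> new ArrayList<>()).add(id);
            }
        }
        return index;
    }

    // Expansion: each token plus every synonym from every group it belongs to.
    static List<String> expand(List<String> tokens, List<List<String>> groups) {
        Map<String, List<Integer>> index = groupsByWord(groups);
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            Set<String> terms = new LinkedHashSet<>(); // keeps order, drops duplicates
            terms.add(t);
            for (int id : index.getOrDefault(t, List.of())) {
                terms.addAll(groups.get(id));
            }
            out.addAll(terms);
        }
        return out;
    }

    // Group mapping: a grouped token becomes one term per group;
    // tokens in no group pass through unchanged.
    static List<String> toGroupTerms(List<String> tokens, List<List<String>> groups) {
        Map<String, List<Integer>> index = groupsByWord(groups);
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            List<Integer> ids = index.get(t);
            if (ids == null) {
                out.add(t);
            } else {
                for (int id : ids) {
                    out.add(GROUP_PREFIX + id);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> groups = List.of(
                List.of("bank", "shore", "riverbank", "embankment"),
                List.of("bank", "lender", "depository"));
        List<String> tokens = List.of("river", "bank");
        // prints [river, bank, shore, riverbank, embankment, lender, depository]
        System.out.println(expand(tokens, groups));
        // prints [river, _syngrp_0, _syngrp_1]
        System.out.println(toGroupTerms(tokens, groups));
    }
}
```

Because the same `toGroupTerms` mapping runs at index and query time, "shore" in a query and "bank" in a document both yield `_syngrp_0` and match. A real TokenFilter would presumably emit the several group terms for one word at the same position (position increment 0), as synonym filters commonly do; whether to also keep the original word for exact matching is a separate design choice.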

This is not a new idea. See the comments in LUCENE-1622 (a tangential
topic), for example. Has anyone contributed an implementation?

-Babak
