Hi Swarag,

Indeed, we faced a problem with what we called hierarchy synonym search. I
think it is a little different from what you are looking for, but who knows,
maybe it can lead you to a solution for your problem too.

So here was our need:
Let's say we have this hierarchy of words:

              +---- Jazz
              |  
Modern music -+---- Rock
              |
              +---- Hip Hop


So the term "Modern music" covers several music styles, in our case Jazz,
Rock and Hip Hop.

We needed a search that behaves this way:

 - If a user searches for Jazz, it should return only the documents
containing Jazz, and not those containing just Rock, Hip Hop or Modern Music
 - If a user searches for Modern Music, it should return any document in the
hierarchy, including of course documents containing Modern Music.

To be able to do this, we kept our logic of assigning an identifier to
synonyms, but that alone wasn't enough to achieve this goal, so we ended up
creating a field specifically for hierarchy search, with a synonym filter
pointing to two different files for index time and query time.


Here is the file for index time:
jazz => HIERARCHY_1
rock => HIERARCHY_2
hip hop => HIERARCHY_3
 
Here is the file for query time:
Jazz, modern music => HIERARCHY_1
rock, modern music => HIERARCHY_2
hip hop, modern music => HIERARCHY_3
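
For illustration only, here is a small Python sketch of how the two files
above could be generated from a single hierarchy definition, so that they
cannot drift apart. The function name, the dictionary layout and the
HIERARCHY_n numbering scheme are assumptions made for this example; this is
not a tool that ships with Solr.

```python
def build_synonym_files(hierarchy):
    """Return (index_lines, query_lines) for a {parent: [children]} hierarchy.

    Hypothetical helper: each child gets its own HIERARCHY_n identifier.
    """
    index_lines, query_lines = [], []
    counter = 0
    for parent, children in hierarchy.items():
        for child in children:
            counter += 1
            ident = f"HIERARCHY_{counter}"
            # Index time: only the child term is replaced by its identifier.
            index_lines.append(f"{child} => {ident}")
            # Query time: the child AND its parent both map to the identifier.
            query_lines.append(f"{child}, {parent} => {ident}")
    return index_lines, query_lines


hierarchy = {"modern music": ["jazz", "rock", "hip hop"]}
index_lines, query_lines = build_synonym_files(hierarchy)

print("\n".join(index_lines))   # content for hierarchies.index.txt
print("\n".join(query_lines))   # content for hierarchies.query.txt
```

The parent term is deliberately left out of the index-time file, which is
what keeps a search for a child term from matching its siblings.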


We use them with the following schema configuration:
<fieldtype name="string_hier" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
(...)
<filter class="solr.SynonymFilterFactory" synonyms="hierarchies.index.txt"
ignoreCase="true" expand="false" />
(...)
</analyzer>
<analyzer type="query">
(...)
<filter class="solr.SynonymFilterFactory" synonyms="hierarchies.query.txt"
ignoreCase="true" expand="false" />
(...)
</analyzer>
</fieldtype>


This way, a document containing "jazz", "rock" or "hip hop" will be indexed
with "HIERARCHY_1", "HIERARCHY_2" or "HIERARCHY_3" respectively.
A document containing "modern music" keeps the term "modern music" as is,
since it is not matched at index time.

User searches:
Case 1
If a user searches for "I love jazz", the query-time analyzer for the *_hier
field type will transform the query into "I love HIERARCHY_1"
and will match ONLY documents containing jazz.

Case 2
If a user searches for "I love modern music", the query-time analyzer for
the *_hier field type will transform the query into
"I love [HIERARCHY_1 | HIERARCHY_2 | HIERARCHY_3]"

So it is in fact searching the whole hierarchy. The only exception is the
term "modern music" itself, which is not matched here, but will be matched
if we search in parallel on another full-text-stemmed field.
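
To make the two cases easier to follow, here is a rough Python simulation of
the behaviour described above. It is only a sketch: it is not Solr code, it
splits on whitespace instead of using a real tokenizer, and the helper names
(parse_rules, analyze, hierarchy_ids) are invented for this example. It
assumes that rules sharing the same left-hand term are merged, which is how
"modern music" ends up mapped to all three identifiers.

```python
def parse_rules(lines):
    """Parse 'a, b => X' lines; rules with the same left-hand term are merged."""
    rules = {}
    for line in lines:
        lhs, rhs = line.split("=>")
        targets = [t.strip() for t in rhs.split(",")]
        for term in lhs.split(","):
            key = tuple(term.strip().lower().split())
            merged = rules.setdefault(key, [])
            merged.extend(t for t in targets if t not in merged)
    return rules


def analyze(text, rules):
    """Replace the longest matching phrase at each position.

    Returns one list of tokens per position (several tokens = alternatives).
    """
    words = text.lower().split()
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in rules:
                out.append(list(rules[key]))
                i += n
                break
        else:
            # No rule matched here: keep the word unchanged.
            out.append([words[i]])
            i += 1
    return out


def hierarchy_ids(positions):
    """Collect the HIERARCHY_n identifiers present in an analyzed text."""
    return {t for pos in positions for t in pos if t.startswith("HIERARCHY_")}


INDEX_RULES = parse_rules([
    "jazz => HIERARCHY_1",
    "rock => HIERARCHY_2",
    "hip hop => HIERARCHY_3",
])
QUERY_RULES = parse_rules([
    "Jazz, modern music => HIERARCHY_1",
    "rock, modern music => HIERARCHY_2",
    "hip hop, modern music => HIERARCHY_3",
])

rock_doc = hierarchy_ids(analyze("a rock concert", INDEX_RULES))
case1 = analyze("I love jazz", QUERY_RULES)
case2 = analyze("I love modern music", QUERY_RULES)

print(case1)
print(case2)
print(bool(rock_doc & hierarchy_ids(case1)))  # does Case 1 match the rock document?
print(bool(rock_doc & hierarchy_ids(case2)))  # does Case 2 match the rock document?
```

In this simulation the "jazz" query shares no identifier with the rock
document, while the "modern music" query does, which is the asymmetry we
were after.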

That was our need and our solution. I hope it helps you find a solution
for your case.
Regards,
Laurent


-----Original Message-----
From: swarag [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 29 July 2008 04:08
To: solr-user@lucene.apache.org
Subject: RE: solr synonyms behaviour


Hi Laurent


Laurent Gilles wrote:
> 
> Hi,
> 
> I was faced with the same issues regarding multi-word synonyms.
> Let's say we have a synonyms list like:
> 
> club, bar, night cabaret
> 
> Now if we have a document containing "club", with the default synonym
> filter behaviour (expand=true) we will end up with a document in the
> Lucene index containing "club|bar|night cabaret".
> So if the user searches for "night", the query will look for "night" in
> the index and will match our document, since it was "enriched" at index
> time and really does contain the token "night".
> 
> The only valid solution I found was to create a field type used
> exclusively for synonym search, where:
> 
> @IndexTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="false" />
> @QueryTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="false" />
> 
> And with a customised synonyms file that looks like:
> 
> SYN_ID_1, club, bar, night cabaret
> 
> So for our document containing "club", the synonym filter at index time
> with expand=false will replace every matching token/expression in the
> document with SYN_ID_1.
> 
> And at query time, when a user searches for "night", it will not be
> matched, because "night" does not appear on its own in the synonym
> definition. It will not be matched by a "normal" search on this field
> either, because every document containing "club" or "bar" has been
> "enriched" with "SYN_ID_1" and NOT with "club|bar|night cabaret". The
> final indexed document therefore contains no isolated token from a synonym
> expression that could be matched later without us noticing.
> 
> In order to match our document containing "club", the user HAS TO type
> the entire expression "night cabaret", and not only part of it.
> 
> 
> Of course, as I said before, this field was used exclusively for synonym
> matching, so it requires another field for normal full-text-stemmed search
> to add normal results. This approach gives us the opportunity to set up
> boosting separately for full-text-stemmed search vs. synonym search, let's
> say:
> 
> title_stem:"club"^100 OR title_syns:"club"^10
> 
> I hope I have been clear, even if I doubt it. The fact is that this
> approach fixed our problem, since we didn't want synonym matching if
> the user only types part of a synonym expression.
> 
> Regards,
> Laurent
> 
> 

This seems to have solved our problem. Thank you very much for your help.
Once we have our environment set up and all of our data indexed, it may even
provide an extra 'bonus' to be able to add different weights/boosts for the
different fields.

Now, not to be too greedy, but I am wondering if there is a way to utilize
this technique for "Explicit synonym matching" (i.e. synonym mappings that
use the '=>' operator).  For example, we may have a couple of mappings like
the following:
night club=>club, bar
swim club=>club, team

As you can see, both night clubs and swim clubs are clubs, but are not
necessarily equivalent to the term "club".  It would be nice to be able to
search for "night club" and only see results for "clubs" and "bars", but not
necessarily "teams", which would otherwise show up in the results if we used
Equivalent synonyms.

Just wondering if you have been able to do this as well.

Again, thank you for your help!

-- 
View this message in context:
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18703520.html
Sent from the Solr - User mailing list archive at Nabble.com.

