You are correct.
I would recommend using the synonym token filter only at index time,
unless you have a very good reason to do it at query time.
On 10/05/2009 11:46 AM, darniz wrote:
Yes, that's what we decided: to expand these terms while indexing.
If we have
bayrische motoren werke => bmw
and I have a document which has bmw in it, searching for text:bayrische does
not give me results. I have to search for
text:"bayrische motoren werke"; then it actually applies the synonym and gets
me the document.
Now suppose I change the synonym mapping to
bayrische motoren werke, bmw
with the expand parameter set to true, and also use this file at indexing.
Then when I index this document, along with "bmw" I also index the
words "bayrische", "motoren" and "werke".
Any text query like text:motoren or text:bayrische will give me results now.
Please correct me if my assumption is wrong.
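To make the assumption concrete, here is a rough sketch in Python of what index-time expansion amounts to. It is a toy inverted index, not Solr's SynonymFilter: the names SYNONYM_GROUPS, expand_tokens and build_index are made up for illustration, and token positions are ignored entirely.

```python
from collections import defaultdict

# Hypothetical synonym group mirroring "bayrische motoren werke, bmw"
# with expand=true. Not read from a real synonyms file.
SYNONYM_GROUPS = [["bayrische motoren werke", "bmw"]]


def expand_tokens(text, groups=SYNONYM_GROUPS):
    """Return the set of terms to index for `text`.

    If any entry of a synonym group occurs in the text as a run of
    consecutive tokens, every word of every entry in that group is added.
    """
    tokens = text.lower().split()
    terms = set(tokens)
    for group in groups:
        entries = [entry.split() for entry in group]
        for entry in entries:
            n = len(entry)
            # Look for the entry as a run of consecutive tokens.
            if any(tokens[i:i + n] == entry for i in range(len(tokens) - n + 1)):
                for other in entries:
                    terms.update(other)
                break
    return terms


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in expand_tokens(text):
            index[term].add(doc_id)
    return index


docs = {1: "used bmw for sale", 2: "used audi for sale"}
index = build_index(docs)
print(sorted(index["bayrische"]))  # [1]
print(sorted(index["motoren"]))    # [1]
print(sorted(index["audi"]))       # [2]
```

Because the extra words are stored with the document, a single-word query such as text:motoren finds it without any synonym handling at query time.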
Thanks
darniz
Christian Zambrano wrote:
On 10/02/2009 06:02 PM, darniz wrote:
Thanks
As I said, it even works with double quotes too,
like carDescription:"austin martin"
So is the conclusion that, in order to map a two-word synonym, I always have
to enclose it in double quotes so that it does not split the words?
Yes, but there are things you need to keep in mind.
From the solr wiki:
Keep in mind that while the SynonymFilter will happily work with
synonyms containing multiple words (i.e.
"sea biscuit, sea biscit, seabiscuit"), the recommended approach for
dealing with synonyms like this is to expand the synonym when
indexing. This is because there are two potential issues that can arise
at query time:
1. The Lucene QueryParser tokenizes on white space before giving any
   text to the Analyzer, so if a person searches for the words
   sea biscit the analyzer will be given the words "sea" and "biscit"
   separately, and will not know that they match a synonym.
2. Phrase searching (i.e. "sea biscit") will cause the QueryParser to
   pass the entire string to the analyzer, but if the SynonymFilter
   is configured to expand the synonyms, then when the QueryParser
   gets the resulting list of tokens back from the Analyzer, it will
   construct a MultiPhraseQuery that will not have the desired
   effect. This is because of the limited mechanism available for the
   Analyzer to indicate that two terms occupy the same position:
   there is no way to indicate that a "phrase" occupies the same
   position as a term. For our example the resulting MultiPhraseQuery
   would be "(sea | sea | seabiscuit) (biscuit | biscit)" which would
   not match the simple case of "seabiscuit" occurring in a document.
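The second issue can be illustrated with a small Python model. This is only a sketch of exact, position-by-position matching, not Lucene's MultiPhraseQuery: the function multi_phrase_match is hypothetical, there is no slop, and the document is assumed to have been indexed without synonym expansion. The alternatives per position are taken from the wiki example above.

```python
# Alternatives allowed at each query position, from the wiki example:
# "(sea | sea | seabiscuit) (biscuit | biscit)"
MULTI_PHRASE = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]


def multi_phrase_match(doc_text, alternatives=MULTI_PHRASE):
    """True if, at some start offset, each consecutive document token is one
    of the alternatives allowed at the corresponding query position."""
    tokens = doc_text.lower().split()
    width = len(alternatives)
    for start in range(len(tokens) - width + 1):
        if all(tokens[start + i] in alternatives[i] for i in range(width)):
            return True
    return False


print(multi_phrase_match("sea biscuit won the race"))  # True
print(multi_phrase_match("seabiscuit won the race"))   # False
```

The single token "seabiscuit" satisfies the first position, but nothing in the document satisfies the second, so the phrase fails even though the document is about the same thing.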
Christian Zambrano wrote:
When you use a field qualifier (fieldName:valueToLookFor) it only applies
to the word right after the colon. If you look at the debug
information you will notice that for the second word it is using the
default field.
<str name="parsedquery_toString">carDescription:austin text:martin</str>
The following should work:
carDescription:(austin martin)
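The scoping rule can be mimicked with a toy parser in Python. This is not the Lucene QueryParser, only a sketch of a tiny subset of its syntax; the parse function, the CLAUSE regex and the DEFAULT_FIELD value "text" (taken from the debug output) are assumptions for illustration.

```python
import re

DEFAULT_FIELD = "text"  # assumed default search field, as in the debug output

# A clause is an optional "field:" prefix followed by a (group),
# a "phrase", or a bare word.
CLAUSE = re.compile(r'(?:(\w+):)?(\([^)]*\)|"[^"]*"|\S+)')


def parse(query, default_field=DEFAULT_FIELD):
    """Return a list of (field, value) pairs for a tiny subset of the syntax."""
    clauses = []
    for field, value in CLAUSE.findall(query):
        field = field or default_field
        if value.startswith("("):
            # A parenthesised group applies the field to every word inside it.
            clauses.extend((field, word) for word in value[1:-1].split())
        else:
            clauses.append((field, value))
    return clauses


print(parse("carDescription:austin martin"))
# [('carDescription', 'austin'), ('text', 'martin')]
print(parse("carDescription:(austin martin)"))
# [('carDescription', 'austin'), ('carDescription', 'martin')]
print(parse('carDescription:"austin martin"'))
# [('carDescription', '"austin martin"')]
```

Only the quoted form reaches the analyzer as one string; the parenthesised form fixes the field but still hands over the words one at a time.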
On 10/02/2009 05:46 PM, darniz wrote:
This is not working when I search. I have a document which contains the
text aston martin.
When I search carDescription:"austin martin" I get a match, but when I
don't use double quotes,
like carDescription:austin martin
there is no match.
In the analyser, if I give austin martin without quotes, when it passes
through the synonym filter it matches aston martin.
Maybe by default the analyser treats it as a phrase "austin martin", but
when I try to do a query by typing
carDescription:austin martin I get 0 documents. The following is the
debug node info with debugQuery=on:
<str name="rawquerystring">carDescription:austin martin</str>
<str name="querystring">carDescription:austin martin</str>
<str name="parsedquery">carDescription:austin text:martin</str>
<str name="parsedquery_toString">carDescription:austin text:martin</str>
I don't know why it breaks the words apart; maybe it's the desired behaviour.
When I give carDescription:"austin martin", of course, it is able to map
to the synonym and I get the desired result.
Any opinion?
darniz
Ensdorf Ken wrote:
Hi
I have a question regarding SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin
...
Can anybody please explain if my observation is correct? This is a very
critical aspect for my work.
That is correct - the synonym filter can recognize multi-token synonyms
from consecutive tokens in a stream.
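As a rough illustration of matching over consecutive tokens, here is a Python sketch of the one-way mapping from the question. It is not the actual SynonymFilter code; RULES and apply_synonyms are hypothetical names, and the sketch simply prefers the longest rule that matches at each position.

```python
# One-way mapping from the question:
# "austin martin, astonmartin => aston martin"
RULES = {
    ("austin", "martin"): ["aston", "martin"],
    ("astonmartin",): ["aston", "martin"],
}


def apply_synonyms(tokens, rules=RULES):
    """Replace the longest rule match starting at each position and pass
    all other tokens through unchanged."""
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(longest, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule starts here, so keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out


print(apply_synonyms(["austin", "martin", "db9"]))  # ['aston', 'martin', 'db9']
print(apply_synonyms(["astonmartin"]))              # ['aston', 'martin']
print(apply_synonyms(["austin"]))                   # ['austin']
```

The last call shows why the match depends on both tokens arriving in the same stream: a lone "austin" is passed through untouched.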