Thanks! I'll give more effort to understand your suggestion & that Norm thing.

----- Original Message ----- From: "MitchK" <mitc...@web.de>
To: <solr-user@lucene.apache.org>
Sent: Tuesday, August 24, 2010 5:28 AM
Subject: Re: Doing Shingle but also keep special single word



No, I mean that you use an additional indexed field for searching (e.g.
whitespace-tokenized, so that every word separated by whitespace becomes a
token).
So you have two fields (a shingle-token field and a single-token field), and
you can search across both.
This provides several benefits: for example, you can boost the shingle field at
query time, since a match in the shingle field means that an exact phrase
matched.

Additionally, you can search with single-word queries as well as
multi-word queries.
Furthermore, you can apply synonyms to your single-token field.
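
As a rough illustration of the two-field setup described above, a schema.xml fragment might look like the following. The field and type names (content, content_shingle, text_words, text_shingle) are made up for this sketch, and the analyzer chains are assumptions rather than anyone's actual schema.

```xml
<!-- Hypothetical type: 2-word shingles only, no single words -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Hypothetical type: single whitespace-separated tokens, with synonyms -->
<fieldType name="text_words" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

<!-- One source field, copied into the shingle field at index time -->
<field name="content" type="text_words" indexed="true" stored="true"/>
<field name="content_shingle" type="text_shingle" indexed="true" stored="false"/>
<copyField source="content" dest="content_shingle"/>
```

At query time the shingle field can then be given a higher boost than the single-word field, for instance by listing it as content_shingle^5 in the dismax pf parameter (the value 5 is arbitrary). How the query text reaches the shingle analyzer depends on the query parser, so it is worth checking the result in analysis.jsp.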

If you want to keep your index as small as possible but as large as needed,
try to understand Lucene's Similarity implementation so that you can decide
whether you can set the field options omitNorms="true" or
omitTermFreqAndPositions="true".
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html
Keep in mind what you lose if you enable one of those options.
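
For reference, both options are attributes on a field (or field type) definition in schema.xml. The fragment below is only a sketch: the field names are hypothetical, and the type names stand in for whatever field types your schema defines.

```xml
<!-- Hypothetical fields; the referenced types are assumed to exist in your schema -->

<!-- No norms: length normalization and index-time boosts are dropped for this field -->
<field name="content_shingle" type="text_shingle" indexed="true" stored="false"
       omitNorms="true"/>

<!-- No term frequencies or positions: scoring ignores tf, and phrase queries
     on this field no longer work -->
<field name="content" type="text_words" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```

Whether either setting is acceptable depends on how each field is queried, so treat this as a starting point for experiments rather than a recommendation.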

A small example of the consequences of setting omitNorms="true":
doc1: "this is a short example doc"
doc2: "this is a longer example doc for presenting the effect of omitNorms"

If you search for "doc" while omitNorms=false, your response will look
like this:
doc1,
doc2
This is because the norm value for doc1 is larger than the norm value for
doc2, since doc1 is shorter than doc2 (have a look at the provided link).

If omitNorms=true, the scores for both docs will be equal.
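
To put rough numbers on this example: assuming Lucene's DefaultSimilarity, whitespace tokenization, no stop-word removal and no boosts, the length norm of a field is 1/sqrt(number of terms). The sketch below also ignores that norms are stored in a single byte, so the values actually used at search time are coarser.

```latex
\begin{align*}
\mathrm{lengthNorm}(d) &= \frac{1}{\sqrt{\mathrm{numTerms}(d)}} \\
\mathrm{lengthNorm}(\mathrm{doc1}) &= \frac{1}{\sqrt{6}} \approx 0.408 \\
\mathrm{lengthNorm}(\mathrm{doc2}) &= \frac{1}{\sqrt{12}} \approx 0.289
\end{align*}
```

Since "doc" occurs once in each document, tf and idf are the same for both, so under these assumptions the norm is the only factor that separates the two scores.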

Kind regards,
- Mitch


scott chu wrote:

I don't quite understand the additional-field way. Do you mean making another
field that stores the special words specifically, but without indexing that
field?

Scott

----- Original Message ----- From: "MitchK" <mitc...@web.de>
To: <solr-user@lucene.apache.org>
Sent: Sunday, August 22, 2010 11:48 PM
Subject: Re: Doing Shingle but also keep special single word



Hi,

The keepword filter is no solution for this problem, since it would mean
that one has to manage a word dictionary. As explained, this would lead to
too much effort.

You can easily add outputUnigrams=true and check out analysis.jsp for
this field, so you can see how much bigger a single field will become with
this option.
However, I am quite sure that the difference between using
outputUnigrams=true and indexing in a separate field is not noteworthy.
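
A minimal sketch of such a field type, to paste into schema.xml and inspect in analysis.jsp; the type name is hypothetical and the analyzer chain is an assumption:

```xml
<!-- Hypothetical type: emits single words and 2-word shingles into one field -->
<fieldType name="text_shingle_unigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

For the input "please divide this", this chain should produce the tokens "please", "please divide", "divide", "divide this" and "this", which gives a feel for how many extra terms end up in the field.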

I would suggest doing it the additional-field way, since this gives you
more flexibility in boosting the different fields.

Unfortunately, I haven't understood your explanation of the use case. But
it sounds a little bit like tagging?

Kind regards,
- Mitch


iorixxx wrote:

Won't setting outputUnigrams="true" make the index about twice as large
as when it's set to false?

Sure, the index will be bigger. I didn't know that this was a problem for
you. But if you have a list of special single words that you want to keep,
KeepWordFilter can eliminate the other tokens, so the index size will be okay.


Scott

----- Original Message ----- From: "Ahmet Arslan" <iori...@yahoo.com>
To: <solr-user@lucene.apache.org>
Sent: Saturday, August 21, 2010 1:15 AM
Subject: Re: Doing Shingle but also keep special single word


>> I am building index with Shingle
>> filter. We know it's minimum 2-gram but I also want keep
>> some special single word, e.g. IBM, Microsoft, etc. i.e. I
>> want to do a minimum 2-gram but also want to have these
>> single word in my index, Is it possible?
>
> outputUnigrams="true" parameter does not work for you?
>
> After that you can cast <filter class="solr.KeepWordFilterFactory"
> words="keepwords.txt" ignoreCase="true"/> with keepwords.txt=IBM, Microsoft.
>







--
View this message in context:
http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
Sent from the Solr - User mailing list archive at Nabble.com.



--------------------------------------------------------------------------------



No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.851 / Virus Database: 271.1.1/3083 - Release Date: 08/20/10 14:35:00



--
View this message in context: http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1300497.html
Sent from the Solr - User mailing list archive at Nabble.com.


