Thank you for your reply.
How can I change the DefaultSimilarity for indexing and searching? Do
you have an example or a URL showing how to set the Similarity?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html
Thanks again
Ion
Try to look at Similarity; there you will find things about the
scoring. Your query is more similar to the shorter document.
If you have two documents with a field body, the first with the words
red flower and the second with just the one word flower, and you search
for the word flower, the second document
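
To make the length effect concrete, here is a minimal sketch in plain Java (no Lucene dependency) that applies the default length normalization quoted in this thread, 1 / sqrt(numTerms), to the two example bodies. The class name LengthNormDemo and the method signature are hypothetical and only for illustration; a real index also combines this factor with the other parts of the score.

```java
public class LengthNormDemo {
    // Same formula as the DefaultSimilarity.lengthNorm quoted in this thread:
    // fewer terms in the field give a larger normalization factor.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        float twoTerms = lengthNorm(2); // body: "red flower"
        float oneTerm = lengthNorm(1);  // body: "flower"
        System.out.println("red flower -> " + twoTerms);
        System.out.println("flower     -> " + oneTerm);
        // The one-word body gets the larger factor, so it ranks first
        // when everything else about the match is equal.
        System.out.println("shorter wins: " + (oneTerm > twoTerms));
    }
}
```

Both bodies contain flower exactly once, so in this sketch the field length is the only thing that separates them.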
Hi
I am having a problem using Field.TermVector; I don't know how to use it. Does
someone have an example or a link showing how to use the term vector in
indexing and searching?
doc.add(new Field(title, httpd.getTiltle(), Field.Store.YES,
Field.TermVector));
Thanks
On Nov 2, 2007, at 4:37 AM, Jamal H Tandina wrote:
Hi
I am having a problem using Field.TermVector; I don't know how to use
it. Does someone have an example or a link showing how to use the
term vector in indexing and searching?
doc.add(new Field(title, value, Field.Store.YES,
This works and I can reuse token streams. But why does TokenStream.reset() not
work, as in my earlier case? Is this a marker method in TokenStream
without an implementation, which CachingTokenFilter then implements?
- BR
Mark Miller [EMAIL PROTECTED] wrote:
reset is optional.
I found this page extremely helpful in finding out EXACTLY what Lucene is
doing (and how, if I wanted to, to change it). Like Erik said, it does
pretty darn well just as it is. I'm not sure if anyone has already pointed
you to this page yet.
You'll have to spend some time diving down in to
That is already in the similarity formula, in the tf term: documents that
have more occurrences of a given term receive a higher score.
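
As a rough illustration of the tf term, the sketch below assumes the DefaultSimilarity behaviour of taking the square root of the raw term frequency. It is plain Java with no Lucene dependency, and the class name TfDemo is hypothetical.

```java
public class TfDemo {
    // Assumption: mirrors DefaultSimilarity.tf, the square root of the
    // number of times the term occurs in the field.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // More occurrences give a higher factor, but the growth is
        // sublinear: four times the occurrences only doubles the factor.
        for (int freq : new int[] {1, 4, 16}) {
            System.out.println("freq=" + freq + " tf=" + tf(freq));
        }
    }
}
```

The square root is what keeps a document from winning simply by repeating a word many times.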
Jamal H Tandina wrote:
If you want to give priority to documents that are larger, like z1, you
should change the DefaultSimilarity (at index time), more
For your specific problem you need to change the DefaultSimilarity only
at index time, because the lengthNorm is written to the index when it is
created.
So... first you'll need to extend the DefaultSimilarity and override the
lengthNorm() method with the one suggested in the previous reply; then
I strongly recommend against this. Simple word counts are a poor
measure of relevance, which is why Lucene doesn't score that
way. Do you have an example showing why the default scoring is
inadequate or is this just an assumption?
It would be helpful if you gave us some idea of what you're trying
If you want to give priority to documents that are larger, like z1, you
should change the DefaultSimilarity (at index time), more exactly the
method:
    public float lengthNorm(String fieldName, int numTerms) {
        return (float)(1.0 / Math.sqrt(numTerms));
    }
to something like this
10 matches