date:20071102

RE : Re: problem undestanding the hits.score

2007-11-02 Thread Jamal H Tandina

Thank you for your reply. How can I change the DefaultSimilarity for indexing and searching? Do you have an example or a URL showing how to set the Similarity? http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html Thanks again, Ion

Re: problem undestanding the hits.score

2007-11-02 Thread Ion Badita

Try to look at Similarity; there you will find things about the scoring. Your query is more similar to the shorter document. If you have 2 documents with a field body, the first with the words red flower and the second with just the one word flower, and you search for the word flower, the second document
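To make the effect of document length concrete, here is a minimal sketch in plain Java (no Lucene dependency). It assumes the default length normalization quoted elsewhere in this thread, 1 / sqrt(numTerms), and applies it to the two example bodies. The class name LengthNormDemo is hypothetical, and the sketch covers only this one factor of the score, not the whole formula.

```java
public class LengthNormDemo {
    // Default length normalization as quoted in this thread: 1 / sqrt(numTerms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        float twoTerms = lengthNorm(2); // body = "red flower", roughly 0.707
        float oneTerm = lengthNorm(1);  // body = "flower", exactly 1.0

        System.out.println("red flower: " + twoTerms);
        System.out.println("flower:     " + oneTerm);
        // Both bodies contain "flower" once, so this factor alone
        // favors the shorter field.
    }
}
```

With everything else equal, the one-word body gets the larger factor, which is consistent with the shorter document matching the query more closely.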

how to use Field.TermVector

2007-11-02 Thread Jamal H Tandina

Hi, I am having a problem using Field.TermVector; I don't know how to use it. Does someone have an example or an address showing how to use the TermVector in indexing and searching? doc.add(new Field(title, httpd.getTiltle(), Field.Store.YES, Field.TermVector)); Thanks

Re: how to use Field.TermVector

2007-11-02 Thread Grant Ingersoll

On Nov 2, 2007, at 4:37 AM, Jamal H Tandina wrote: Hi, I am having a problem using Field.TermVector; I don't know how to use it. Does someone have an example or an address showing how to use the TermVector in indexing and searching? doc.add(new Field(title, value, Field.Store.YES,

Re: Best way to count tokens

2007-11-02 Thread Cool Coder

This works and I can reuse token streams. But why does TokenStream.reset() not work, as in my earlier case? Is it a marker method in TokenStream without an implementation, which CachingTokenFilter then implements? - BR Mark Miller [EMAIL PROTECTED] wrote: reset is optional.

Re: problem understanding the hits.score

2007-11-02 Thread Donna L Gresh

I found this page extremely helpful in finding out EXACTLY what Lucene is doing (and how, if I wanted to, to change it). Like Erik said, it does pretty darn well just as it is. I'm not sure if anyone has already pointed you to this page yet. You'll have to spend some time diving down in to

Re: RE : Re: problem undestanding the hits.score

2007-11-02 Thread Ion Badita

That is already in the similarity formula, in the tf term: documents that have more occurrences of a given term receive a higher score. Jamal H Tandina wrote: If you want to give priority to documents that are larger, like z1, you should change the DefaultSimilarity (at index time), more

Re: RE : Re: problem undestanding the hits.score

2007-11-02 Thread Ion Badita

For your specific problem you need to change the DefaultSimilarity only at index time, because the lengthNorm is written to the index when it is created. So... first you'll need to extend the DefaultSimilarity and override the lengthNorm() method with the one suggested in the previous reply; then
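The extend-and-override step might look roughly like the sketch below. To keep it compilable without the Lucene jar, DefaultSimilarity here is a local stand-in that reproduces only the lengthNorm formula quoted in this thread; in a real project the subclass would extend org.apache.lucene.search.DefaultSimilarity instead. The replacement body (a flat norm of 1.0) and the names CustomLengthNormSketch and FlatLengthSimilarity are assumptions chosen for illustration, not the method suggested in the earlier reply.

```java
public class CustomLengthNormSketch {
    // Stand-in for Lucene's DefaultSimilarity so the sketch compiles on its own.
    // Only the lengthNorm formula quoted in this thread is reproduced.
    public static class DefaultSimilarity {
        public float lengthNorm(String fieldName, int numTerms) {
            return (float) (1.0 / Math.sqrt(numTerms));
        }
    }

    // Hypothetical subclass: ignores field length entirely, so long and
    // short fields receive the same normalization factor.
    public static class FlatLengthSimilarity extends DefaultSimilarity {
        @Override
        public float lengthNorm(String fieldName, int numTerms) {
            return 1.0f;
        }
    }

    public static void main(String[] args) {
        DefaultSimilarity byDefault = new DefaultSimilarity();
        DefaultSimilarity flat = new FlatLengthSimilarity();

        // Compare the factor for a 100-term field under both policies.
        System.out.println("default: " + byDefault.lengthNorm("body", 100));
        System.out.println("flat:    " + flat.lengthNorm("body", 100));
    }
}
```

In Lucene 2.x the custom similarity would be installed on the writer with IndexWriter.setSimilarity(...) before any documents are added. Because the norm is stored in the index, documents indexed earlier keep their old values until they are reindexed.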

Re: RE : Re: problem undestanding the hits.score

2007-11-02 Thread Erick Erickson

I strongly recommend against this. Simple word counts are a poor measure of relevance. Which is why Lucene doesn't score that way. Do you have an example showing why the default scoring is inadequate or is this just an assumption? It would be helpful if you gave us some idea of what you're trying

RE : Re: problem undestanding the hits.score

2007-11-02 Thread Jamal H Tandina

If you want to give priority to documents that are larger, like z1, you should change the DefaultSimilarity (at index time), more exactly the method: public float lengthNorm(String fieldName, int numTerms) { return (float)(1.0 / Math.sqrt(numTerms)); } to something like this

10 matches
