lengthnorm again

2006-07-13 Zhao, Xin
Hi, I am sure this is a question that has been asked before. :-) I have done some research too, but still don't quite understand. I indexed 20 terms under the field name "mesh" and set the boosts accordingly, from 20 to 1 (just some arbitrary numbers). But when I checked the index in Luke, the boosts all app
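A possible explanation, assuming the 20 terms were added as 20 separate "mesh" fields on the same document: Lucene keeps only one norm per field name per document, and the Field.setBoost Javadoc says the boosts of same-named fields in a document are multiplied together (and by the document boost). The sketch below models that in plain Java, using the Lucene 2.x DefaultSimilarity length norm of 1/sqrt(numTerms). The class and method names are hypothetical, and it ignores the lossy one-byte encoding applied to the stored norm.

```java
/** Hypothetical model of how one per-document field norm is formed (Lucene 2.x semantics assumed). */
class MeshNormSketch {

    /** DefaultSimilarity-style length normalization: 1 / sqrt(number of terms in the field). */
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * All instances of a same-named field share ONE norm in a document:
     * document boost x product of every instance's boost x lengthNorm.
     * totalTermsInField is assumed to count terms across all instances.
     */
    static float fieldNorm(float docBoost, float[] instanceBoosts, int totalTermsInField) {
        float boost = docBoost;
        for (float b : instanceBoosts) {
            boost *= b; // boosts of same-named fields multiply together
        }
        return boost * lengthNorm(totalTermsInField);
    }

    public static void main(String[] args) {
        // 20 "mesh" instances with boosts 20, 19, ..., 1, one untokenized term each.
        float[] boosts = new float[20];
        for (int i = 0; i < 20; i++) {
            boosts[i] = 20 - i;
        }
        // Every mesh term in this document ends up with the same norm,
        // so the individual per-term boosts cannot be told apart.
        System.out.println(fieldNorm(1.0f, boosts, 20));
    }
}
```

If per-term weights must survive, one option is to spread the terms over differently named fields or to encode the weight some other way; which fits best depends on the queries.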

scoring formula

2006-08-02 Zhao, Xin
Hi, I noticed the scoring formula in the errata of the book "Lucene in Action" is a little different from the one in the Javadoc. I enclosed the one in the Javadoc at the end of the email. getBoost(t in q) is in the Javadoc's formula (which I assume is the correct one), but not in "Lucene in Action". Any idea? We n
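To see where the query-time boost getBoost(t in q) enters, here is a simplified sketch of Lucene 2.x-style TF-IDF scoring for a flat query of term clauses, assuming the DefaultSimilarity definitions tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)) and queryNorm = 1 / sqrt(sumOfSquaredWeights). Class and parameter names are hypothetical; nested queries and the BooleanQuery's own boost are ignored.

```java
/** Hypothetical sketch of Lucene 2.x-style scoring, showing where getBoost(t in q) appears. */
class ScoringSketch {

    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /**
     * queryBoosts: getBoost(t in q) per query term
     * docFreqs:    document frequency per term
     * freqs:       frequency of each term in the scored document (0 = absent)
     * norms:       norm(t, d) of each term's field in the document
     */
    static double score(double[] queryBoosts, int[] docFreqs, double[] freqs,
                        double[] norms, int numDocs) {
        int n = queryBoosts.length;
        // sumOfSquaredWeights = sum over query terms of (idf * boost)^2
        double sumOfSquaredWeights = 0.0;
        for (int i = 0; i < n; i++) {
            double w = idf(docFreqs[i], numDocs) * queryBoosts[i];
            sumOfSquaredWeights += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);

        double sum = 0.0;
        int overlap = 0;
        for (int i = 0; i < n; i++) {
            if (freqs[i] == 0) continue; // term absent: contributes nothing
            overlap++;
            double idf = idf(docFreqs[i], numDocs);
            double queryWeight = idf * queryBoosts[i] * queryNorm; // query boost enters here
            double fieldWeight = tf(freqs[i]) * idf * norms[i];
            sum += queryWeight * fieldWeight;
        }
        double coord = overlap / (double) n; // fraction of query terms matched
        return coord * sum;
    }

    public static void main(String[] args) {
        // Single-term query: the boost cancels against queryNorm, so both scores match.
        double a = score(new double[]{1.0}, new int[]{9}, new double[]{4}, new double[]{0.5}, 100);
        double b = score(new double[]{5.0}, new int[]{9}, new double[]{4}, new double[]{0.5}, 100);
        System.out.println(a + " " + b);
    }
}
```

Under these assumptions a query boost changes rankings only relative to the other terms in the same query, which may be why a formula that omits it still ranks single-term queries the same way.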

Re: scoring formula

2006-08-04 Zhao, Xin
ply as I did not fully read your message before replying. *ugh* Erik On Aug 2, 2006, at 2:32 PM, Zhao, Xin wrote: Hi, I noticed the scoring formula in the errata of book "Lucene in Action" is a little different from the one in Javadoc. I enclosed the one in Javadoc at the end of ema

controlled library

2006-08-24 Zhao, Xin
Hi, I have a design question. Here is what we are trying to do for indexing: we designed an indexing tool to generate standard MeSH terms from medical citations, and then use Lucene to save the terms and citations for future search. The information we need to save is: a) the exact MeSH terms (top 10) b

Re: controlled vocabulary

2006-08-25 Zhao, Xin
rmance in a very bad way? Regards, Xin - Original Message - From: "Dedian Guo" <[EMAIL PROTECTED]> To: ; "Zhao, Xin" <[EMAIL PROTECTED]> Sent: Thursday, August 24, 2006 4:22 PM Subject: Re: controlled library in my solution, you can apply one doc for

Re: controlled vocabulary

2006-08-25 Zhao, Xin
rm = meshList.get(i); document.add( new Field( "mesh", meshTerm.semanticWebConceptId, Field.Store.YES, Field.Index.NO_NORMS ) ); } When querying this index, create an analyzer that infers the text string and generates IDs that correspond to the mesh term in the semantic web Z

Re: controlled vocabulary

2006-08-25 Zhao, Xin
then perhaps you can change it to Field.Index.TOKENIZED, but I was not aware that PubMed boosts MeSH terms; they broadly classify terms as major and minor. If you plan to use this simple system of classification, consider adding the major terms twice to the document. Zhao, Xin wrote: Hi, Rup
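The suggestion to add major MeSH terms twice can be checked with a small calculation, assuming the Lucene 2.x DefaultSimilarity tf of sqrt(freq) and length norm of 1/sqrt(numTerms). With Field.Index.NO_NORMS, as in the snippet quoted earlier, no norm is stored, so index-time boosts and length normalization do not apply. The class and method names are hypothetical, and idf and query-side factors are left out because they are the same for every document.

```java
/** Hypothetical comparison of indexing "major" MeSH terms twice (DefaultSimilarity assumed). */
class MajorTermSketch {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** tf x norm for one term; with omitted norms the norm factor is simply 1. */
    static double termFieldWeight(int freq, int fieldLength, boolean omitNorms) {
        double norm = omitNorms ? 1.0 : lengthNorm(fieldLength);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        int minorTerms = 7, majorTerms = 3;
        int fieldLength = minorTerms + 2 * majorTerms; // each major term is added twice
        System.out.println("major: " + termFieldWeight(2, fieldLength, true));
        System.out.println("minor: " + termFieldWeight(1, fieldLength, true));
    }
}
```

Under these assumptions, doubling a major term raises its weight by a factor of sqrt(2), not 2. When norms are kept, the extra copies also lengthen the field, which lowers the norm for every term in it.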

Re: controlled vocabulary

2006-08-25 Zhao, Xin
d see that I am in the early learning stage now. xin - Original Message - From: "Zhao, Xin" <[EMAIL PROTECTED]> To: Sent: Friday, August 25, 2006 10:21 AM Subject: Re: controlled vocabulary Hi, Thank you for your reply. I had thought about the first two solutions be