Re: Termfreq

2008-12-03 Thread Gustavo Corral
> > > I hope this is not a silly question, but I should ask. I developed an IR system for XML documents with Lucene and I was checking the explain() output for some queries, but I don't understand this part: 0.121383816 = f

Re: Termfreq

2008-12-03 Thread Erick Erickson
ieldWeight(title:efecto in 1), product of: 1.0 = tf(termFreq(title:efecto)=1) 0.7768564 = idf(docFreq=4) I suppose tf refers to the term's frequency in the document, but I know there is more than one occurrence of this term in this document, so I noted that te

Termfreq

2008-12-03 Thread Gustavo Corral
Hi list, I hope this is not a silly question, but I should ask. I developed an IR system for XML documents with Lucene and I was checking the explain() output for some queries, but I don't understand this part: 0.121383816 = fieldWeight(title:efecto in 1), product of: 1.0 = tf(ter
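For readers following the explain() numbers above, here is a minimal sketch of how the tf and idf factors come out under Lucene's classic DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). It is plain Java with no Lucene dependency. The class name is made up for illustration, and numDocs = 4 is an assumption: the post does not state the index size, and 4 is simply a value that reproduces the 0.7768564 shown. Note that termFreq(title:efecto) counts occurrences in the title field only, so a term that appears several times elsewhere in the document can still report 1 here.

```java
// Hypothetical stand-alone sketch; not the Lucene classes themselves.
class ClassicTfIdfSketch {

    // Classic DefaultSimilarity tf: square root of the raw frequency
    // of the term within the one field being scored.
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // termFreq=1 and docFreq=4 come from the post; numDocs=4 is assumed.
        System.out.println("tf(1)    = " + tf(1));
        System.out.println("idf(4,4) = " + idf(4, 4));
        // A term occurring 4 times in the same field would get tf = 2.0.
        System.out.println("tf(4)    = " + tf(4));
    }
}
```

The remaining factor in fieldWeight is cut off in the snippet, so it is left out of the sketch.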

Re: How to build your custom termfreq vector an add it to the field ?

2007-11-09 Thread Grant Ingersoll
Not really sure what to tell you other than you need to dig in and look at how the other Query classes are implemented. I would start with TermQuery/TermScorer. One thing I did to get to know the scoring was to go through and document it the best I could (given the time I had) as pseudocod

Re: How to build your custom termfreq vector an add it to the field ?

2007-11-08 Thread Ariel
The link you suggested is very interesting, Mr. Grant Ingersoll. Let's see if I understand how ranking in Lucene could be implemented: 1. First I must create my own query class extending the abstract Query class. The only method I must implement from this class is toString. Is this right?

Re: How to build your custom termfreq vector an add it to the field ?

2007-11-07 Thread Grant Ingersoll
Term Vectors (specifically TermFreqVector) in Lucene are a storage mechanism for convenience and applications to use. They are not an integral part of the scoring in the way you may be thinking of them in terms of the traditional Vector Space Model, thus there may be some confusion from th

Re: How to build your custom termfreq vector an add it to the field ?

2007-11-07 Thread Ariel
Then if I want to use another scoring formula, must I implement my own Query/Weight/Scorer? For example, instead of cosine distance, leiderbage distance or .. another. I'm studying the Query/Weight/Scorer classes to find out how to do that, but there is not much documentation about it. I have seen I

Re: How to build your custom termfreq vector an add it to the field ?

2007-11-06 Thread Grant Ingersoll
what you are trying to do. HTH, Grant On Nov 6, 2007, at 5:27 PM, Ariel wrote: Hi: I want to build a custom termfreq vector and add it to the field to store it in the index. I want to use Lucene for research, I'm thinking of doing some experimentation so I need to store a term vector

Re: zero termfreq for some search strings with special characters

2007-06-20 Thread Erick Erickson
write your own Analyzer that breaks the stream however you want, and use *that* analyzer at index and search time. Then looking at termfreq will work as you expect. PerFieldAnalyzerWrapper will allow you to treat different fields differently, which may help if you want one sort of behavior for one fi
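To illustrate the point about analysis, here is a small sketch of why a lookup for "emp-id" can report a term frequency of zero when the analyzer has split the value into separate tokens at index time. This is plain Java, not Lucene's StandardTokenizer or Analyzer API: both tokenizers below are hypothetical stand-ins, and the sample text is the value quoted in the original post of this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; the two tokenizers only mimic
// "split on punctuation" versus "split on whitespace" behavior.
class TokenizeSketch {

    // Splits on whitespace only, so "emp-id" survives as one token.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase());
        }
        return out;
    }

    // Splits on every non-alphanumeric character, so "emp-id"
    // is indexed as the two tokens "emp" and "id".
    static List<String> punctuationSplitTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) out.add(t.toLowerCase());
        }
        return out;
    }

    // Counts exact matches of one term in the token stream.
    static int termFreq(List<String> tokens, String term) {
        int n = 0;
        for (String t : tokens) {
            if (t.equals(term)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "emp-id Aq234 kaith creating document for search";
        System.out.println(termFreq(punctuationSplitTokens(text), "emp-id"));
        System.out.println(termFreq(whitespaceTokens(text), "emp-id"));
    }
}
```

A phrase query can still match in the split case because the parts sit next to each other, while a single-term lookup for the unsplit string finds nothing. Using one analyzer for both indexing and searching keeps the two views consistent.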

Re: zero termfreq for some search strings with special characters

2007-06-20 Thread SK R
Hi, Thanks for your reply. But how do I get termfreq of that term("emp-id")? Does Lucene have any other way to handle this? I appreciate any solution regarding this problem. Regards SenthilKumaran On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: You ar

RE: zero termfreq for some search strings with special characters

2007-06-20 Thread Liu_Andy2
Sent: Wednesday, June 20, 2007 5:24 PM To: [email protected] Subject: zero termfreq for some search strings with special characters Hi, I'm using standard tokenizer for both indexing and searching process. My indexed value is like "emp-id Aq234 kaith creating document for s

zero termfreq for some search strings with special characters

2007-06-20 Thread SK R
Hi, I'm using standard tokenizer for both indexing and searching process. My indexed value is like "emp-id Aq234 kaith creating document for search". I can get search results for the query CONTENT:"emp-id" by using hits = indexSearcher.search(*query*). But if I try to get the term frequency of t

Re: How to get termfreq. of each doc for wildcard terms?

2007-04-24 Thread SK R
Hi, Does anybody have an idea about my previous post? Regards RSK On 4/23/07, SK R <[EMAIL PROTECTED]> wrote: Hi, In my application, sometimes I need to find doc Id with term frequency of my terms in my index of multi lines, tokenized & indexed with Standard Analyzer. For this, now I'm using *

How to get termfreq. of each doc for wildcard terms?

2007-04-23 Thread SK R
Hi, In my application, sometimes I need to find doc Id with term frequency of my terms in my index of multi lines, tokenized & indexed with Standard Analyzer. For this, now I'm using * TermDocs termDocs = reader.termDocs(new Term("FIELD","book1")); while(termDocs.next()) { matches +=

Re: How to get TermFreq only in some query results

2006-07-27 Thread Jia Mi
Thank you, Grant, really help me :P On 7/27/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: You could store Term Vectors for your documents, and then look up the individual document vectors based on the query results. If you need help w/ Term Vectors, check out Lucene in Action, search this li

Re: How to get TermFreq only in some query results

2006-07-27 Thread Grant Ingersoll
You could store Term Vectors for your documents, and then look up the individual document vectors based on the query results. If you need help w/ Term Vectors, check out Lucene in Action, search this list, or http://www.cnlp.org/apachecon2005 -Grant On Jul 27, 2006, at 4:52 AM, Jia Mi wr
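As a rough illustration of the suggestion above, the sketch below sums per-document term frequencies over only the documents returned by a query. It is plain Java and does not call Lucene: the map of doc id to term counts stands in for stored term vectors, and the class name, method names, and sample data are all hypothetical. In a real index the per-document counts would come from the stored term vectors of each hit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; a Map plays the role of stored term vectors.
class HitTermFreqSketch {

    // Sums term frequencies across the given hit documents only.
    static Map<String, Integer> aggregate(Map<Integer, Map<String, Integer>> termVectors,
                                          List<Integer> hitDocIds) {
        Map<String, Integer> totals = new HashMap<>();
        for (int docId : hitDocIds) {
            Map<String, Integer> vector = termVectors.get(docId);
            if (vector == null) continue; // document has no stored vector
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Made-up sample data: three documents with their term counts.
    static Map<Integer, Map<String, Integer>> sampleVectors() {
        Map<Integer, Map<String, Integer>> vectors = new HashMap<>();
        Map<String, Integer> d0 = new HashMap<>();
        d0.put("lucene", 2);
        d0.put("index", 1);
        Map<String, Integer> d1 = new HashMap<>();
        d1.put("lucene", 5);
        d1.put("query", 1);
        Map<String, Integer> d2 = new HashMap<>();
        d2.put("lucene", 1);
        d2.put("score", 4);
        vectors.put(0, d0);
        vectors.put(1, d1);
        vectors.put(2, d2);
        return vectors;
    }

    public static void main(String[] args) {
        // Pretend the query (for example a date-range filter) matched docs 0 and 2.
        List<Integer> hits = java.util.Arrays.asList(0, 2);
        System.out.println(aggregate(sampleVectors(), hits));
    }
}
```

Sorting the resulting totals by count would give the "hot words" for the hit set. The cost grows with the number of hits, since every hit's vector is read.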

How to get TermFreq only in some query results

2006-07-27 Thread Jia Mi
Hi everyone, I am just developing an application using Lucene, and I know how to get the Term Freq via the IndexReader for the whole corpus. But I wonder if I can get the term freq statistics just inside the query results, like I want the hot words in just recent two weeks added into Lucene indic